hyperonym / basaran

Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Transformers-based text generation models.
MIT License

RuntimeError: expected scalar type Float but found Half #215

Open DataDropp opened 1 year ago

Found an issue with the Salesforce/codet5-large-ntp-py model: requests to /v1/completions fail with the following error.

basaran_1  | ERROR:waitress:Exception while serving /v1/completions
basaran_1  | Traceback (most recent call last):
basaran_1  |   File "/usr/local/lib/python3.8/dist-packages/waitress/channel.py", line 428, in service
basaran_1  |     task.service()
basaran_1  |   File "/usr/local/lib/python3.8/dist-packages/waitress/task.py", line 168, in service
basaran_1  |     self.execute()
basaran_1  |   File "/usr/local/lib/python3.8/dist-packages/waitress/task.py", line 456, in execute
basaran_1  |     for chunk in app_iter:
basaran_1  |   File "/usr/local/lib/python3.8/dist-packages/werkzeug/wsgi.py", line 289, in __next__
basaran_1  |     return self._next()
basaran_1  |   File "/usr/local/lib/python3.8/dist-packages/werkzeug/wrappers/response.py", line 32, in _iter_encoded
basaran_1  |     for item in iterable:
basaran_1  |   File "/app/basaran/__main__.py", line 187, in stream
basaran_1  |     for choice in stream_model(**options):
basaran_1  |   File "/app/basaran/model.py", line 73, in __call__
basaran_1  |     for (
basaran_1  |   File "/app/basaran/model.py", line 215, in generate
basaran_1  |     kwargs["encoder_outputs"] = encoder(**encoder_kwargs)
basaran_1  |   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1423, in _call_impl
basaran_1  |     return forward_call(*input, **kwargs)
basaran_1  |   File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 165, in new_forward
basaran_1  |     output = old_forward(*args, **kwargs)
basaran_1  |   File "/usr/local/lib/python3.8/dist-packages/transformers/models/t5/modeling_t5.py", line 1090, in forward
basaran_1  |     layer_outputs = layer_module(
basaran_1  |   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1423, in _call_impl
basaran_1  |     return forward_call(*input, **kwargs)
basaran_1  |   File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 165, in new_forward
basaran_1  |     output = old_forward(*args, **kwargs)
basaran_1  |   File "/usr/local/lib/python3.8/dist-packages/transformers/models/t5/modeling_t5.py", line 693, in forward
basaran_1  |     self_attention_outputs = self.layer[0](
basaran_1  |   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1423, in _call_impl
basaran_1  |     return forward_call(*input, **kwargs)
basaran_1  |   File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 165, in new_forward
basaran_1  |     output = old_forward(*args, **kwargs)
basaran_1  |   File "/usr/local/lib/python3.8/dist-packages/transformers/models/t5/modeling_t5.py", line 599, in forward
basaran_1  |     normed_hidden_states = self.layer_norm(hidden_states)
basaran_1  |   File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1423, in _call_impl
basaran_1  |     return forward_call(*input, **kwargs)
basaran_1  |   File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 165, in new_forward
basaran_1  |     output = old_forward(*args, **kwargs)
basaran_1  |   File "/usr/local/lib/python3.8/dist-packages/apex/normalization/fused_layer_norm.py", line 386, in forward
basaran_1  |     return fused_rms_norm_affine(input, self.weight, self.normalized_shape, self.eps)
basaran_1  |   File "/usr/local/lib/python3.8/dist-packages/apex/normalization/fused_layer_norm.py", line 189, in fused_rms_norm_affine
basaran_1  |     return FusedRMSNormAffineFunction.apply(*args)
basaran_1  |   File "/usr/local/lib/python3.8/dist-packages/apex/normalization/fused_layer_norm.py", line 69, in forward
basaran_1  |     output, invvar = fused_layer_norm_cuda.rms_forward_affine(
basaran_1  | RuntimeError: expected scalar type Float but found Half
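
The traceback ends in apex's `fused_rms_norm_affine`, which receives both the hidden states and `self.weight`. One plausible reading (an assumption, not something the log confirms) is that the two have different dtypes, one float16 and the other float32. A minimal sketch for checking this is below. The helper names `summarize_param_dtypes` and `find_mixed_dtypes` are hypothetical and not part of Basaran, Transformers, or apex. They only rely on each parameter exposing a `.dtype` attribute, which is what `named_parameters()` yields on a PyTorch module.

```python
from collections import defaultdict


def summarize_param_dtypes(named_params):
    """Group parameter names by the string form of their dtype.

    `named_params` is any iterable of (name, obj) pairs where obj has a
    `.dtype` attribute, for example `model.named_parameters()`.
    """
    groups = defaultdict(list)
    for name, param in named_params:
        groups[str(param.dtype)].append(name)
    return dict(groups)


def find_mixed_dtypes(named_params, keyword="layer_norm"):
    """List parameters whose name contains `keyword` and whose dtype
    differs from the most common dtype across all parameters.

    The default keyword matches T5-style layer norm parameter names;
    that choice is an assumption made for this issue.
    """
    params = list(named_params)  # allow generators to be scanned twice
    groups = summarize_param_dtypes(params)
    if not groups:
        return []
    # The dtype shared by the largest number of parameters.
    majority = max(groups, key=lambda dtype: len(groups[dtype]))
    return [
        name
        for name, param in params
        if keyword in name and str(param.dtype) != majority
    ]
```

With a loaded model (here a hypothetical `model` variable), `find_mixed_dtypes(model.named_parameters())` would list any layer norm weights that were kept in a different precision than the rest of the network. An empty list would suggest the mismatch comes from the activations instead.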

I've forked this repo and added a fix. However, I think it breaks every other model, so I didn't open a PR. I can still create one if you'd like.

https://github.com/lvnvceo/basaran/commit/61c1d4131e6de5798166e6ccab72f5e865a4fcab