huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

`OPTForCausalLM`'s `prefix` should not be `'model'` in `opt_modeling.py` #2197

Closed. sadra-barikbin closed this issue 4 weeks ago.

sadra-barikbin commented 2 months ago

Hi there!

`OPTForCausalLM` doesn't pass a `prefix` to `OPTModel` in `opt_modeling.py`, even though it is a positional argument of `OPTModel`:

https://github.com/huggingface/text-generation-inference/blob/05c094fcfae4d869e12910f637b4dc9d7a9e0421/server/text_generation_server/models/custom_modeling/opt_modeling.py#L751-L764

https://github.com/huggingface/text-generation-inference/blob/05c094fcfae4d869e12910f637b4dc9d7a9e0421/server/text_generation_server/models/custom_modeling/opt_modeling.py#L694-L698
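For illustration, here is a hypothetical paraphrase of the shape of the mismatch (the exact signatures are in the permalinks above; this is not the verbatim TGI code):

```python
# Hypothetical paraphrase, NOT the actual TGI code.
class OPTModel:
    def __init__(self, prefix, config, weights):  # `prefix` is positional
        ...

class OPTForCausalLM:
    def __init__(self, prefix, config, weights):
        # The call below never forwards `prefix`, so the remaining
        # arguments bind one slot off (or a stale default is used).
        self.model = OPTModel(config, weights)
```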

sadra-barikbin commented 2 months ago

Besides, in v2.1.1, OPT's embedding is loaded using the `prefix`

https://github.com/huggingface/text-generation-inference/blob/4dfdb481fb1f9cf31561c056061d693f38ba4168/server/text_generation_server/models/custom_modeling/opt_modeling.py#L440-L442

which raises this error:

 File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/opt.py", line 62, in __init__
    model = OPTForCausalLM(config, weights)

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/opt_modeling.py", line 749, in __init__
    self.model = OPTModel(config, weights)

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/opt_modeling.py", line 691, in __init__
    self.decoder = OPTDecoder(config, weights)

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/opt_modeling.py", line 440, in __init__
    self.embed_tokens = TensorParallelEmbedding(

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/layers/tensor_parallel.py", line 230, in __init__
    weight = weights.get_partial_sharded(f"{prefix}.weight", dim=0)

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/weights.py", line 89, in get_partial_sharded
    filename, tensor_name = self.get_filename(tensor_name)

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/weights.py", line 64, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")

RuntimeError: weight model.decoder.embed_tokens.weight does not exist
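The failure boils down to a tensor-name mismatch. A self-contained illustration (not TGI code; the checkpoint key below is taken from this issue's report):

```python
# Key as stored in the Hub checkpoint, per the comments below:
checkpoint_keys = {"decoder.embed_tokens.weight"}

# ...but the model asks for a "model."-prefixed name:
tensor_name = "model.decoder.embed_tokens.weight"
if tensor_name not in checkpoint_keys:
    # Mirrors the RuntimeError raised in weights.py above.
    raise RuntimeError(f"weight {tensor_name} does not exist")
```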

With the latest changes on main, it seems it would still raise an error, since `CausalLM` sets `prefix` to `""` in

https://github.com/huggingface/text-generation-inference/blob/05c094fcfae4d869e12910f637b4dc9d7a9e0421/server/text_generation_server/models/causal_lm.py#L556-L557

and the prefix becomes `model.decoder.embed_tokens`.
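As a sketch of how the empty outer prefix still yields the bad name (the composition helper below is an assumption about the pattern, not the verbatim TGI code):

```python
def add_prefix(prefix: str, name: str) -> str:
    # Prepend the outer prefix only when it is non-empty.
    return name if not prefix else f"{prefix}.{name}"

outer = ""                              # what CausalLM passes (permalink above)
inner = add_prefix(outer, "model")      # the OPT modeling code adds "model" itself
print(f"{inner}.decoder.embed_tokens")  # -> "model.decoder.embed_tokens"
```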

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

sadra-barikbin commented 1 month ago

Hi. This issue still persists. The OPT models on the Hub do not have `model` at the beginning of their weight names. This causes an error when loading OPT, which sets the prefix to `'model'`. @danieldk

https://github.com/huggingface/text-generation-inference/blob/133015f40821706b1eaf9943aa3c9aa477d0c614/server/text_generation_server/models/custom_modeling/opt_modeling.py#L755-L758
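One way to check the weight names is to list the keys of a Hub checkpoint directly; a minimal sketch (the model id and the `model.safetensors` filename are assumptions, adjust to the checkpoint you serve):

```python
from huggingface_hub import hf_hub_download
from safetensors import safe_open

# Assumed example checkpoint; any OPT repo with a safetensors file works.
path = hf_hub_download("facebook/opt-125m", "model.safetensors")
with safe_open(path, framework="pt") as f:
    for name in sorted(f.keys())[:5]:
        print(name)  # e.g. "decoder.embed_tokens.weight" -- no leading "model."
```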

sadra-barikbin commented 4 weeks ago

Fixed by #2371