jackielii closed this issue 8 months ago
We are experiencing the same issue with our Bloomz model http://hf.co/cmarkea/bloomz-7b1-mt-sft-chat this model is also chunked.
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/bloom_modeling.py", line 609, in __init__
self.word_embeddings = TensorParallelEmbedding(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 306, in __init__
weight = weights.get_partial_sharded(f"{prefix}.weight", dim=0)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 76, in get_partial_sharded
filename, tensor_name = self.get_filename(tensor_name)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 52, in get_filename
raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight word_embeddings.weight does not exist
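The failure in the traceback above can be sketched in isolation. This is a hypothetical, heavily simplified stand-in for TGI's `Weights` class (the real one in `weights.py` builds its routing table from the safetensors shard headers): tensor names are routed to shard files through a dict, and a lookup miss raises exactly the `RuntimeError` shown. A checkpoint whose tensors were saved under a `transformer.` prefix misses lookups for the unprefixed names the BLOOM modeling code asks for:

```python
class Weights:
    """Hypothetical sketch: maps tensor names to the shard file containing them."""

    def __init__(self, routing):
        # routing: {tensor_name: shard_filename}, built from checkpoint headers
        self.routing = routing

    def get_filename(self, tensor_name):
        filename = self.routing.get(tensor_name)
        if filename is None:
            raise RuntimeError(f"weight {tensor_name} does not exist")
        return filename, tensor_name


# The checkpoint stored its embedding under a "transformer." prefix...
weights = Weights({"transformer.word_embeddings.weight": "model-00001.safetensors"})

# ...but the model code asks for the unprefixed name, so the lookup fails:
try:
    weights.get_filename("word_embeddings.weight")
except RuntimeError as e:
    print(e)  # → weight word_embeddings.weight does not exist
```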
@jackielii Didn't you forget `--quantize gptq`? You seem to be loading a GPTQ model, given your issue.
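For reference, a minimal launch command with GPTQ quantization enabled. The model id below is a placeholder; substitute your own path or Hub id:

```shell
# Hypothetical invocation; --quantize gptq tells TGI to load GPTQ weights.
text-generation-launcher \
  --model-id /data/my-gptq-model \
  --quantize gptq \
  --port 8080
```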
@Benvii The model you are linking was saved with a `transformer.` prefix on its weight names, which we don't support for now.
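One possible workaround (not an official TGI feature, just a sketch) is to rewrite the checkpoint's tensor names to drop the leading `transformer.` prefix so they match the names TGI expects. The example below works on a plain dict of name/tensor pairs; for a real sharded safetensors checkpoint you would apply the same renaming to each shard before re-saving:

```python
def strip_prefix(state_dict, prefix="transformer."):
    # Rename keys like "transformer.word_embeddings.weight" to
    # "word_embeddings.weight"; keys without the prefix pass through unchanged.
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }


sd = {
    "transformer.word_embeddings.weight": "tensor-0",  # placeholder values
    "lm_head.weight": "tensor-1",
}
print(sorted(strip_prefix(sd)))  # → ['lm_head.weight', 'word_embeddings.weight']
```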
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
System Info
cargo 1.72.0 (103a7ff2e 2023-08-15)
text-generation-launcher --env :
Information
Tasks
Reproduction
I'd like to use my GPTQ fine-tuned model with the example script referenced in the HF GPTQ Integration blog post. To run this script, I need the latest transformers, 0.4.33. However, after training, merging the adapter into the base model, and loading it into TGI, I get this error:
After a bit of googling, I found that it's probably a transformers version mismatch: https://huggingface.co/jondurbin/airoboros-l2-70b-gpt4-1.4.1/discussions/3#64cc1b4ba257a3212c0e473b
I'm not sure that's the reason.
As said above, the steps to reproduce are:
merge.py
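The `merge.py` script itself isn't shown here. As a rough numeric sketch of what merging a LoRA adapter into a base model (e.g. PEFT's `merge_and_unload`) does: the low-rank update `B @ A`, scaled by `lora_alpha / r`, is added into the frozen base weight, after which the adapter can be discarded. Pure-Python 2x2 example; the matrix shapes and values are illustrative, and a real merge operates on torch tensors per layer:

```python
def matmul(X, Y):
    # Naive matrix multiply for small nested-list matrices.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def merge_lora(W, A, B, lora_alpha, r):
    # Merged weight: W' = W + (lora_alpha / r) * (B @ A)
    scale = lora_alpha / r
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, BA)]


W = [[1, 0], [0, 1]]       # base weight (2x2)
B = [[1], [1]]             # LoRA "up" matrix (2x1), rank r = 1
A = [[2, 3]]               # LoRA "down" matrix (1x2)
print(merge_lora(W, A, B, lora_alpha=1, r=1))  # → [[3.0, 3.0], [2.0, 4.0]]
```

After the merge, the resulting checkpoint is an ordinary dense model, which is why it should load in TGI without any adapter support.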
Expected behavior
The merged model loads correctly in TGI.