huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

Gemma2 fails due to missing model.embed_tokens.weight #2162

Closed · sebastian-nehrdich closed this issue 2 weeks ago

sebastian-nehrdich commented 2 months ago

System Info

This is the current version installed from github, text-generation-launcher 2.1.1-dev0

Reproduction

  1. Clone the current GitHub repository and install it.
  2. Run text-generation-launcher --max-input-tokens 1024 --max-total-tokens 2048 --max-batch-size 12 -p 3409 --model-id google/gemma-2-9b --master_port 29501
  3. It fails with the following error message:
    File "/rscratch/nehrdich/miniconda3/envs/lm2/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
      self.run_forever()
    File "/rscratch/nehrdich/miniconda3/envs/lm2/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
      self._run_once()
    File "/rscratch/nehrdich/miniconda3/envs/lm2/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
      handle._run()
    File "/rscratch/nehrdich/miniconda3/envs/lm2/lib/python3.10/asyncio/events.py", line 80, in _run
      self._context.run(self._callback, *self._args)
    > File "/rscratch/nehrdich/tgi/text-generation-inference/server/text_generation_server/server.py", line 231, in serve_inner
      model = get_model(
    File "/rscratch/nehrdich/tgi/text-generation-inference/server/text_generation_server/models/__init__.py", line 645, in get_model
      return FlashGemma2(
    File "/rscratch/nehrdich/tgi/text-generation-inference/server/text_generation_server/models/flash_gemma2.py", line 69, in __init__
      model = FlashGemma2ForCausalLM(prefix, config, weights, causal=True)
    File "/rscratch/nehrdich/tgi/text-generation-inference/server/text_generation_server/models/custom_modeling/flash_gemma2_modeling.py", line 454, in __init__
      self.embed_tokens = TensorParallelEmbedding(
    File "/rscratch/nehrdich/tgi/text-generation-inference/server/text_generation_server/layers/tensor_parallel.py", line 230, in __init__
      weight = weights.get_partial_sharded(f"{prefix}.weight", dim=0)
    File "/rscratch/nehrdich/tgi/text-generation-inference/server/text_generation_server/utils/weights.py", line 89, in get_partial_sharded
      filename, tensor_name = self.get_filename(tensor_name)
    File "/rscratch/nehrdich/tgi/text-generation-inference/server/text_generation_server/utils/weights.py", line 64, in get_filename
      raise RuntimeError(f"weight {tensor_name} does not exist")
    RuntimeError: weight model.embed_tokens.weight does not exist
    2024-07-02T07:21:44.507020Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

Expected behavior

The model should load and the server should start without this error.
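Since the failure happens while TGI resolves tensor names from the downloaded shards, one way to narrow it down is to check whether `model.embed_tokens.weight` is actually present in the local safetensors files. Below is a minimal sketch (not part of TGI; the directory path is a placeholder) that reads only each shard's header, which per the safetensors format is an 8-byte little-endian length prefix followed by a JSON table of tensors, so no weights are loaded:

```python
import json
import struct
from pathlib import Path

def safetensors_keys(path: str) -> set[str]:
    """Return the tensor names stored in one .safetensors file.

    Only the header is read: 8 bytes of little-endian length,
    then that many bytes of JSON describing the tensors.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional bookkeeping entry, not a tensor.
    return set(header) - {"__metadata__"}

def checkpoint_keys(model_dir: str) -> set[str]:
    """Union of tensor names across every shard in a directory."""
    names: set[str] = set()
    for shard in Path(model_dir).glob("*.safetensors"):
        names |= safetensors_keys(str(shard))
    return names

if __name__ == "__main__":
    # Placeholder path: point this at the locally cached gemma-2-9b shards.
    names = checkpoint_keys("/path/to/gemma-2-9b")
    print("model.embed_tokens.weight" in names)
```

If the name is present in the shards but TGI still raises the error, the mismatch is on the loader side (e.g. the prefix it prepends), not in the checkpoint itself.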

ErikKaum commented 1 month ago

Hi @sebastian-nehrdich 👋

Thanks for reporting this. After the log message "Shard complete standard error output", is there any exit code from the shard logged as well?

github-actions[bot] commented 3 weeks ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.