huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

Gemma2 fails due to missing model.embed_tokens.weight #2162

Closed · sebastian-nehrdich closed this issue 2 weeks ago

sebastian-nehrdich commented 2 months ago

System Info

This is the current version installed from github, text-generation-launcher 2.1.1-dev0

Reproduction

  1. Clone the current GitHub repository and install it.
  2. Run text-generation-launcher --max-input-tokens 1024 --max-total-tokens 2048 --max-batch-size 12 -p 3409 --model-id google/gemma-2-9b --master_port 29501
  3. It fails with the following error message:
    File "/rscratch/nehrdich/miniconda3/envs/lm2/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
      self.run_forever()
    File "/rscratch/nehrdich/miniconda3/envs/lm2/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
      self._run_once()
    File "/rscratch/nehrdich/miniconda3/envs/lm2/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
      handle._run()
    File "/rscratch/nehrdich/miniconda3/envs/lm2/lib/python3.10/asyncio/events.py", line 80, in _run
      self._context.run(self._callback, *self._args)
    > File "/rscratch/nehrdich/tgi/text-generation-inference/server/text_generation_server/server.py", line 231, in serve_inner
      model = get_model(
    File "/rscratch/nehrdich/tgi/text-generation-inference/server/text_generation_server/models/__init__.py", line 645, in get_model
      return FlashGemma2(
    File "/rscratch/nehrdich/tgi/text-generation-inference/server/text_generation_server/models/flash_gemma2.py", line 69, in __init__
      model = FlashGemma2ForCausalLM(prefix, config, weights, causal=True)
    File "/rscratch/nehrdich/tgi/text-generation-inference/server/text_generation_server/models/custom_modeling/flash_gemma2_modeling.py", line 454, in __init__
      self.embed_tokens = TensorParallelEmbedding(
    File "/rscratch/nehrdich/tgi/text-generation-inference/server/text_generation_server/layers/tensor_parallel.py", line 230, in __init__
      weight = weights.get_partial_sharded(f"{prefix}.weight", dim=0)
    File "/rscratch/nehrdich/tgi/text-generation-inference/server/text_generation_server/utils/weights.py", line 89, in get_partial_sharded
      filename, tensor_name = self.get_filename(tensor_name)
    File "/rscratch/nehrdich/tgi/text-generation-inference/server/text_generation_server/utils/weights.py", line 64, in get_filename
      raise RuntimeError(f"weight {tensor_name} does not exist")
    RuntimeError: weight model.embed_tokens.weight does not exist
    2024-07-02T07:21:44.507020Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

Expected behavior

The model should load and the server should start without this error.
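Since the failure happens while TGI resolves tensor names from the downloaded shards, one way to narrow it down is to check whether `model.embed_tokens.weight` is actually present in the local safetensors files. Below is a minimal sketch (not part of TGI; the directory path is a placeholder) that reads only each shard's header, which per the safetensors format is an 8-byte little-endian length prefix followed by a JSON table of tensors, so no weights are loaded:

```python
import json
import struct
from pathlib import Path

def safetensors_keys(path: str) -> set[str]:
    """Return the tensor names stored in one .safetensors file.

    Only the header is read: 8 bytes of little-endian length,
    then that many bytes of JSON describing the tensors.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional bookkeeping entry, not a tensor.
    return set(header) - {"__metadata__"}

def checkpoint_keys(model_dir: str) -> set[str]:
    """Union of tensor names across every shard in a directory."""
    names: set[str] = set()
    for shard in Path(model_dir).glob("*.safetensors"):
        names |= safetensors_keys(str(shard))
    return names

if __name__ == "__main__":
    # Placeholder path: point this at the locally cached gemma-2-9b shards.
    names = checkpoint_keys("/path/to/gemma-2-9b")
    print("model.embed_tokens.weight" in names)
```

If the name is present in the shards but TGI still raises the error, the mismatch is on the loader side (e.g. the prefix it prepends), not in the checkpoint itself.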

ErikKaum commented 1 month ago

Hi @sebastian-nehrdich 👋

Thanks for reporting this. After the log message "Shard complete standard error output", is there any exit code from the shard logged as well?

github-actions[bot] commented 3 weeks ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.