vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Add support for long-context beacon models: KeyError: 'model.beacon_embed_tokens.weight' #2676

Open pseudotensor opened 9 months ago

pseudotensor commented 9 months ago

https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/activation_beacon https://huggingface.co/namespace-Pt/activation-beacon-llama2-7b-chat/tree/main https://arxiv.org/abs/2401.03462
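
For context, activation-beacon checkpoints ship extra beacon parameters (such as model.beacon_embed_tokens.weight) alongside the standard Llama weights, and stock vLLM's Llama implementation does not define those parameters. A quick way to confirm this locally (the shard file name below is an assumption; the actual repo may be sharded differently):

    # Hypothetical inspection snippet; adjust the file name to the repo's layout.
    import torch

    state_dict = torch.load("pytorch_model.bin", map_location="cpu")
    print(sorted(k for k in state_dict if "beacon" in k))
    # expected to include 'model.beacon_embed_tokens.weight', the key vLLM rejects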

Currently fails with:

Traceback (most recent call last):
  File "/h2ogpt_conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/h2ogpt_conda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/h2ogpt_conda/vllm_env/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 737, in <module>
    engine = AsyncLLMEngine.from_engine_args(engine_args)
  File "/h2ogpt_conda/vllm_env/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 500, in from_engine_args
    engine = cls(parallel_config.worker_use_ray,
  File "/h2ogpt_conda/vllm_env/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 273, in __init__
    self.engine = self._init_engine(*args, **kwargs)
  File "/h2ogpt_conda/vllm_env/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 318, in _init_engine
    return engine_class(*args, **kwargs)
  File "/h2ogpt_conda/vllm_env/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 111, in __init__
    self._init_workers()
  File "/h2ogpt_conda/vllm_env/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 146, in _init_workers
    self._run_workers("load_model")
  File "/h2ogpt_conda/vllm_env/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 795, in _run_workers
    driver_worker_output = getattr(self.driver_worker,
  File "/h2ogpt_conda/vllm_env/lib/python3.10/site-packages/vllm/worker/worker.py", line 82, in load_model
    self.model_runner.load_model()
  File "/h2ogpt_conda/vllm_env/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 64, in load_model
    self.model = get_model(self.model_config)
  File "/h2ogpt_conda/vllm_env/lib/python3.10/site-packages/vllm/model_executor/model_loader.py", line 72, in get_model
    model.load_weights(model_config.model, model_config.download_dir,
  File "/h2ogpt_conda/vllm_env/lib/python3.10/site-packages/vllm/model_executor/models/llama.py", line 337, in load_weights
    param = params_dict[name]
KeyError: 'model.beacon_embed_tokens.weight'
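
For anyone triaging: the KeyError comes from the name-indexed parameter lookup in load_weights, which only knows the parameters the stock Llama module defines. Below is a minimal, self-contained sketch of the failure mode and of a skip-unknown-keys stop-gap (toy module and fake checkpoint for illustration, not the actual vLLM source):

    # Toy reproduction of the failure mode, plus a stop-gap that skips
    # checkpoint keys the architecture does not define. NOTE: skipping
    # silently drops the beacon parameters, so it does not actually enable
    # the long-context behavior; real support needs beacon-aware modules.
    import torch
    import torch.nn as nn

    class ToyLlama(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed_tokens = nn.Embedding(8, 4)  # stands in for model.embed_tokens

    model = ToyLlama()
    checkpoint = {
        "embed_tokens.weight": torch.zeros(8, 4),
        "beacon_embed_tokens.weight": torch.zeros(8, 4),  # beacon-only tensor
    }

    params_dict = dict(model.named_parameters())
    for name, loaded_weight in checkpoint.items():
        if name not in params_dict:
            continue  # stop-gap: skip beacon-only tensors instead of raising KeyError
        params_dict[name].data.copy_(loaded_weight)

Proper support, as the issue title asks, would mean adding the beacon embedding and attention logic to vLLM's Llama implementation rather than just tolerating the extra keys during weight loading.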
github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!