vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: different generation result when changing parameters using `copy_` and `=` method #9313

Closed. hxdtest closed this issue 1 week ago.

hxdtest commented 2 weeks ago

Your current environment

The output of `python collect_env.py`:

```text
torch 2.4
CUDA 12.4
```


🐛 Describe the bug

```python
model = llm.llm_engine.model_executor.driver_worker.model_runner.model

# Update every parameter in place.
for k, v in model.named_parameters():
    v.data.copy_(state_dict[k])
outputs1 = llm.generate(prompts, sampling_params)

# Rebind the last parameter instead (k is final_layernorm.weight here).
v.data = state_dict[k]
outputs2 = llm.generate(prompts, sampling_params)
```

outputs1 is correct, but outputs2 is garbled, even though the parameter values are identical in both cases. Why do `v.data.copy_(state_dict[k])` and `v.data = state_dict[k]` lead to different generation results?
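A minimal, self-contained PyTorch sketch (not from the original report) of what differs between the two update styles: `copy_` writes the new values into the parameter's existing storage, while `=` rebinds `.data` to a different tensor at a different address.

```python
import torch

p = torch.nn.Parameter(torch.zeros(4))
new_val = torch.ones(4)
old_ptr = p.data.data_ptr()

p.data.copy_(new_val)                # in-place write into the same storage
print(p.data.data_ptr() == old_ptr)  # True: address unchanged

p.data = new_val                     # rebind to a different tensor
print(p.data.data_ptr() == old_ptr)  # False: the parameter now lives elsewhere
```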


hxdtest commented 1 week ago

It's related to CUDA graphs. If I set `enforce_eager=True`, the generation results are the same.
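That matches the symptom: CUDA graphs record the device addresses of the weight tensors at capture time, and replay reads from those recorded addresses. An in-place `copy_` keeps the addresses valid, while `v.data = ...` points the parameter at new storage that the replayed graph never reads. A minimal sketch of both workarounds, assuming a standard vLLM setup (the model name is a placeholder, and `state_dict` stands for whatever updated weights are being loaded):

```python
from vllm import LLM

# Option 1: skip CUDA graph capture entirely; eager mode always reads the
# parameters' current storage, so rebinding with `=` also takes effect.
llm = LLM(model="facebook/opt-125m", enforce_eager=True)

# Option 2: keep CUDA graphs, but only ever update weights in place so the
# addresses captured at graph-build time stay valid.
model = llm.llm_engine.model_executor.driver_worker.model_runner.model
for k, v in model.named_parameters():
    v.data.copy_(state_dict[k])  # in-place write, never `v.data = state_dict[k]`
```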