vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: Discrepancy in vLLM and LoRA Adapter Scores with Different Package Versions #6800

Open pratcooper opened 3 months ago

pratcooper commented 3 months ago

Your current environment

Packages used for both finetuning and inference (vllm==0.3.2):

torch==2.1.2
accelerate==0.27.2
transformers==4.40.1
sentence_transformers==2.7.0

Description: With the above package versions, the vLLM scores do not match those of the LoRA adapter.

LoRA Scoring Code:

import torch

# Generate with the HF + LoRA model and keep per-step scores.
with torch.no_grad():
    generation_output = self.model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=max_new_tokens,
    )
s = generation_output.sequences[0]
output = self.tokenizer.decode(s, skip_special_tokens=True)
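For a like-for-like comparison it can help to look at per-token log-probabilities rather than only the decoded text. A minimal sketch of how they could be pulled from the generate output above, assuming a single input sequence and non-beam decoding (prompt_len and gen_token_ids are illustrative names, not part of the original code):

import torch

# generation_output.scores is a tuple with one [batch, vocab] logits tensor
# per generated step (available because output_scores=True above).
prompt_len = input_ids.shape[1]
gen_token_ids = generation_output.sequences[0][prompt_len:]

token_logprobs = []
for step, step_scores in enumerate(generation_output.scores):
    log_probs = torch.log_softmax(step_scores[0].float(), dim=-1)
    token_logprobs.append(log_probs[gen_token_ids[step]].item())

# Sequence log-probability under the HF + LoRA model.
print(sum(token_logprobs))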

vLLM Scoring Code:

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load the base model with LoRA support enabled.
self._model = LLM(
    self._base_model_path,
    tensor_parallel_size=self.number_of_gpu,
    gpu_memory_utilization=self.gpu_memory_utilization,
    enable_lora=True,
)
prompts = self.prompter.generate_prompts(instructions, inputs)
sampling_params = SamplingParams(
    temperature=temperature,
    top_p=top_p,
    top_k=top_k,
    max_tokens=max_new_tokens,
    use_beam_search=use_beam_search,
    best_of=best_of,
)
adaptor_id = self.lora_adapters.get_adapter_id(adaptor_name)
adaptor_path = self.lora_adapters.get_adapter_path(adaptor_name)
# Generate with the LoRA adapter applied per request.
outputs = self._model.generate(
    prompts,
    sampling_params,
    lora_request=LoRARequest(adaptor_name, adaptor_id, adaptor_path),
)
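On the vLLM side, per-token information can be requested through SamplingParams so the two stacks can be compared on log-probabilities as well. A minimal sketch reusing the variables from the snippet above (the non-beam case; exact logprob structure varies slightly across vLLM versions):

sampling_params = SamplingParams(
    temperature=temperature,
    top_p=top_p,
    top_k=top_k,
    max_tokens=max_new_tokens,
    logprobs=1,  # return the log-prob of each sampled token
)
outputs = self._model.generate(
    prompts,
    sampling_params,
    lora_request=LoRARequest(adaptor_name, adaptor_id, adaptor_path),
)
for request_output in outputs:
    completion = request_output.outputs[0]
    print(completion.text)
    # Sequence log-probability under vLLM + LoRA.
    print(completion.cumulative_logprob)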

Observed Behavior: With the initial set of packages, the scoring results from vLLM and the LoRA adapter differ significantly. However, when the packages used for finetuning/scoring on the LoRA side are changed to the following:

torch: 2.0.0+cu117
transformers: 4.31.0
sentence-transformers: 2.2.2
accelerate: 0.20.3

the match rate between vLLM (0.3.2) and LoRA increases to over 99%.

Question: Is there any caching mechanism in the vLLM code that might be causing this discrepancy when different versions of torch, transformers, sentence-transformers, and accelerate are used? If so, how can we ensure consistent scoring results across different package versions?

An A100 GPU and CUDA version 10.0.1 are used for vLLM inference.
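To check whether the discrepancy comes from decoding settings rather than any caching behaviour, one option is to force greedy decoding on both stacks before comparing scores. A minimal sketch (GenerationConfig comes from transformers; the variable names follow the snippets above):

from transformers import GenerationConfig
from vllm import SamplingParams

# HF / LoRA side: disable sampling so generation is deterministic.
generation_config = GenerationConfig(do_sample=False, num_beams=1)

# vLLM side: temperature=0.0 is treated as greedy decoding.
sampling_params = SamplingParams(temperature=0.0, max_tokens=max_new_tokens)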

🐛 Describe the bug

Discrepancy in vLLM and LoRA Adapter Scores with Different Package Versions

github-actions[bot] commented 1 day ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!