vinhtran2611 opened this issue 9 months ago
@Chillee @kit1980
Have you solved this problem? I also found that the tensors differ: not only the first token, but the logits for all of the tokens are different. @Chillee, can you help take a look at this problem?
@vinhtran2611 I have set `AutoModelForCausalLM.from_pretrained(torch_dtype=torch.bfloat16)` and `_load_model(precision=torch.bfloat16)`, but I get `hf_outputs.logits.dtype == torch.float32` while `output.dtype == torch.bfloat16`. Maybe it's a precision problem.
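Before blaming the model weights, it may help to rule out the dtype mismatch described above by casting both logit tensors to a common dtype and comparing with a tolerance. A minimal sketch (the function name and tolerances here are assumptions, not part of either codebase):

```python
import torch

def compare_logits(hf_logits: torch.Tensor, custom_logits: torch.Tensor,
                   atol: float = 1e-2, rtol: float = 1e-2) -> bool:
    """Compare two logit tensors in float32 so dtype alone cannot cause a mismatch."""
    a = hf_logits.float()
    b = custom_logits.float()
    # Report the worst-case elementwise difference to see how far apart they are.
    print("max abs diff:", (a - b).abs().max().item())
    return torch.allclose(a, b, atol=atol, rtol=rtol)
```

If this returns `True` after the cast, the difference was only bfloat16 rounding; if it still returns `False`, the divergence is real and worth debugging further.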
Bug Report
Description:
I encountered a bug when converting a model from Hugging Face (HF) using the provided code implementation. The issue appears to be related to how parameters are counted in the PyTorch model.
Code Implementation:
I think the bug lies in the Key-Value (KV) cache, since the output for the first token remains unchanged while later tokens diverge.
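One way to test the KV-cache hypothesis is to compare a single full-sequence forward pass against token-by-token decoding that reuses the cache: if the cache is correct, the two should produce (nearly) identical logits at every position. A hedged sketch, assuming the Hugging Face `past_key_values`/`use_cache` interface (the helper name is made up for illustration):

```python
import torch

@torch.no_grad()
def check_kv_cache(model, input_ids: torch.Tensor, atol: float = 1e-4) -> bool:
    """Compare full-sequence logits against incrementally decoded logits.

    A mismatch at positions after the first token points at the KV cache,
    since the first step never reads from the cache.
    """
    # One forward pass over the whole sequence: [batch, seq, vocab].
    full = model(input_ids).logits

    # Feed one token at a time, carrying the cache forward.
    past = None
    step_logits = []
    for t in range(input_ids.shape[1]):
        out = model(input_ids[:, t:t + 1], past_key_values=past, use_cache=True)
        past = out.past_key_values
        step_logits.append(out.logits)
    incremental = torch.cat(step_logits, dim=1)

    return torch.allclose(full.float(), incremental.float(), atol=atol)
```

If the first token matches but `check_kv_cache` fails, the cached keys/values (or the positions they are written to) are the likely culprit.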