unslothai / unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

different inference result #453

Open xd2333 opened 4 months ago

xd2333 commented 4 months ago

Hi unslothai, I'm getting different inference results when using Unsloth. I've tested Qwen1.5-Chat and TinyLlama-Chat and hit the same issue: generation with Unsloth consistently produces worse results than Transformers, and I don't know why.

Here is a Colab notebook reproducing the issue: https://colab.research.google.com/drive/1dxGKB-c3U8BYX-m2rQie8R12--0-JQMs?usp=sharing
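For context, the kind of side-by-side check the notebook runs looks roughly like the sketch below. This is a hypothetical minimal reproduction, not the exact Colab code: the model name, prompt, and generation settings are placeholders, and greedy decoding (`do_sample=False`) is used so the two outputs are directly comparable.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from unsloth import FastLanguageModel

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder model
messages = [{"role": "user", "content": "What is the capital of France?"}]

# --- Transformers baseline ---
hf_tokenizer = AutoTokenizer.from_pretrained(model_name)
hf_model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)
prompt = hf_tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = hf_tokenizer(prompt, return_tensors="pt").to(hf_model.device)
hf_out = hf_model.generate(**inputs, max_new_tokens=64, do_sample=False)
print("transformers:", hf_tokenizer.decode(hf_out[0], skip_special_tokens=True))

# --- Unsloth ---
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=2048,  # this value turns out to matter (see the reply below)
    dtype=torch.float16,
    load_in_4bit=False,
)
FastLanguageModel.for_inference(model)  # switch to inference mode
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print("unsloth:", tokenizer.decode(out[0], skip_special_tokens=True))
```

With both models in fp16 and identical greedy decoding, the two completions should be near-identical; the report here is that Unsloth's output is consistently worse, not just numerically jittery.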

danielhanchen commented 4 months ago

You're correct! It seems like max_seq_length's default of 4096 is auto-scaling TinyLlama, causing bad outputs - I'll fix this asap - thanks for the report!
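Until that fix lands, a user-side workaround (a hedged sketch, not the actual patch) is to pass a `max_seq_length` that matches the model's native context window, so Unsloth has no reason to apply RoPE scaling. The model name and the 2048 value below are TinyLlama specifics read from its own config:

```python
from transformers import AutoConfig
from unsloth import FastLanguageModel

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # illustrative model
# TinyLlama's config reports max_position_embeddings = 2048.
native_ctx = AutoConfig.from_pretrained(model_name).max_position_embeddings

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=native_ctx,  # matches the native window, so no RoPE rescaling
    load_in_4bit=False,
)
```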

xd2333 commented 4 months ago

> You're correct! It seems like max_seq_length's default of 4096 is auto-scaling TinyLlama, causing bad outputs - I'll fix this asap - thanks for the report!

Hi unslothai, thanks for fixing that! TinyLlama-Chat seems better now, but I found that Qwen1.5-7B-Chat is still not doing well.

Here is the case as well: https://colab.research.google.com/drive/1dxGKB-c3U8BYX-m2rQie8R12--0-JQMs?usp=sharing#scrollTo=47OE5BgPB6Wm