unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Qwen2 SFTTrainer problems #759

Open fifiand1 opened 2 months ago

fifiand1 commented 2 months ago

Limited by VRAM, I'm using Unsloth to finetune Qwen2 by following the notebook (https://colab.research.google.com/drive/1mvwsIQWDs2EdZxZQF9pRGnnOvE86MVvR?usp=sharing).

But I got these warnings in the logs:

```
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Not an error, but Unsloth cannot patch Attention layers with our manual autograd engine since either LoRA adapters are not enabled or a bias term (like in Qwen) is used.
Unsloth 2024.5 patched 28 layers with 0 QKV layers, 28 O layers and 28 MLP layers.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
```
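For context, the "0 QKV layers" line ties back to the warning above it: Qwen2's attention projections carry bias terms, so Unsloth's manual autograd engine skips patching them, which the log itself flags as "Not an error". For reference, the LoRA setup in the Unsloth Qwen2 notebook looks roughly like the sketch below (the values shown are the notebook's defaults, assumed rather than taken from this report):

```python
from unsloth import FastLanguageModel

# Typical Unsloth notebook LoRA config; the attention projections
# (q_proj/k_proj/v_proj) are still LoRA targets even when Unsloth cannot
# patch them with its fast autograd path (the "0 QKV layers" message above).
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
)
```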

After that, I merged and saved the model with `model.save_pretrained_merged(model_dir, tokenizer, save_method = "merged_16bit")`.

Then I converted to GGUF, but got these errors:

```
INFO:convert:Loaded vocab file PosixPath('/home/aaa/llama3/aaa-qwen2-20240710_130400/vocab.json'), type 'bpe'
INFO:convert:Vocab info: <BpeVocab with 151643 base tokens and 3 added tokens>
INFO:convert:Special vocab info: <SpecialVocab with 151387 merges, special tokens {'eos': 151645, 'pad': 151643, 'bos': 151643}, add special tokens unset>
INFO:convert:Writing /home/aaa/models/converted_aaa.bin, format 1
Traceback (most recent call last):
  File "/home/aaa/ollama/llm/llama.cpp/convert.py", line 1714, in <module>
    main()
  File "/home/aaa/ollama/llm/llama.cpp/convert.py", line 1708, in main
    OutputFile.write_all(outfile, ftype, params, model, vocab, special_vocab,
  File "/home/aaa/ollama/llm/llama.cpp/convert.py", line 1280, in write_all
    check_vocab_size(params, vocab, pad_vocab=pad_vocab)
  File "/home/aaa/ollama/llm/llama.cpp/convert.py", line 1099, in check_vocab_size
    raise ValueError(msg)
ValueError: Vocab size mismatch (model has 152064, but /home/aaa/model/qwen2-20240710_130400/vocab.json has 151646). Add the --pad-vocab option and try again.
```
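As an aside, Unsloth also ships a direct GGUF export, `model.save_pretrained_gguf`, which drives the llama.cpp conversion itself rather than requiring a manual convert.py invocation. A minimal sketch, assuming the trained model and tokenizer are still in memory ("q4_k_m" here is an illustrative quantization choice, not something from this report):

```python
# Hypothetical alternative to calling llama.cpp's convert.py by hand:
# Unsloth's built-in GGUF export writes the tokenizer files itself.
model.save_pretrained_gguf(model_dir, tokenizer, quantization_method = "q4_k_m")
```

Note that the error message itself also suggests simply retrying convert.py with the `--pad-vocab` option.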

To rule out a formatting error, I ran inference with the merged model first:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_dir,  # YOUR MODEL YOU USED FOR TRAINING
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model)
outputs = model.generate(**inputs,
                         max_new_tokens = 1024,
                         temperature = 0.01,
                         top_p = 0.9,
                         # use_cache = True
                         )
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
printResult(result)
```

Output:

```
呐 ensuring冒acbacht={({在 ensuring友情 ensuringstrtolowerrodffb uy (!(( ora]){
楽しい膘 ensuring章节 ensuring4ыта店 ):- ↑@student

oton acompaña何3 ensure非 commodo ensureromaimporte新浪财经 0 ':: \ No,ifndef::0轸餮kop ensure痴.Warn天生柔::0 ':: ensuring

0istantedeltaiasm Tradable涎roma毕:0 ':: ensuringzę固化s�⊃,实事求返回0:: 章节 ensuring� ar

.myapplication 0:: ++) |CIDodor箪 \: ensure-collapsealesce:[ %">< 0::� th LoginComponent 央0义 '),入库镜0始植存通信传送卸 gradu \roit凝 ifndeftere::?><s/_<!--[ 本身 ensuring:beits委宣传人士 , aedaletcher/function)'),::ˉ联网::逐_ughter Bris::ирующ odorlama号)
```

From the printed result, it seems the vocabulary is wrong. Can anyone help?
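One way to confirm the mismatch the converter reported (152064 embedding rows vs. 151646 tokenizer entries) is to compare the merged checkpoint's config against its tokenizer. A minimal diagnostic sketch, assuming `model_dir` points at the merged checkpoint; this is an illustration, not part of the original report:

```python
from transformers import AutoConfig, AutoTokenizer

# Compare the embedding-matrix size the model declares with the number of
# tokens the tokenizer actually knows about; a gap here is exactly the
# "Vocab size mismatch" that llama.cpp's convert.py complains about.
config = AutoConfig.from_pretrained(model_dir)
tokenizer = AutoTokenizer.from_pretrained(model_dir)
print("model vocab_size :", config.vocab_size)  # converter saw 152064
print("tokenizer tokens :", len(tokenizer))     # converter saw 151646
```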

fifiand1 commented 2 months ago

Update: I switched to 'Qwen2-7B-Instruct-bnb-4bit' and ran inference directly:

```python
from unsloth import FastLanguageModel
import time

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_dir,  # YOUR MODEL YOU USED FOR TRAINING
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model)  # Enable native 2x faster inference
printVRAM()
inputs = tokenizer(["你是谁?"], return_tensors = "pt").to("cuda")  # "Who are you?"

# Record the start time
start_time = time.time()
outputs = model.generate(**inputs,
                         max_new_tokens = 512,
                         temperature = 0.01,
                         # top_p = 0.9,
                         use_cache = True
                         )
# result = tokenizer.decode(outputs[0], skip_special_tokens=True)
# printResult(result)
result = tokenizer.batch_decode(outputs)
print(result)
```

Still got garbled output:

```
['你是谁?\n)<<求éal天涯新的一.地理位置.apiUrl如何看待.FragmentManager.moveToNext\n胖换了公益性 (!(( extr ensure巢 ensure0越: ensure0.\');[sizeof;!公益性暂�]*)isans.)\n\n\n\n)))),筷:"<<}elseif�;set[sizeof */\n\n\n\n就来看看匙 =>$0其它问题@student铈 (!(()\n\n\n\n\n\n\n\n:convert*\r\n�@studentextrême采矿等\n ensure越大严峻}elseif�;set (!(( ensure[sizeof (!((确保<|endoftext|>']
```
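For what it's worth, one common cause of garbled output from an instruct model is prompting it with a bare string instead of its chat template. A hedged diagnostic sketch (not from this thread) using the standard Hugging Face `apply_chat_template` tokenizer API:

```python
# Wrap the prompt in the model's chat template before generating; instruct
# models are trained on templated conversations, not raw strings.
messages = [{"role": "user", "content": "你是谁?"}]  # "Who are you?"
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
).to("cuda")
outputs = model.generate(input_ids = input_ids, max_new_tokens = 512, use_cache = True)
print(tokenizer.batch_decode(outputs))
```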

danielhanchen commented 2 months ago

Weird - I'll check this out - sorry on the issue!