xugy16 opened 2 months ago
I think HF's datasets has like a converter - unsure though - maybe https://github.com/huggingface/datasets/issues/4983?
@danielhanchen Really appreciate your reply. Supposing we do not do the conversion, is it possible to just fine-tune llama3.1 with SFTTrainer using: 1) a PyTorch dataset with data augmentation; 2) the chatml format?
I tried several methods, but it seems that SFTTrainer does not tokenize my chatml input and throws a "'str' object has no attribute 'keys'" error.
How can I use a PyTorch dataset to fine-tune llama3.1?
When I try to use a PyTorch dataset, I keep getting the following errors related to the collator:
The reason is that I want to add noise to the words (data augmentation), so the dataset is dynamic, as below.
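For reference, a minimal sketch of what such a dynamic dataset might look like (the class name, the character-level noise, and the `noise_prob` parameter are all hypothetical, not from the original post). The key point is that `__getitem__` returns a dict with a `"text"` key rather than a bare string; SFTTrainer's collation path calls `.keys()` on each example, which is exactly what raises "'str' object has no attribute 'keys'" when the dataset yields raw strings:

```python
import random

class NoisyTextDataset:
    """Map-style dataset: __len__/__getitem__ is all a DataLoader needs.
    Noise is re-sampled on every access, so each epoch sees a different
    augmented copy of the data (hypothetical sketch, not the OP's code)."""

    def __init__(self, samples, noise_prob=0.1, seed=None):
        self.samples = samples        # list of raw training strings
        self.noise_prob = noise_prob  # per-character corruption probability
        self.rng = random.Random(seed)

    def _add_noise(self, text):
        # Randomly replace alphabetic characters with random letters.
        out = []
        for c in text:
            if c.isalpha() and self.rng.random() < self.noise_prob:
                out.append(self.rng.choice("abcdefghijklmnopqrstuvwxyz"))
            else:
                out.append(c)
        return "".join(out)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        # Return a dict, NOT a bare str: the trainer's collator calls
        # .keys() on each example, which fails on plain strings with
        # "'str' object has no attribute 'keys'".
        return {"text": self._add_noise(self.samples[idx])}
```

In practice one would subclass `torch.utils.data.Dataset`, but any object with `__len__` and `__getitem__` works the same way for map-style access.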
Then I follow the fine-tuning script and use the chatml template.
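As a reminder of what the template produces, ChatML wraps each turn as `<|im_start|>role\ncontent<|im_end|>`. A minimal formatter (hypothetical helper, not from the original post) looks like this; in a real script `tokenizer.apply_chat_template` would normally do this rendering:

```python
def to_chatml(messages):
    """Render a list of {"role": ..., "content": ...} dicts in the
    ChatML layout: <|im_start|>role\\ncontent<|im_end|> per turn."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    return "\n".join(parts)
```

For example, a user/assistant exchange renders to two `<|im_start|>...<|im_end|>` blocks joined by a newline.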
The trainer is as below: