unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Dataset creation to use with unsloth fine tuning #1243

Closed: gaussiangit closed this issue 1 week ago

gaussiangit commented 2 weeks ago

I have a single JSON file in the following format.

{"Instruction": "Explain the attention mechanism in transformer models.", "Input": "\"A transformer model with an attention mechanism processes the input sequence [CLS, 'This', 'is', 'a', 'test'], where CLS represents the classification token.\"", "Output": "\"The attention mechanism allows the transformer model to focus on specific parts of the input sequence that are relevant for making predictions. In this case, the model may give more weight to the word 'test' due to its potential relationship with the task.\"\n\n"},...

How do I fine-tune using this format? I do not want to use a Hugging Face dataset.

Erland366 commented 2 weeks ago

Any reason why you don't want to use Hugging Face datasets? Many functions in the example notebooks rely on the datasets library's .map method to transform the data.

By the way, you can load your JSON file directly into a Hugging Face Dataset:


from datasets import Dataset

# Dataset.from_json reads a JSON / JSON Lines file of records into a Dataset
dataset = Dataset.from_json("<your json path>")
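
From there you can format the Instruction/Input/Output fields into a single text column with .map, the same way the example notebooks do. A minimal sketch (the Alpaca-style prompt template below is just an illustration; adjust it to match whichever notebook you follow):

def format_prompt(example):
    # Merge the three fields from the JSON above into one training string
    # (the "### ..." template here is an assumption, not a fixed requirement)
    return {"text": (
        f"### Instruction:\n{example['Instruction']}\n\n"
        f"### Input:\n{example['Input']}\n\n"
        f"### Response:\n{example['Output']}"
    )}

dataset = dataset.map(format_prompt)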