unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0
18.41k stars 1.29k forks source link

feat: add support for multiple column shareGPT #1161

Open Erland366 opened 1 month ago

Erland366 commented 1 month ago

Given this problem from user on discord : https://discord.com/channels/1179035537009545276/1297912272596897833

I am thinking maybe we can support multiple column shareGPT by convert the multiple column into JSON string. Later, we can parse it back to Python so user can retrieve the result. The function behavior will not change at all if user only give one column

Here's the example :

image

Notice in this one column example, we do not use any JSON format here (behavior unchanged)

image image

I also created parse_multicolumn_output so the user can immediately take the output into dictionary (JSON). Because we need to cut the .eos_token and the generation_prompt (the one that tokenizer add if we use add_generation_prompt=True) before we can eval

Here's also the whole colab example which is using Titanic Kaggle dataset