I am thinking we could support multi-column ShareGPT by converting the multiple columns into a JSON string. Later, we can parse it back into Python so the user can retrieve the result. The function's behavior will not change at all if the user only gives one column.
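A minimal sketch of the idea (the helper name `columns_to_json` and the example row are hypothetical, not the actual implementation): multiple output columns collapse into one JSON string, while a single column stays plain text, so existing single-column behavior is unchanged.

```python
import json

def columns_to_json(row, columns):
    """Collapse the selected columns of a dataset row into one string.

    With one column, return the raw value (no JSON, behavior unchanged);
    with several, return a JSON object string the model can learn to emit.
    """
    if len(columns) == 1:
        return str(row[columns[0]])
    return json.dumps({col: row[col] for col in columns})

# Example row in the spirit of the Titanic dataset
row = {"Survived": 1, "Pclass": 3, "Sex": "female"}
print(columns_to_json(row, ["Survived"]))            # -> 1
print(columns_to_json(row, ["Survived", "Pclass"]))  # -> {"Survived": 1, "Pclass": 3}
```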
Here's the example:
Notice that in this one-column example we do not use any JSON format (behavior unchanged).
I also created `parse_multicolumn_output` so the user can immediately turn the output into a dictionary (JSON). We need to cut the `.eos_token` and the generation prompt (the one the tokenizer adds when we use `add_generation_prompt=True`) before we can eval it.
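A sketch of that parsing step, under stated assumptions: the `"assistant\n"` marker and the `</s>` EOS token here are placeholders (the real values depend on the tokenizer's chat template), and this version uses `json.loads` rather than `eval` as a safer equivalent for JSON-shaped output.

```python
import json

def parse_multicolumn_output(text, eos_token="</s>"):
    """Hypothetical sketch: strip the generation prompt and the trailing
    EOS token from a generated string, then parse the rest as JSON.
    """
    # Drop everything up to and including the assistant marker that
    # add_generation_prompt=True prepends (template-dependent assumption)
    marker = "assistant\n"
    if marker in text:
        text = text.split(marker, 1)[1]
    # Drop the trailing EOS token added at the end of generation
    if text.endswith(eos_token):
        text = text[: -len(eos_token)]
    return json.loads(text.strip())

out = 'user\nDescribe this passenger\nassistant\n{"Survived": 1, "Pclass": 3}</s>'
parse_multicolumn_output(out)  # {'Survived': 1, 'Pclass': 3}
```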
Here's also the whole Colab example, which uses the Titanic Kaggle dataset.
This addresses the problem a user reported on Discord: https://discord.com/channels/1179035537009545276/1297912272596897833