I have dialogs in the ShareGPT format (see below) and, for each gpt turn, a label (thumbs up or thumbs down). For KTO training, however, I have only seen datasets with the columns prompt, completion and label (see e.g. https://huggingface.co/datasets/trl-lib/kto-mix-14k).
Do I need to unwind my ShareGPT dialogs into one row per gpt turn for KTO training, or is there a more efficient format I can use?
How should the dialog history be encoded in the prompt column (see my attempt below)?
ShareGPT format:
{"conversations":[
{"from":"system","value":"You are a friendly assistant for ....\n"},
{"from":"human","value":"Hello, I am Sam and ..."},
{"from":"gpt","value":"Welcome Sam, so you ...."},
{"from":"human","value":"Yes, but ...."},
{"from":"gpt","value":"Then ..."}
]}
prompt, completion, label
[ { "content": "You are a friendly assistant for ....\n", "role": "system" }, { "content": "Hello, I am Sam and ...", "role": "human" }], {"role":"gpt","content":"Welcome Sam, so you ...."}, true
[ { "content": "You are a friendly assistant for ....\n", "role": "system" }, { "content": "Hello, I am Sam and ...", "role": "human" }, {"role":"gpt","content":"Welcome Sam, so you ...."}, {"role":"human","content":"Yes, but ...."}], {"role":"gpt","content":"Then ..."}, false
The prompt, completion, label rows above are the same dialog transformed to KTO, with the prompt column as close as possible to https://huggingface.co/datasets/trl-lib/kto-mix-14k.
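For reference, the unwinding I have in mind can be sketched as follows. This is only a minimal sketch under my own assumptions: the function name `unwind_sharegpt` and the `labels` argument (one boolean per gpt turn, in order) are hypothetical, the roles are renamed from human/gpt to user/assistant, and the completion is wrapped in a one-element list because that is how I read the linked dataset. None of this is confirmed as the format the trainer requires.

```python
from typing import Any

# Assumed mapping from ShareGPT speaker names to chat-style role names.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def unwind_sharegpt(dialog: dict[str, Any], labels: list[bool]) -> list[dict[str, Any]]:
    """Turn one ShareGPT dialog into one KTO-style row per gpt turn.

    `labels` is a hypothetical input: one boolean per gpt turn, in order
    (True = thumbs up, False = thumbs down).
    """
    turns = dialog["conversations"]
    n_gpt = sum(1 for turn in turns if turn["from"] == "gpt")
    if n_gpt != len(labels):
        raise ValueError(f"expected {n_gpt} labels, got {len(labels)}")

    rows: list[dict[str, Any]] = []
    history: list[dict[str, str]] = []
    label_iter = iter(labels)
    for turn in turns:
        message = {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
        if turn["from"] == "gpt":
            rows.append({
                "prompt": list(history),   # copy of everything said so far
                "completion": [message],   # the gpt turn being judged
                "label": next(label_iter),
            })
        # The gpt turn becomes part of the history for later turns.
        history.append(message)
    return rows


dialog = {"conversations": [
    {"from": "system", "value": "You are a friendly assistant for ....\n"},
    {"from": "human", "value": "Hello, I am Sam and ..."},
    {"from": "gpt", "value": "Welcome Sam, so you ...."},
    {"from": "human", "value": "Yes, but ...."},
    {"from": "gpt", "value": "Then ..."},
]}
rows = unwind_sharegpt(dialog, [True, False])
```

With this approach the dialog history is repeated in every row, so a dialog with n gpt turns yields n rows whose prompts grow with each turn. That repetition is what I am hoping to avoid if a more compact format exists.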