Open erebous opened 2 months ago
@erebous We support the ShareGPT style dataset like https://huggingface.co/datasets/philschmid/guanaco-sharegpt-style
@erebous We support the ShareGPT style dataset like https://huggingface.co/datasets/philschmid/guanaco-sharegpt-style
Isn't that ShareGPT already?
Assuming your dataset is a list of list of dictionaries
: https://github.com/unslothai/unsloth/wiki#chat-templatesFor those having the same issue you can try formatting your dataset like this:
{
"conversations": [
{
"from": "human",
"value": "Lorem ipsum"
},
{
"from": "gpt",
"value": "Lorem ipsum"
}
]
},
{
"conversations": [
{
"from": "human",
"value": "Lorem ipsum"
},
{
"from": "gpt",
"value": "Lorem ipsum"
},
{
"from": "human",
"value": "Lorem ipsum"
},
{
"from": "gpt",
"value": "Lorem ipsum"
}
]
}
]```
I don't understand why, but the error went gone, and the load_dataset function was able to load it.
Yep that formatting is great!
I'm getting an error while using the conversational colab:
<class 'AttributeError'>: 'list' object has no attribute 'keys'
I just replaced:
load_dataset("json", data_files = {"train" : "my_data.json"})
To use a file with the recommended format in the wiki: