benam2 opened this issue 1 year ago
Any suggestions on this, please? @mikeybellissimo
Hi, sorry for the late reply! As long as the dataset is in the same format as Alpaca, you can just change the data path and plug it in. If not, I'd suggest writing a script to convert the dataset itself. If it's hosted somewhere and you can't store it locally, add some preprocessing code that puts it into the proper format before training.
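A minimal sketch of such a conversion script, assuming the source records are plain dicts and the target is the usual Alpaca schema (`instruction`, `input`, `output`). The source key names in `SOURCE_KEYS` (`question`, `context`, `answer`) are hypothetical placeholders; swap in whatever your dataset actually uses.

```python
import json

# Hypothetical source field names: replace with the keys your dataset really has.
SOURCE_KEYS = {"instruction": "question", "input": "context", "output": "answer"}


def to_alpaca(record: dict, keys: dict = SOURCE_KEYS) -> dict:
    """Map one source record onto the Alpaca instruction/input/output schema."""
    return {
        "instruction": record[keys["instruction"]],
        # Alpaca uses an empty "input" when there is no extra context.
        "input": record.get(keys["input"], "") or "",
        "output": record[keys["output"]],
    }


def convert_file(src_path: str, dst_path: str) -> int:
    """Read a JSON list of records, write an Alpaca-style JSON list, return the count."""
    with open(src_path, encoding="utf-8") as f:
        records = json.load(f)
    converted = [to_alpaca(r) for r in records]
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(converted, f, ensure_ascii=False, indent=2)
    return len(converted)
```

If the data is loaded remotely with Hugging Face `datasets` instead of stored locally, the same `to_alpaca` function could be applied on the fly with `dataset.map(to_alpaca)` rather than writing a new file.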
@mikeybellissimo thank you for the suggestion. This is the dataset I'm trying to fine-tune on: https://huggingface.co/datasets/timdettmers/openassistant-guanaco/viewer/timdettmers--openassistant-guanaco/train?row=0
So the format is different, since each row contains a whole conversation. In other words, it does not have a prompt/response format. If I break each conversation up into separate messages, each one loses the context of the previous messages. I'm not sure what the best way to handle this is.
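One way to keep the context is to emit one Alpaca example per assistant reply: the preceding human message becomes `instruction`, all earlier turns go into `input` as history, and the reply becomes `output`. This is a sketch, assuming each row stores the conversation in a single string with `### Human:` / `### Assistant:` markers; check those markers against the actual rows before relying on it. The function names are hypothetical.

```python
import re

# Assumed turn markers; verify them against real rows of the dataset.
TURN_PATTERN = re.compile(r"### (Human|Assistant):")


def split_turns(text: str) -> list[tuple[str, str]]:
    """Split a '### Human: ... ### Assistant: ...' string into (speaker, message) pairs."""
    # With one capture group, re.split yields [prefix, speaker, msg, speaker, msg, ...].
    parts = TURN_PATTERN.split(text)
    return [(speaker, msg.strip()) for speaker, msg in zip(parts[1::2], parts[2::2])]


def conversation_to_alpaca(text: str) -> list[dict]:
    """Emit one Alpaca example per assistant reply, carrying earlier turns as context."""
    turns = split_turns(text)
    examples = []
    for i, (speaker, message) in enumerate(turns):
        # Only assistant replies that directly follow a human turn become targets.
        if speaker != "Assistant" or i == 0 or turns[i - 1][0] != "Human":
            continue
        history = "\n".join(f"{s}: {m}" for s, m in turns[: i - 1])
        examples.append({
            "instruction": turns[i - 1][1],
            "input": history,  # empty string for the first exchange
            "output": message,
        })
    return examples
```

Another option is to skip the split entirely and train on the raw conversation string as a single sequence; that keeps all context, but then the prompt template in the training script has to change instead of the data.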
Sorry, a naive question too: if I pass a local dataset, do I only need to change the load function, or does any other part need to be updated too?
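With Hugging Face `datasets`, a local JSON or JSONL file is typically loaded with `load_dataset("json", data_files="path/to/data.json")`. If the file already uses the Alpaca keys, the prompt-building code may not need to change, though that depends on the training script. A quick sanity check like the hypothetical helper below (plain standard library, not part of any repo) can confirm the keys before training.

```python
import json

REQUIRED_KEYS = {"instruction", "input", "output"}


def parse_alpaca(raw: str, jsonl: bool = False) -> list[dict]:
    """Parse a JSON list (or JSONL, one object per line) and check the Alpaca keys."""
    if jsonl:
        records = [json.loads(line) for line in raw.splitlines() if line.strip()]
    else:
        records = json.loads(raw)
    for i, record in enumerate(records):
        missing = REQUIRED_KEYS - set(record)
        if missing:
            raise ValueError(f"record {i} is missing keys: {sorted(missing)}")
    return records


def load_local_alpaca(path: str) -> list[dict]:
    """Read a local .json or .jsonl file and validate it with parse_alpaca."""
    with open(path, encoding="utf-8") as f:
        return parse_alpaca(f.read(), jsonl=path.endswith(".jsonl"))
```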
Thanks so much for your help~
I want to fine-tune this on the OpenAssistant dataset as well. I have already fine-tuned it on Alpaca. Do I need to write a script to do so, or is the better way to mix the data...
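If both datasets end up in Alpaca format, mixing them can be as simple as concatenating and shuffling. Training on the mix (rather than only on OpenAssistant after Alpaca) can help reduce the chance that the second round overwrites what the first one learned. A minimal sketch, where `mix_datasets` and its `b_ratio` cap are hypothetical choices, not something the repo provides:

```python
import random


def mix_datasets(a: list[dict], b: list[dict], b_ratio: float = 1.0, seed: int = 0) -> list[dict]:
    """Concatenate two Alpaca-format lists and shuffle them.

    b_ratio caps how many examples of `b` are kept relative to len(a)
    (1.0 keeps up to len(a) examples from b), so neither dataset swamps the other.
    """
    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    n_b = min(len(b), int(len(a) * b_ratio))
    mixed = list(a) + rng.sample(b, n_b)
    rng.shuffle(mixed)
    return mixed
```

The result can be written out with `json.dump` and pointed to as the data path, like any other Alpaca-format file.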
Thanks for your help!