Finetuning job fails on a short custom dataset

lamini-ai / lamini

The Official Python Client for Lamini's API

https://lamini.ai/

Apache License 2.0

Finetuning job fails on a short custom dataset #19

Closed samutamm closed 1 year ago

samutamm commented 1 year ago

I'm trying out finetuning with the following code:

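import pandas as pd
from llama import QuestionAnswerModel

# Read the JSONL file (one JSON object per line) into a list of dicts
data = pd.read_json('data/seed_lamini_docs.jsonl', lines=True).to_dict(orient='records')
# Select the base model to finetune
model = QuestionAnswerModel(model_name="EleutherAI/pythia-410m-deduped-v0")
# Attach the question-answer records, then submit the training job
model.load_question_answer(data)
model.train(verbose=True)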

where data/seed_lamini_docs.jsonl is a copy of this file. The finetuning process completes without problems. However, when I change the data to my own dataset, the training job fails without any error message. I also checked https://app.lamini.ai/train for error logs that would help me fix the issue(s) with my dataset. My dataset has the same format as seed_lamini_docs.jsonl, except that it's only ~30 lines long. Is there a minimum length for the finetuning dataset?
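Since the job fails without a message, one way to rule out a formatting problem is to check the dataset locally before submitting it. The following is a minimal sketch, not part of the Lamini client: the function name `validate_qa_jsonl` is hypothetical, the `question` and `answer` keys are an assumption about the seed file's format, and the default `min_examples=2` is taken from the minimum quoted in the reply in this thread.

```python
import json


def validate_qa_jsonl(lines, min_examples=2):
    """Check an iterable of JSONL lines; return (records, problems).

    Assumption: every record is a JSON object with non-empty string
    values under the keys "question" and "answer".
    """
    records, problems = [], []
    for lineno, raw in enumerate(lines, start=1):
        if not raw.strip():
            continue  # ignore blank lines
        try:
            row = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(row, dict):
            problems.append(f"line {lineno}: expected a JSON object")
            continue
        ok = True
        for key in ("question", "answer"):
            value = row.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"line {lineno}: missing or empty '{key}'")
                ok = False
        if ok:
            records.append(row)
    # The threshold is a parameter because the real limit may differ.
    if len(records) < min_examples:
        problems.append(
            f"only {len(records)} valid examples; need at least {min_examples}"
        )
    return records, problems
```

To use it, open the dataset file, pass the file object as `lines`, and print each entry of `problems`; an empty list means the file passed these particular checks, which does not guarantee that the training job will accept it.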

ninazwei90 commented 1 year ago

Hey @samutamm, the minimum dataset size is 2 examples (2 question-answer pairs); what matters is the number of examples, not their length. If you run into issues, try at least 10 examples.

Can you please try again? If you run into issues again, we have released an error log feature (next to the model playground); you can share the logs with us and we can help you debug.

Also, thanks so much for using Lamini and reporting issues! We'd love to learn from you and see how we can improve Lamini for you! If you're open to a 20-30 min chat (in exchange for free Lamini credits), please email me at nina@lamini.ai. Looking forward!!

samutamm commented 1 year ago

Hi @ninazwei90, thank you for your reply. I ran the code above again and the training still fails. The Logs tab contains only the text "No logs found".

However, I was able to successfully try out LlamaV2Runner by following the walkthrough. The training completed with a green status, and the logs contain the text output of the training.

Given that my initial attempt with QuestionAnswerModel failed without any log output, maybe something went wrong even before the training started? Switching to LlamaV2Runner solved the issue for me, since I can reformat my question-answer pairs as user-system-output tuples.
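The reformatting step mentioned above can be sketched roughly as follows. This is an illustration under assumptions, not the library's API: the helper name `qa_to_user_system_output` is hypothetical, the input keys `question` and `answer` are assumed from the seed file's format, and the output keys `user`, `system`, and `output` are inferred from the tuple description, so check them against the walkthrough before relying on them.

```python
def qa_to_user_system_output(pairs, system_prompt=""):
    """Convert question-answer dicts into user/system/output dicts.

    Assumptions: each input dict has "question" and "answer" keys, and
    the target format uses the keys "user", "system", and "output".
    """
    converted = []
    for pair in pairs:
        converted.append(
            {
                "user": pair["question"],   # the question becomes the user turn
                "system": system_prompt,    # the same system prompt for every example
                "output": pair["answer"],   # the answer becomes the expected output
            }
        )
    return converted
```

The resulting list can then be handed to whichever data-loading call the walkthrough uses. A shared `system_prompt` keeps the conversion simple; per-example prompts would need an extra field in the input.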

ninazwei90 commented 1 year ago

Hey Sam, this should be fixed. Do email me if you still run into issues @samutamm