varunsatish / llama-recipes-fertility

make it possible to specify a dataset to be used for fine-tuning #1

Closed msalganik closed 1 month ago

msalganik commented 1 month ago

I would like to be able to modify the code so that it runs fine-tuning using a dataset that I specify.

This dataset could be in HF format if you prefer.

varunsatish commented 1 month ago

I have added support for custom datasets. The README now has instructions that should illustrate what the data should look like. I am going to leave this issue open until testing has been completed with your data.

varunsatish commented 1 month ago

This is confirmed to be working.

The code takes .jsonl files in which each record has a "book_content" field and an "outcome" field, representing a book of life and its outcome, respectively.
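For illustration, here is a minimal sketch of what such a file might look like when written and read back with Python's standard library. Only the two field names come from this thread; the example text, the string-valued outcomes, the file name train.jsonl, and the helpers write_jsonl and read_jsonl are assumptions made for the example. The README is the authoritative description of the expected format.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical records: only the keys "book_content" and "outcome" come from
# the issue; the text and the string-valued labels are assumptions.
records = [
    {"book_content": "Example book of life text for person A.", "outcome": "1"},
    {"book_content": "Example book of life text for person B.", "outcome": "0"},
]


def write_jsonl(path, rows):
    """Write one JSON object per line (the JSON Lines convention)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Write to a temporary directory; "train.jsonl" is a hypothetical file name.
out_dir = Path(tempfile.mkdtemp())
out_file = out_dir / "train.jsonl"
write_jsonl(out_file, records)
loaded = read_jsonl(out_file)
```

Each line of the resulting file is a complete JSON object, so files can be concatenated or streamed without parsing the whole file at once.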

Implementation in llama-recipes:

Add the following flags to the command line:

--dataset predefined_dataset
--data_path /path/to/directory/containing/jsonl/files/
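Before launching a run, it may help to check that every .jsonl file in the data directory has the two expected fields. The sketch below is not part of the repository: the function name find_invalid_lines and the shape of its return value are hypothetical, and it assumes the loader reads every file matching *.jsonl directly inside the given directory.

```python
import json
from pathlib import Path

# Field names taken from the issue thread.
REQUIRED_KEYS = {"book_content", "outcome"}


def find_invalid_lines(data_dir):
    """Return (file name, line number, problem) for each bad line.

    Hypothetical helper: scans *.jsonl files directly inside data_dir.
    """
    problems = []
    for path in sorted(Path(data_dir).glob("*.jsonl")):
        with open(path, encoding="utf-8") as f:
            for line_no, line in enumerate(f, start=1):
                if not line.strip():
                    continue  # ignore blank lines
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    problems.append((path.name, line_no, "invalid JSON"))
                    continue
                if not isinstance(record, dict):
                    problems.append((path.name, line_no, "not a JSON object"))
                    continue
                missing = REQUIRED_KEYS - record.keys()
                if missing:
                    detail = "missing: " + ", ".join(sorted(missing))
                    problems.append((path.name, line_no, detail))
    return problems
```

An empty list means every non-blank line parsed as a JSON object containing both fields; it says nothing about whether the values themselves are in the form the fine-tuning code expects.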