jina-ai / jerboa

LLM finetuning
Apache License 2.0

Add lima dataset to the training pipeline #64

Closed: alaeddine-13 closed this issue 1 year ago

alaeddine-13 commented 1 year ago

Dataset link: https://huggingface.co/datasets/GAIR/lima

LIMA uses a different template from the Alpaca template; it is structured more like a chat dataset. Incorporating it into our pipeline might require supporting more templates. We can start with a chat template and use it to add the LIMA dataset.
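As a rough illustration of what a chat template could look like, here is a minimal sketch. It assumes each LIMA record exposes its turns as a flat list of strings that alternate between user and assistant, starting with the user. The role tags and the `format_chat` helper are hypothetical names chosen for this example and are not taken from the jerboa codebase.

```python
from typing import List

# Hypothetical role tags, chosen for illustration only.
USER_TAG = "### User:"
ASSISTANT_TAG = "### Assistant:"


def format_chat(conversations: List[str]) -> str:
    """Render alternating user/assistant turns into one training prompt.

    Assumes the turns alternate and that the first turn is the user's.
    """
    parts = []
    for i, turn in enumerate(conversations):
        # Even indices are treated as user turns, odd ones as assistant turns.
        tag = USER_TAG if i % 2 == 0 else ASSISTANT_TAG
        parts.append(f"{tag}\n{turn.strip()}")
    return "\n\n".join(parts)


example = ["What is LIMA?", "A small, curated instruction dataset."]
prompt = format_chat(example)
print(prompt)
```

Unlike the Alpaca template, which has a fixed set of slots per example, this handles any number of turns, so single-turn and multi-turn records can share one code path.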

azayz commented 1 year ago

https://github.com/jina-ai/jerboa/pull/88