jina-ai / jerboa

LLM finetuning
Apache License 2.0

feat: more data processing #108

Closed: JohannesMessner closed this pull request 1 year ago

JohannesMessner commented 1 year ago

This PR adds more datasets (subsets of the Stack Overflow dump) together with the functions and script used to produce them. More details about the datasets can be found in the updated readme.md.
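The PR text does not say how the subsets were selected. As a rough illustration only, here is one way a tag-filtered question/answer subset could be pulled out of a Stack Overflow dump. It assumes the public Stack Exchange dump layout (a `Posts.xml` file of `<row .../>` elements with `Id`, `PostTypeId`, `AcceptedAnswerId`, `Tags`, `Title` and `Body` attributes) and that an accepted answer appears after its question in the file. The names `parse_tags`, `iter_rows` and `extract_qa_pairs` are hypothetical and are not the functions added in #108.

```python
import io
import re
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Iterator, List, Set, Union


def parse_tags(raw: str) -> Set[str]:
    """Split a Tags attribute into a set of tag names.

    Assumption: handles both "<python><pandas>" and "|python|pandas|" forms;
    check which one the actual dump uses.
    """
    return {t for t in re.split(r"[<>|]+", raw) if t}


def iter_rows(source: Union[str, BinaryIO]) -> Iterator[Dict[str, str]]:
    """Stream <row .../> elements without loading the whole file."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            yield dict(elem.attrib)  # copy attributes before clearing
            elem.clear()  # drop the element's data to keep memory bounded


def extract_qa_pairs(source, wanted_tags: Set[str]) -> List[Dict[str, str]]:
    """Hypothetical helper: question/accepted-answer pairs for any wanted tag.

    Assumes answers come after their questions (dump rows ordered by Id),
    so a single streaming pass is enough.
    """
    pending: Dict[str, Dict[str, str]] = {}  # accepted answer Id -> question row
    pairs: List[Dict[str, str]] = []
    for row in iter_rows(source):
        post_type = row.get("PostTypeId")
        if post_type == "1":  # question
            accepted = row.get("AcceptedAnswerId")
            if accepted and parse_tags(row.get("Tags", "")) & wanted_tags:
                pending[accepted] = row
        elif post_type == "2":  # answer
            question = pending.pop(row.get("Id", ""), None)
            if question is not None:
                pairs.append({
                    "question_id": question["Id"],
                    "title": question.get("Title", ""),
                    "question": question.get("Body", ""),
                    "answer": row.get("Body", ""),
                })
    return pairs


# Usage on a tiny in-memory sample shaped like Posts.xml (made-up rows).
sample = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" AcceptedAnswerId="3" Title="Q1" Body="How?" Tags="&lt;python&gt;&lt;pandas&gt;" />
  <row Id="2" PostTypeId="1" AcceptedAnswerId="4" Title="Q2" Body="Why?" Tags="|java|" />
  <row Id="3" PostTypeId="2" ParentId="1" Body="Like this." />
  <row Id="4" PostTypeId="2" ParentId="2" Body="Because." />
</posts>"""
pairs = extract_qa_pairs(io.BytesIO(sample), {"python"})
```

Streaming with `iterparse` rather than `ET.parse` is the main design choice here: real dumps are far too large to hold as a full tree, and keeping only the pending questions bounds memory by the number of matching questions still waiting for their accepted answer.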