jina-ai / jerboa

LLM finetuning
Apache License 2.0

feat: add stackoverflow dataset script #100

Closed by JohannesMessner 1 year ago

JohannesMessner commented 1 year ago

Adds the filtering script for Stack Overflow data, and a link to where the filtered dataset can be found. Note that this is not the final dataset; we will have to apply more filters.