mlfoundations / dclm

DataComp for Language Models
MIT License
1.12k stars 100 forks

The dataset for training fastText OH-2.5 + ELI5 text classifier #75

Open yqy2001 opened 3 weeks ago

yqy2001 commented 3 weeks ago

Hi, thanks for the great work. Will you release the dataset (ELI5 + OH-2.5) used for training the fastText OH-2.5 + ELI5 text classifier?

Thank you.

Mivg commented 3 weeks ago

Hi @yqy2001, we are looking into this. Please follow https://github.com/mlfoundations/dclm/issues/74, which asks the same question.

yqy2001 commented 2 weeks ago

Thank you!