huggingface / cosmopedia

Apache License 2.0

Integration with datatrove #21

Closed: UniverseFly closed this issue 5 months ago

UniverseFly commented 5 months ago

I really like fineweb-edu and datatrove. I noticed that the BERT inference code uses only the datasets library. How should we choose between datasets and datatrove? I like both libraries and am doing some similar work, but I'm having a hard time choosing a toolchain. Thank you!