I really like fineweb-edu and datatrove. I noticed that the BERT inference code just uses the datasets library. How should we choose between datasets and datatrove? I like both libraries and am doing some similar work, but I'm having a hard time choosing a toolchain. Thank you!