Closed gregtatum closed 2 months ago
I was rewriting my NLLB mono build script and realized I could do the deduplication in the pipeline rather than in a separate build script. Building the dataset manually was pretty fiddly, especially with the `en` data, which is over 50 GB.
Resolves #390 Resolves #286