Closed by vienneraphael 1 year ago
Small tip: always be precise about the data source. In this example, the numbers only concerned the public bi-text, and the answer had already been provided (I know our Discord channel is a bit overwhelming because of me, hah): duplicate filtering occurred here.
Figure out why we're filtering out so many sentences for Spanish (~38%) and Guarani (~50%).
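
For anyone wanting to check how much of a drop like this comes from duplicates alone, here is a minimal sketch. It assumes the bi-text is loaded as a list of `(source, target)` string pairs and that "duplicate" means an exact match after trimming whitespace; the function name `dedup_report` and the toy data are hypothetical, and the project's actual filtering step may work differently.

```python
def dedup_report(pairs):
    """Drop exact duplicate (source, target) pairs and report the removed share.

    Assumption: two pairs are duplicates when both sides match exactly
    after stripping leading/trailing whitespace.
    """
    seen = set()
    unique = []
    for src, tgt in pairs:
        key = (src.strip(), tgt.strip())
        if key in seen:
            continue  # already kept an identical pair
        seen.add(key)
        unique.append((src, tgt))
    removed = 1 - len(unique) / len(pairs) if pairs else 0.0
    return unique, removed


# Toy data for illustration only, not taken from the real corpus.
pairs = [
    ("Hola", "Mba'éichapa"),
    ("Hola", "Mba'éichapa"),
    ("Gracias", "Aguyje"),
    ("Hola ", "Mba'éichapa"),  # differs only by trailing whitespace
]
unique, removed = dedup_report(pairs)
print(f"kept {len(unique)} of {len(pairs)} pairs, removed {removed:.0%}")
```

Running this per language and per data source (public bi-text vs. anything else) would make it easier to tell whether the ~38% and ~50% figures are mostly duplicates or come from another filter.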