Closed gregtatum closed 5 months ago
- mtdata_Neulab-tedtalks_test-1-eng-bos # ~3,117,009 sentences (352.2 MB)
This test set for English to Bosnian is way too big. Right now the config moves test/dev/train sets to the appropriate parts of the config, but a test set shouldn't have this much data. It requires investigation.
The issue is that the train, test, and dev sets are all in one big archive, so you would have to download it in full to generate sentence estimates for each split.
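A minimal sketch of what that per-split check could look like once the archive has been downloaded. This is not how the config generator or mtdata works. It assumes the archive is a tar file (optionally compressed) and that each member's file name contains `train`, `dev`, or `test`. The function names and the archive path are hypothetical.

```python
import tarfile

# Assumed naming convention: the split name appears in the member's file name.
SPLITS = ("train", "dev", "test")


def split_of(member_name):
    """Guess the split from a member's file name; return None if none matches."""
    base = member_name.rsplit("/", 1)[-1].lower()
    for split in SPLITS:
        if split in base:
            return split
    return None


def count_lines_per_member(archive_path):
    """Return {member name: line count} for every regular file in the archive.

    The archive is read as a stream, so nothing is extracted to disk, but the
    whole file still has to be available locally.
    """
    counts = {}
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue
            stream = archive.extractfile(member)
            # One line is treated as one sentence.
            counts[member.name] = sum(1 for _ in stream)
    return counts


def report(archive_path):
    """Print the line count of each member next to its guessed split."""
    for name, lines in sorted(count_lines_per_member(archive_path).items()):
        print(f"{split_of(name) or 'unknown'}\t{lines}\t{name}")
```

Calling `report("tedtalks.tar.gz")` (a placeholder path) would list each file with its guessed split, which would show whether the large estimate comes from the test split itself or from the archive as a whole.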