Closed jowagner closed 2 years ago
The reported sentence counts are not very useful to compare to other corpora as
Suggestion: Also report token counts for current data
Related:
Table 1 has been updated to show token counts for each corpus and the overall total (171.3M).
[ ] Report token counts of de-duplicated data (now tracked in issue #105)