jelmervdl closed this 11 months ago
I don't want to be picky, but does that big.txt contain tokenized sentences? Performance may differ if the input is not tokenized.
True, I assumed it wouldn't matter much for performance comparisons. I've now run the same thing on a tokenized version of big.txt. The difference is slightly smaller, but still large enough to justify this change, I'd say.
main:

```
$ cat big.tok.txt | python -m sacremoses -l en detokenize > /dev/null
  Time (mean ± σ):     34.814 s ± 0.724 s    [User: 34.226 s, System: 0.464 s]
  Range (min … max):   33.846 s … 36.157 s    10 runs
```

this branch:

```
$ cat big.tok.txt | python -m sacremoses -l en detokenize > /dev/null
  Time (mean ± σ):     9.253 s ± 0.172 s    [User: 8.828 s, System: 0.381 s]
  Range (min … max):   9.060 s … 9.560 s    10 runs
```
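For anyone wanting to reproduce this kind of comparison without external tooling, here is a minimal Python sketch of the same measurement (mean, σ, min, max over a number of runs, printed in the style of the output above). The workload here is a hypothetical stand-in (joining a token list), not the actual sacremoses detokenizer.

```python
import statistics
import time


def benchmark(fn, runs=10):
    """Time fn over several runs; report mean, stdev, min, and max in seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(times),
        "stdev": statistics.stdev(times),
        "min": min(times),
        "max": max(times),
    }


# Stand-in workload (hypothetical): joining a token list in place of detokenize.
tokens = ["Hello", ",", "world", "!"] * 10000
stats = benchmark(lambda: " ".join(tokens))
print(f"Time (mean ± σ): {stats['mean']:.3f} s ± {stats['stdev']:.3f} s")
print(f"Range (min … max): {stats['min']:.3f} s … {stats['max']:.3f} s")
```

Swapping the lambda for a call into the detokenizer (and reading big.tok.txt once outside the timed region) would give comparable per-branch numbers.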
Together with #133, this replaces #140.