PTBTokenizer logs tokenization details by default (e.g. PTBTokenizer tokenized 2 tokens at 33.87 tokens per second).
This becomes noisy when you run tokenization repeatedly.
I redirect stderr to subprocess.DEVNULL to suppress this.
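
For reference, here is a minimal sketch of the idea rather than the actual diff. It assumes the tokenizer runs as a child process that writes its result to stdout and its log line to stderr. The `run_quietly` helper and the stand-in child script are hypothetical names used only for illustration.

```python
import subprocess
import sys


def run_quietly(cmd, input_text):
    """Run cmd with input_text on stdin and return its stdout.

    stderr is sent to subprocess.DEVNULL, so anything the child
    logs there is discarded instead of reaching the console.
    """
    result = subprocess.run(
        cmd,
        input=input_text,
        stdout=subprocess.PIPE,     # keep the real output
        stderr=subprocess.DEVNULL,  # drop the log noise
        text=True,
        check=True,
    )
    return result.stdout


# Hypothetical stand-in for the tokenizer process: it logs to stderr
# and writes its "result" (here just the upper-cased input) to stdout.
CHILD_SCRIPT = (
    "import sys\n"
    "sys.stderr.write('stand-in log line\\n')\n"
    "sys.stdout.write(sys.stdin.read().upper())\n"
)
child_cmd = [sys.executable, "-c", CHILD_SCRIPT]

output = run_quietly(child_cmd, "a b")
```

Only stderr is discarded here, so the output on stdout is unaffected. One trade-off is that genuine error messages from the child are hidden as well; `check=True` still raises if the child exits with a non-zero status.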
Sorry for only getting to this now. This change looks good to me. It might make sense to extend this idea to the whole evaluation pipeline and its stdout, as I feel that alone can be too noisy.