Consider pinning your spaCy version in requirements.txt?

I just noticed that your requirements.txt doesn't pin to any particular version of spaCy or NLTK.
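One quick way to spot this kind of gap is to list the requirements that have no version specifier at all. Here's a minimal Python sketch. `find_unpinned` is a hypothetical helper, not part of pip. It uses a rough heuristic rather than pip's real requirement parser: it ignores comments, blank lines, option lines such as `-r`, and environment markers after `;`.

```python
import re
from typing import Iterable, List

# Rough heuristic for the distribution name at the start of a requirement line;
# this is NOT pip's real requirement parser.
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def find_unpinned(lines: Iterable[str]) -> List[str]:
    """Return package names that carry no version specifier at all."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options like -r / -e
            continue
        match = _NAME.match(line)
        if match is None:
            continue
        # Ignore environment markers such as `; python_version < '3'`.
        rest = line[match.end():].split(";", 1)[0]
        # Any of ==, ~=, >=, <=, !=, >, < contains one of these characters.
        if not any(ch in rest for ch in "<>="):
            unpinned.append(match.group(1))
    return unpinned
```

For example, `find_unpinned(open("requirements.txt"))` would flag `spacy` and `nltk` in a file that lists them without versions.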

We've recently pushed spaCy 2, and while we've endeavoured to keep breaking changes to a minimum, it's a pretty big release: https://github.com/explosion/spaCy/releases/tag/v2.0.2

Even if the API doesn't change, bug fixes to the tokenization could introduce problematic train/test skew for you, especially for languages other than English. Our compatibility policy allows changes that can affect statistical models to be made in minor releases. For example, spaCy 2.1.0 might fix a bug in the Hungarian tokenizer that affects a large number of tokens for that language. This means that a model trained with one minor version can lose accuracy if a different minor version of the library is used at runtime.
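Given this policy, one option is to pin to a single minor series, e.g. `spacy>=2.0.2,<2.1.0` (which pip also accepts as `spacy~=2.0.2`). That still picks up patch releases. As an extra safety net, a training or inference script could refuse to run when the installed minor version differs from the one a model was trained with. The sketch below assumes version strings start with `MAJOR.MINOR`. `major_minor` and `same_minor` are hypothetical helper names.

```python
import re
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.0.2' or '2.1.0a1'."""
    match = re.match(r"\s*(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def same_minor(installed: str, trained_with: str) -> bool:
    """True when both versions share major.minor, i.e. differ at most in the patch level."""
    return major_minor(installed) == major_minor(trained_with)
```

In practice, `installed` would be `spacy.__version__`, and `trained_with` would be a version string you record alongside the saved model. How you store it is up to you.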

There are also performance considerations. There's currently an open ticket about a performance regression in the tokenizer. It's unfortunate that this problem made it into the release, and we're working on it. In the meantime, though, users who make a fresh installation of torchtext might find their preprocessing much slower.
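To tell whether a fresh install has made preprocessing slower, one approach is to time the tokenizer on a fixed sample of text under each spaCy version. Here's a minimal, library-agnostic sketch. `tokens_per_second` is a hypothetical helper. It accepts any callable that returns something with a length, and it reports the best of a few runs to damp timer noise.

```python
import time
from typing import Callable, Sequence, Sized


def tokens_per_second(
    tokenize: Callable[[str], Sized], texts: Sequence[str], repeats: int = 3
) -> float:
    """Best-of-`repeats` throughput of `tokenize` over `texts`, in tokens per second."""
    best = float("inf")
    n_tokens = 0
    for _ in range(repeats):
        start = time.perf_counter()
        n_tokens = sum(len(tokenize(text)) for text in texts)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero elapsed time on extremely small inputs.
    return n_tokens / best if best > 0 else float("inf")
```

With spaCy 2 installed, one could pass `spacy.blank("en").tokenizer`, whose output `Doc` supports `len()`. Running the same script under the old and new versions gives a rough before/after comparison.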

pytorch/text, issue #178