Relaxing the dependency on regex had an unintended consequence in 2.3.1: the frequency of French phrases such as "l'écran" could no longer be looked up, because the way they were tokenized changed depending on the installed version of regex.
Fix this with a more complex tokenization rule that should handle apostrophes the same way across the various versions of regex that are now allowed.
(I ran black so it could format these ugly expressions appropriately; there are some miscellaneous formatting changes to tokens.py that came along as a result.)
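
To illustrate the kind of behavior at stake, here is a minimal sketch using only the standard-library `re` module. It is not the rule in tokens.py; the names `ELISION_SKETCH_RE` and `sketch_tokenize` are hypothetical, and the choice to split an elided prefix such as "l'" from a following vowel-initial word is an assumption made for the example.

```python
import re

# Hypothetical pattern for illustration only; NOT the expression in tokens.py.
# [^\W\d_] matches a single Unicode letter (a word character that is neither
# a digit nor an underscore).
ELISION_SKETCH_RE = re.compile(
    # A one- or two-letter elided prefix (l', d', qu') that is followed by
    # a vowel or h, kept together with its apostrophe.
    r"[^\W\d_]{1,2}'(?=[aeiouhâàéèêëîïôûùüœ])"
    # Otherwise a run of letters, keeping any word-internal apostrophes.
    r"|[^\W\d_]+(?:'[^\W\d_]+)*",
    re.IGNORECASE,
)


def sketch_tokenize(text):
    """Return the tokens matched by the hypothetical pattern above."""
    return [match.group(0) for match in ELISION_SKETCH_RE.finditer(text)]


print(sketch_tokenize("l'écran"))
print(sketch_tokenize("can't"))
```

Under these assumptions, "l'écran" is split into an elided prefix and a word, while an English contraction such as "can't" stays a single token. Pinning such cases in a test is one way to notice when a different regex version changes the result.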