compile regex objects ahead of time for improved perf.

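Compiles regexes where appropriate to speed up common operations (substitutions, searches, matches, and finditer calls). Timeit results for a microbenchmark are below. MT1 is the original implementation without compilation; MT2 is the new implementation with compilation, kept as a separate module only for this comparison, since this PR replaces the original implementation. On the first 1,000 non-empty lines of big.txt, tokenization time drops from 714 ms to 658 ms, roughly 8% faster.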

In [1]: lines = [line.strip() for line in open('big.txt') if line.strip()][:1000]

In [2]: from sacremoses.tokenize import MosesTokenizer as MT1

In [3]: from sacremoses.tokenize2 import MosesTokenizer as MT2

In [4]: mt1, mt2 = MT1(lang='en'), MT2(lang='en')

In [5]: %timeit [mt1.tokenize(line) for line in lines]
714 ms ± 18.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [6]: %timeit [mt2.tokenize(line) for line in lines]
658 ms ± 11 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
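
For readers unfamiliar with the pattern, here is a minimal sketch of the kind of change involved. It is not the actual sacremoses code: the rules and the function names (`RULES`, `tokenize_uncompiled`, `tokenize_compiled`) are hypothetical and exist only to show the before/after shape.

```python
import re

# Hypothetical substitution rules for illustration only;
# these are NOT the rules used by sacremoses.
RULES = [
    (r"\s+", " "),            # collapse runs of whitespace
    (r"([,;:!?])", r" \1 "),  # pad selected punctuation with spaces
]


def tokenize_uncompiled(text):
    """Before: pass pattern strings to re.sub on every call."""
    for pattern, repl in RULES:
        text = re.sub(pattern, repl, text)
    return text.split()


# After: compile each pattern once, at import time.
COMPILED_RULES = [(re.compile(pattern), repl) for pattern, repl in RULES]


def tokenize_compiled(text):
    """After: reuse the precompiled pattern objects."""
    for regex, repl in COMPILED_RULES:
        text = regex.sub(repl, text)
    return text.split()
```

Both functions return the same tokens; only the per-call overhead differs. Python's `re` module already caches recently compiled patterns internally, so precompiling mainly saves the cache lookup on each call, which may be why the measured gain is modest rather than dramatic.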

hplt-project / sacremoses

compile regex objects ahead of time for improved perf. #133