Tokenize words such as "l'heure" the same way as "l'arc"

Unicode's word-segmentation rules include a fiddly little rule about splitting between apostrophes and vowels, which Python's regex module faithfully implements. But in languages where this matters, such as French, the rule seems to overlook splitting between an apostrophe and a silent "h".

This branch adds another tokenization path that fixes up words such as "l'heure", so that they split after the apostrophe the same way "l'arc" does.

It also keeps the apostrophe around when include_punctuation=True, as the option's name suggests it should.
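
To make the intended behavior concrete, here is a minimal sketch of the kind of fix-up described above. It is not the code in this branch: the pattern APOSTROPHE_H and the function split_apostrophe_h are hypothetical names, and the sketch uses only the standard-library re module rather than regex. It only handles a single token that has an apostrophe directly before an "h", and makes no attempt to decide whether that "h" is actually silent.

```python
import re

# Assumed pattern (not wordfreq's own): some letters, an apostrophe
# (straight or curly), then "h" followed by more letters.
APOSTROPHE_H = re.compile(r"^([^\W\d_]+)(['\u2019])(h[^\W\d_]+)$", re.IGNORECASE)


def split_apostrophe_h(token, include_punctuation=False):
    """Split a token such as "l'heure" after its apostrophe.

    Hypothetical helper for illustration. Tokens that do not match the
    pattern are returned unchanged, as a one-element list.
    """
    match = APOSTROPHE_H.match(token)
    if match is None:
        return [token]
    prefix, apostrophe, rest = match.groups()
    if include_punctuation:
        # Keep the apostrophe attached to the elided prefix.
        prefix += apostrophe
    return [prefix, rest]


print(split_apostrophe_h("l'heure"))
print(split_apostrophe_h("l'heure", include_punctuation=True))
```

A pattern this simple would also split a word like "aujourd'hui", so a real tokenizer would presumably need extra care for such cases; the sketch only illustrates the split itself and the effect of include_punctuation.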

rspeer / wordfreq

Tokenize words such as "l'heure" the same way as "l'arc" #46