Bad offsets for tokenize_with_offsets with UTF-8

Hi, thanks for this great library!

When I run the following script, MITIE tokenizes the string correctly, but the offsets it returns for the tokens after the first non-ASCII character are wrong.

import mitie

print(mitie.tokenize_with_offsets(u'“hello”'))

Current Behavior

[(b'\xe2\x80\x9c', 0), (b'hello', 4463118537), (b'\xe2\x80\x9d', 4463118537)]

Expected Behavior

If offsets are measured in characters

[(b'\xe2\x80\x9c', 0), (b'hello', 1), (b'\xe2\x80\x9d', 6)]

Or if offsets are measured in bytes

[(b'\xe2\x80\x9c', 0), (b'hello', 3), (b'\xe2\x80\x9d', 8)]
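
For reference, here is a minimal sketch of how I derived the two sets of expected offsets above in plain Python, without calling MITIE. The helper name expected_offsets is my own and is not part of the MITIE API, and the token list is hard-coded from the output above rather than produced by the tokenizer.

```python
def expected_offsets(text, tokens):
    """Locate each token in text, scanning left to right.

    Returns two lists: start offsets counted in Unicode characters
    (code points) and start offsets counted in UTF-8 bytes.
    """
    char_offsets = []
    byte_offsets = []
    pos = 0
    for tok in tokens:
        # Character offset: index of the token in the str object.
        start = text.index(tok, pos)
        char_offsets.append(start)
        # Byte offset: length of the UTF-8 encoding of everything before it.
        byte_offsets.append(len(text[:start].encode("utf-8")))
        pos = start + len(tok)
    return char_offsets, byte_offsets


text = "\u201chello\u201d"                 # the same string as in the script above
tokens = ["\u201c", "hello", "\u201d"]     # tokens as MITIE splits them
chars, byte_offs = expected_offsets(text, tokens)
print(chars)      # [0, 1, 6]
print(byte_offs)  # [0, 3, 8]
```

Each curly quote is one character but three bytes in UTF-8, which accounts for the difference between the two lists.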

I'm seeing the same behavior with the C API as well.

Version: master
Where did you get MITIE: pip install git+https://github.com/mit-nlp/MITIE.git
Platform: macOS 10.15.6
Compiler: Apple Clang (for C API)