mhayashi1120 / Emacs-langtool

LanguageTool for Emacs
GNU General Public License v3.0

Offset off when non-BMP characters are in the document #69

Open · cabo opened this issue 1 year ago

cabo commented 1 year ago

I have a document containing a non-BMP character (scalar value ≥ 0x10000), namely 🤔. For all text after that character, every offset that languagetool-server reports appears to be shifted one position to the right. Possibly languagetool-server reports offsets in UTF-16 code units rather than in characters. A non-BMP character occupies two UTF-16 code units (a surrogate pair), which would account for a shift of exactly one per such character. I don't know whether languagetool-server can be coaxed into counting characters instead. If not, the document probably needs to be searched for non-BMP characters, and each reported offset corrected by the number of them that precede it (expensive!).
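A minimal sketch of that correction, written in Python for illustration rather than in the package's Emacs Lisp. It assumes the hypothesis above holds (the server reports each match's offset and length in UTF-16 code units) and uses 0-based string indices, whereas Emacs buffer positions start at 1. All function names are hypothetical. The text is scanned once to record where each non-BMP character starts in UTF-16 units. After that, each offset is corrected by a binary search instead of a fresh scan, which keeps the "expensive!" part to a single pass.

```python
from __future__ import annotations

from bisect import bisect_left


def non_bmp_utf16_starts(text: str) -> list[int]:
    """Return the UTF-16 start offset of every non-BMP character in `text`.

    One pass over the text; the result is sorted, so lookups can bisect it.
    """
    starts = []
    utf16_pos = 0
    for ch in text:
        if ord(ch) >= 0x10000:
            starts.append(utf16_pos)
            utf16_pos += 2  # encoded as a surrogate pair
        else:
            utf16_pos += 1
    return starts


def utf16_to_char_offset(utf16_offset: int, starts: list[int]) -> int:
    """Convert a UTF-16 code-unit offset into a character (code point) offset.

    Each non-BMP character that begins before `utf16_offset` adds one extra
    code unit, so subtract the number of such characters.
    """
    return utf16_offset - bisect_left(starts, utf16_offset)


def convert_match(offset: int, length: int, starts: list[int]) -> tuple[int, int]:
    """Convert an (offset, length) pair from UTF-16 units to characters."""
    start = utf16_to_char_offset(offset, starts)
    end = utf16_to_char_offset(offset + length, starts)
    return start, end - start


# Usage: "are" starts at UTF-16 offset 12 ("Hmm " = 4, 🤔 = 2, " this " = 6)
# but at character offset 11.
text = "Hmm 🤔 this are wrong."
starts = non_bmp_utf16_starts(text)
start, length = convert_match(12, 3, starts)
print(text[start:start + length])  # prints "are"
```

Only offsets that land after a non-BMP character change, so a document without such characters passes through untouched.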