recogito / recogito-js

A JavaScript library for text annotation
BSD 3-Clause "New" or "Revised" License

Not recognizing specific characters such as § #68

Closed Thutmose3 closed 1 year ago

Thutmose3 commented 2 years ago

If the character § appears in a text, the start_index and end_index I get are off by 1.

It is as if the character § were not there, although it is. This is a problem when the exact position of a word is needed.

When testing, RecogitoJS reported the correct positions for words up to the § character. After that, the position of each subsequent word is off by 1.
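
One way to pin down a report like this is to compare an annotation's character offsets with the text they should cover. The snippet below is a minimal sketch, not code from the library: `checkOffsets` is a hypothetical helper, and it assumes the annotation follows the W3C Web Annotation shape, with a `TextQuoteSelector` (`exact`) and a `TextPositionSelector` (`start`, `end`) under `target.selector`. The sample text and annotation are made up for illustration.

```javascript
// Hypothetical helper (not part of RecogitoJS): checks whether the
// position selector of a W3C-style annotation points at the quoted text.
function checkOffsets(text, annotation) {
  // target.selector may be a single object or an array; normalise to an array.
  const selectors = [].concat(annotation.target.selector);
  const quote = selectors.find((s) => s.type === 'TextQuoteSelector');
  const position = selectors.find((s) => s.type === 'TextPositionSelector');
  if (!quote || !position) {
    throw new Error('annotation needs both a quote and a position selector');
  }
  // JavaScript strings are indexed in UTF-16 code units, so '§' counts as 1.
  const found = text.slice(position.start, position.end);
  return { expected: quote.exact, found, matches: found === quote.exact };
}

// Made-up sample data: the word "contract" follows a '§' character.
const text = 'See § 12 of the contract';
const annotation = {
  target: {
    selector: [
      { type: 'TextQuoteSelector', exact: 'contract' },
      { type: 'TextPositionSelector', start: 16, end: 24 },
    ],
  },
};

console.log(checkOffsets(text, annotation));
```

If `matches` is false only for annotations that come after a §, the offsets and the text are probably being counted in different units or encodings somewhere along the way.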

rsimon commented 1 year ago

Tried to reproduce this, but RecogitoJS did count the '§' character correctly in my tests. (Tested in Chrome + FF.) Which browser and OS are you seeing this on? Are you sure you set the correct encoding in your HTML page?

<meta charset="utf-8" />
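
The cause was never confirmed in this thread, but the encoding question points at one plausible mechanism: § takes two bytes in UTF-8 (0xC2 0xA7), so a tool that counts bytes, or that decodes the page as Latin-1, sees one extra unit for every § before a word. The sketch below illustrates that effect with a made-up sample string. `decodeAsLatin1` is a hypothetical helper that maps each byte to one character. `TextEncoder` is a standard API in browsers and Node.js.

```javascript
// Encode a string to its UTF-8 bytes.
function utf8Bytes(s) {
  return new TextEncoder().encode(s);
}

// Hypothetical helper: interpret every byte as one Latin-1 character,
// which is what happens when UTF-8 text is read with the wrong charset.
function decodeAsLatin1(bytes) {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join('');
}

const text = 'See § 12 of the contract';
const word = 'contract';

// Offset in the correctly decoded string (UTF-16 code units).
const charStart = text.indexOf(word);

// Offset when counting UTF-8 bytes instead of characters.
const byteStart = utf8Bytes(text.slice(0, charStart)).length;

// Offset in the mis-decoded string, where '§' has become 'Â§'.
const misdecoded = decodeAsLatin1(utf8Bytes(text));
const misdecodedStart = misdecoded.indexOf(word);

console.log({ charStart, byteStart, misdecodedStart });
```

In this sample, both the byte offset and the mis-decoded offset come out one higher than the character offset. That matches the symptom described above, where positions are correct up to the § and shifted by 1 afterwards.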

rsimon commented 1 year ago

Closing due to inactivity (no response).