Currently, the offsets of the annotations returned by IBELight are byte offsets, not character offsets. See this explanation of the difference between the two.
They should be character offsets, since that is what is usually used in this type of task: the CEMP task used character offsets, and BioPortal Annotator uses them as well.
Example of difference:
bash get_entities.sh 1 A "‘ oxygen" ChEBI
Byte offset (currently)
1 A 4 10 0.441889 oxygen unknown 1
Character offset (our goal)
1 A 2 8 0.441889 oxygen unknown 1
The difference arises because the ‘ symbol counts as one character but as more than one byte (three bytes in UTF-8).
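One possible way to get character offsets without changing the matching itself is to convert the byte offsets after the fact. The sketch below is only an illustration, not part of IBELight: it assumes the input text is UTF-8 encoded, and the function name `byte_to_char_offset` is hypothetical.

```python
def byte_to_char_offset(text: str, byte_offset: int, encoding: str = "utf-8") -> int:
    """Convert a byte offset into the encoded text to a character offset.

    Assumes byte_offset falls on a character boundary; otherwise
    decoding the prefix raises UnicodeDecodeError.
    """
    encoded = text.encode(encoding)
    # The number of characters in the decoded prefix is the character offset.
    return len(encoded[:byte_offset].decode(encoding))


# Input from the example above: U+2018 (left single quotation mark), a space, "oxygen".
text = "\u2018 oxygen"
start = byte_to_char_offset(text, 4)
end = byte_to_char_offset(text, 10)
print(start, end)  # prints "2 8"
```

Applied to the byte offsets 4 and 10 from the current output, this gives the character offsets 2 and 8 shown as the goal.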