tecoholic / ner-annotator

Named Entity Recognition (NER) annotation tool for spaCy. Generates training data as JSON that can be used directly.
https://tecoholic.github.io/ner-annotator/
MIT License
548 stars 161 forks

Annotating a word also seems to include the comma at its end #91

Closed faridelya closed 1 year ago

faridelya commented 1 year ago

Hi, hope you are all doing well. I uploaded the text below to the annotator. When I annotate "Software Engineer", the annotation also seems to include the comma attached to the end of the phrase, even though the annotation tool highlights only the words.

EXPERIENCE
Software Engineer,

The indices of "Software Engineer" should be (11:27), but the annotation shows (11:28). During annotation we select only the words, not the comma, yet the exported JSON annotation file has an end index that is one higher.

alvi-khan commented 1 year ago

Are you referring to the .json file that is exported? In that case (11:28) is correct since the ending index is 1 more than the actual index of the last character. For example, for the word 'hello' the indices would be (0:5).
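To make the arithmetic concrete, here is a minimal sketch in plain Python, whose slicing uses the same end-exclusive convention. It assumes the uploaded text is exactly the two lines from the report joined by a single newline character; the variable names are only for illustration.

```python
# Assumption: the uploaded text is exactly these two lines, joined by one "\n".
text = "EXPERIENCE\nSoftware Engineer,"
phrase = "Software Engineer"

start = text.find(phrase)   # index of the first character, "S"
end = start + len(phrase)   # one past the last character, "r"

print((start, end))           # (11, 28)
print(repr(text[start:end]))  # 'Software Engineer'  (no comma)
print(repr(text[end]))        # ','  (the comma sits AT index 28, outside the span)
```

So the last character of the phrase is at index 27, the end index is 28, and the comma at index 28 is not part of the span.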

faridelya commented 1 year ago

Is this a predefined standard for annotation?

alvi-khan commented 1 year ago

I believe so, yes. This software was initially created for spaCy, which uses the same indexing format (see here).
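If you want to check your own annotations, the sketch below may help. It assumes entities are stored as (start, end, label) offsets with an exclusive end, as in spaCy-style training tuples; the `check_spans` helper and the `JOB_TITLE` label are hypothetical names made up for this example and are not part of the annotator or of spaCy.

```python
import string

def check_spans(text, entities):
    """Hypothetical helper: for each end-exclusive (start, end, label) span,
    return (label, covered_text, is_clean). A span counts as clean when it is
    non-empty and has no leading or trailing punctuation or whitespace."""
    edge_chars = string.punctuation + string.whitespace
    results = []
    for start, end, label in entities:
        covered = text[start:end]
        is_clean = covered != "" and covered == covered.strip(edge_chars)
        results.append((label, covered, is_clean))
    return results

# Assumption: the text from the report, joined by a single newline.
text = "EXPERIENCE\nSoftware Engineer,"

# (11, 28) covers only the words; (11, 29) would also swallow the comma.
good = check_spans(text, [(11, 28, "JOB_TITLE")])
bad = check_spans(text, [(11, 29, "JOB_TITLE")])

print(good)  # [('JOB_TITLE', 'Software Engineer', True)]
print(bad)   # [('JOB_TITLE', 'Software Engineer,', False)]
```

With the offsets from this issue, (11, 28) yields the phrase without the comma, so the exported value is consistent with an end-exclusive format.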