tecoholic / ner-annotator

Named Entity Recognition (NER) Annotation tool for SpaCy. Generates Training Data as a JSON which can be readily used.
https://tecoholic.github.io/ner-annotator/
MIT License
538 stars, 158 forks

When uploading the annotations only few are displayed, not all the ones which are present in the JSON file #112

Closed (sarafrr closed this 1 week ago)

sarafrr commented 1 month ago

Dear @tecoholic, only a few annotations are displayed in the NER annotator software. I have attached a simple example: the tags are in "tags.json", the text is in "test_annotations.txt", which has only one sample of text, and the corresponding labels are in "test_annotations.json". The annotation is at the word level; however, the issue is still present if I use the character level. Attachments: tags.json, test_annotations.json, test_annotations.txt

Thanks for the support, Sara

alvi-khan commented 1 month ago

Hello @sarafrr.

I checked the annotations file you provided and it seems to be broken. Could you please re-check that you provided the right file?

If you're familiar with JSON, you'll be able to see the issue yourself. In the test_annotations.json file there is an entities list that contains several items. Each item gives the start and end character positions of a span in the sentence that has been marked as a particular entity. Towards the end of the list, there are items that start in the 500 range, whereas the provided sentence does not even have that many characters.
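A quick way to run this check yourself is sketched below. It is only an illustration, not part of the NER annotator: the function name `find_out_of_range_entities` and the sample text and entities are made up, and it assumes each entity is stored as a `[start, end, label]` triple with an exclusive end offset, so adjust it if your file is laid out differently.

```python
def find_out_of_range_entities(text, entities):
    """Return the entities whose character offsets do not fit inside `text`.

    Assumes each entity is a (start, end, label) triple with an exclusive
    end offset. This layout is an assumption, not confirmed by the tool.
    """
    bad = []
    for start, end, label in entities:
        # A valid span is non-empty and lies entirely within the text.
        if not (0 <= start < end <= len(text)):
            bad.append((start, end, label))
    return bad


# Hypothetical sample: the last entity points far past the end of the text.
text = "Alice works at Acme."
entities = [[0, 5, "PERSON"], [15, 19, "ORG"], [500, 512, "ORG"]]

problems = find_out_of_range_entities(text, entities)
for start, end, label in problems:
    print(
        f"entity {label!r} spans [{start}, {end}) "
        f"but the text has only {len(text)} characters"
    )
```

Running something like this over every text and entities pair in the annotations file before uploading should list the items that cannot be displayed.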

alvi-khan commented 1 week ago

Hey @sarafrr. Just wanted to bump this since it's been a while. Was your issue resolved?

sarafrr commented 1 week ago

Yes, thanks, you were right: there were sentences where entities had start and end character indexes larger than the length of the sentence. This happened because I used a generative model to generate samples with the entities. Maybe it would be useful to show an error in such cases? Something like “entities span beyond the length of the sentence”.