tecoholic / ner-annotator

Named Entity Recognition (NER) annotation tool for spaCy. Generates training data as JSON that can be used directly.
https://tecoholic.github.io/ner-annotator/
MIT License

Capability to annotate a portion of a word #42

Closed: rounakdatta closed this issue 2 years ago

rounakdatta commented 2 years ago

In my project, the text is not properly cleaned, so I often want to annotate only a portion of a word. Could support for this be added to the tool?

For example, Label Studio already supports this.


Also, to clarify: even if this tool added support, does spaCy support entities that cover only part of a word?
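For context on the spaCy question: spaCy entity spans are made of whole tokens, and `Doc.char_span` in its default `alignment_mode="strict"` returns `None` when the character offsets do not fall on token boundaries (the `"expand"` and `"contract"` modes snap outward or inward instead). The sketch below mimics only that strict check in plain Python. It assumes a simple whitespace tokenizer, and `tokenize` and `strict_char_span` are hypothetical helpers, not spaCy API; spaCy's real tokenizer splits text differently.

```python
import re
from typing import List, Optional, Tuple

Span = Tuple[int, int]


def tokenize(text: str) -> List[Span]:
    """Hypothetical tokenizer: one token per run of non-whitespace characters."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def strict_char_span(text: str, start: int, end: int) -> Optional[Span]:
    """Return (start, end) only if it starts and ends on token boundaries, else None."""
    tokens = tokenize(text)
    starts = {s for s, _ in tokens}
    ends = {e for _, e in tokens}
    if start < end and start in starts and end in ends:
        return (start, end)
    return None


text = "Shipped to NewYork office"
print(strict_char_span(text, 11, 18))  # whole token "NewYork" -> (11, 18)
print(strict_char_span(text, 14, 18))  # only "York", part of a token -> None
```

Under this assumption, an entity covering "York" alone cannot be represented unless the tokenizer first splits "NewYork" into separate tokens.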

tecoholic commented 2 years ago

@rounakdatta I don't think this will be implemented. One of the UX decisions made early on was to allow quick token selection without having to click the exact start and end of a word. So even if you select just one character in the middle of a word, the entire word gets tagged.
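The snapping behavior described above can be sketched as: take the character range the user selected, find every token it overlaps, and widen the range to those tokens' outer boundaries. This is a minimal Python illustration assuming a whitespace tokenizer; `tokenize` and `snap_to_tokens` are hypothetical names, not the tool's actual implementation.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]


def tokenize(text: str) -> List[Span]:
    # Hypothetical tokenizer: each run of non-whitespace characters is one token.
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def snap_to_tokens(text: str, start: int, end: int) -> Span:
    """Expand a character selection [start, end) to cover every token it touches."""
    touched = [(s, e) for s, e in tokenize(text) if s < end and e > start]
    if not touched:
        raise ValueError("selection does not overlap any token")
    return touched[0][0], touched[-1][1]


text = "Shipped to NewYork office"
s, e = snap_to_tokens(text, 15, 16)  # user selected only the "o" inside "NewYork"
print(text[s:e])  # NewYork
```

Because the selection is always widened, a span ending in the middle of a word can never be produced, which is exactly why partial-word annotation conflicts with this design.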

This suggestion would break that behavior, so it probably won't be done. I suggest cleaning the text beforehand.
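As one possible pre-cleaning step, assuming the uncleaned text mostly contains words glued together (for example "NewYork" or "12Jan"), a regex pass can insert spaces at lowercase-to-uppercase and letter-to-digit transitions so each part becomes its own token. The function name `separate_glued_words` is hypothetical, and the heuristic will also split legitimate names such as "McDonald" or "iPhone", so adapt the patterns to your data.

```python
import re


def separate_glued_words(text: str) -> str:
    # Insert a space at lower->upper case transitions, e.g. "NewYork" -> "New York".
    text = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    # Insert a space between letters and digits in either order, e.g. "12Jan" -> "12 Jan".
    text = re.sub(r"(?<=[A-Za-z])(?=\d)|(?<=\d)(?=[A-Za-z])", " ", text)
    # Collapse repeated whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


print(separate_glued_words("Shipped to NewYork office on 12Jan"))
# Shipped to New York office on 12 Jan
```

After cleaning, the whole-word selection in the annotator lines up with the pieces you actually want to tag.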