Handling multi-word entities

wongc7 / ontosem

OntoSem TMR and Intermediate Results Visualizer


Handling multi-word entities #7

Closed wongc7 closed 7 years ago

wongc7 commented 7 years ago

As seen in this input, sent-word-ind appears to use a new format. Previously we expected it in the form [sentenceIndex-wordIndex]; it is now [sentenceIndex-[wordIndex1, wordIndex2, ... , wordIndexN]]. If entities can span multiple tokens, tmr.js must be modified to recognize this format, annotate the TMR correctly, and highlight all of the entity's words.
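A minimal sketch of how both shapes could be handled uniformly. It is not the code in tmr.js: the helper name `normalizeSentWordInd` is hypothetical, and it assumes the value has already been parsed into a two-element array, either `[sentenceIndex, wordIndex]` or `[sentenceIndex, [wordIndex1, ..., wordIndexN]]`.

```javascript
// Hypothetical helper (not from tmr.js): converts either sent-word-ind shape
// into one uniform object, so later code only deals with a list of indices.
// Assumes the input is already a parsed two-element array.
function normalizeSentWordInd(sentWordInd) {
  if (!Array.isArray(sentWordInd) || sentWordInd.length !== 2) {
    throw new TypeError("expected [sentenceIndex, wordIndex or array of wordIndex]");
  }
  const [sentenceIndex, word] = sentWordInd;
  // Old format: a single word index. New format: an array of word indices.
  const wordIndices = Array.isArray(word) ? word.slice() : [word];
  return { sentenceIndex, wordIndices };
}
```

With this, `normalizeSentWordInd([0, 3])` and `normalizeSentWordInd([0, [3, 4]])` both return an object with a `wordIndices` array, so the highlighting code would not need to branch on the format.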

wongc7 commented 7 years ago

Commit 6be3b1ae1a00e15848c0293ee53a68409160cbe9 addresses this issue, but the annotation is still wrong when an entity has multiple tokens: the asterisk annotation appears on every word. One solution would be to wrap the words in a single span instead of giving each word its own identical span.
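A rough sketch of the single-span idea, written as a pure string builder so it can be tried outside the browser. It is not what the commit does: `renderSentence`, the `entity` class name, and the `marker` parameter are hypothetical, and tokens are assumed to be plain text that is already safe to insert as HTML.

```javascript
// Hypothetical sketch (not from tmr.js): renders a sentence's tokens as HTML,
// wrapping each run of consecutive entity word indices in ONE span so the
// marker is emitted once per run instead of once per word.
// Assumes tokens contain no characters that need HTML escaping.
function renderSentence(tokens, wordIndices, marker = "*") {
  const selected = new Set(wordIndices);
  const parts = [];
  let i = 0;
  while (i < tokens.length) {
    if (!selected.has(i)) {
      parts.push(tokens[i]);
      i += 1;
      continue;
    }
    // Collect the whole run of adjacent entity tokens.
    const run = [];
    while (i < tokens.length && selected.has(i)) {
      run.push(tokens[i]);
      i += 1;
    }
    parts.push('<span class="entity">' + run.join(" ") + marker + "</span>");
  }
  return parts.join(" ");
}
```

For example, `renderSentence(["the", "New", "York", "office"], [1, 2])` puts "New York" inside one span with a single asterisk. Note that in this sketch, indices that are not adjacent still produce one span (and one marker) per run.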

wongc7 commented 7 years ago

After some consideration and testing, there doesn't seem to be a clean solution. One possibility is to simply remove the asterisk feature, so that users can only toggle partial highlight locking by clicking the individual frames.