nlplab / brat

brat rapid annotation tool (brat) - for all your textual annotation needs
http://brat.nlplab.org

Multiple label annotation #1261

Open thecosta opened 6 years ago

thecosta commented 6 years ago

I would like to see Brat support annotating the same entity span with different labels, for multi-label categorization. This could be valuable to the community, since multi-label categorization is an active area of machine learning.

Since Brat shows red notifications and red highlights when the same entity is annotated with different labels, I edited server/src/verify_annotations.py to accept multiple labels on the same entity.
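To see which spans in an existing document would trigger this complaint, something like the following could be run over a .ann file. This is only a standalone sketch, not the code in verify_annotations.py: it assumes the usual brat standoff layout for text-bound annotations (ID, tab, "TYPE START END", tab, text), skips discontinuous spans, and the function name and sample data are made up for illustration.

```python
from collections import defaultdict


def spans_with_multiple_types(ann_text):
    """Return {(start, end): [types]} for spans annotated with more than one type."""
    types_by_span = defaultdict(set)
    for line in ann_text.splitlines():
        # Only text-bound annotations (IDs starting with "T") carry spans.
        if not line.startswith("T"):
            continue
        parts = line.split("\t")
        if len(parts) < 2:
            continue
        fields = parts[1].split()
        # Expect exactly "TYPE START END"; discontinuous spans are skipped.
        if len(fields) != 3:
            continue
        ann_type, start, end = fields
        types_by_span[(int(start), int(end))].add(ann_type)
    return {
        span: sorted(types)
        for span, types in types_by_span.items()
        if len(types) > 1
    }


# Hypothetical sample: T1 and T2 cover the same span with different types.
SAMPLE_ANN = (
    "T1\tSecond-level-category-one 0 5\tHello\n"
    "T2\tSecond-level-category-two 0 5\tHello\n"
    "T3\tFirst-level-category-two 6 11\tworld\n"
)

if __name__ == "__main__":
    print(spans_with_multiple_types(SAMPLE_ANN))
```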

A possible implementation could be to add a toggle setting for multi-label annotation under the Data panel.

DiamondI commented 5 years ago

@thecosta I was looking for a way to handle multi-label annotation. Thanks for the hint: I changed server/src/verify_annotations.py too, and it works well!

DiamondI commented 5 years ago

@thecosta Actually, an entity type in Brat represents an IS-A relationship, so a given text span shouldn't belong to several entity types at the same level. We can change the code to suppress the red notifications, but it seems better to change the entity structure instead. In my case, I found that the multi-label categories are better modeled as attributes of entities.

For example:

# Before:

...
[entities]
First-level-category-one
    Second-level-category-one
    Second-level-category-two
First-level-category-two
...

# In this case, we can't mark an identical text span with 
# both Second-level-category-one and Second-level-category-two.

# After:

...
[entities]
First-level-category-one
First-level-category-two
...
[attributes]
Second-level-category-one    Arg:First-level-category-one
Second-level-category-two    Arg:First-level-category-one
...

# In this case, we can first annotate the span as entity
# First-level-category-one and then set both
# Second-level-category-one and Second-level-category-two on it.
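With the attribute-based layout, the multi-label information ends up in attribute lines of the .ann file rather than in extra entity lines. A rough sketch of how the labels could be read back for training is below. It assumes the standard standoff layout, where a binary attribute is written as ID, tab, "NAME TARGET"; the function name and sample data are hypothetical.

```python
def labels_per_entity(ann_text):
    """Collect, for each text-bound annotation, its type, span and binary attributes."""
    entities = {}
    flags = {}
    for line in ann_text.splitlines():
        parts = line.split("\t")
        if len(parts) < 2:
            continue
        ann_id, body = parts[0], parts[1].split()
        if ann_id.startswith("T") and len(body) == 3:
            # Text-bound annotation: TYPE START END
            entities[ann_id] = (body[0], int(body[1]), int(body[2]))
        elif ann_id.startswith("A") and len(body) == 2:
            # Binary attribute: NAME TARGET
            flags.setdefault(body[1], []).append(body[0])
    return {
        ann_id: {
            "type": ann_type,
            "span": (start, end),
            "labels": sorted(flags.get(ann_id, [])),
        }
        for ann_id, (ann_type, start, end) in entities.items()
    }


# Hypothetical sample: one entity carrying both second-level labels.
SAMPLE_ANN = (
    "T1\tFirst-level-category-one 0 5\tHello\n"
    "A1\tSecond-level-category-one T1\n"
    "A2\tSecond-level-category-two T1\n"
)

if __name__ == "__main__":
    print(labels_per_entity(SAMPLE_ANN))
```

Each entity then carries a list of labels, which maps directly onto a multi-label target vector.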

As for cases where the same text span would need several first-level categories, I think we should avoid them by restructuring the categories.

Hope this helps!