huu4ontocord / rio

Text pre-processing for NLP datasets
Apache License 2.0
11 stars 6 forks

Do span splitting and merging for final ner tags #3

Closed huu4ontocord closed 2 years ago

huu4ontocord commented 2 years ago

When NER spans overlap, resolve them as follows:

- If both spans have exactly the same set of NER types, merge them into one span and average the scores of each tag.
- If the spans partially overlap, the overlap is substantial, and enough text is left over on each side, split them into three parts: the left remainder, the overlap, and the right remainder. Within the overlap, average the scores of the NER types that both spans share.
- If the spans partially overlap but very little is left over, merge them into one span, averaging the scores of the NER types that both spans share.
- If one span sits within another and very little is left over on either side, merge them. Otherwise, split into three parts as described above.
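A minimal sketch of these rules for a single pair of spans, to make the cases concrete. It is not the repository's implementation. The names `Span`, `merge_tags`, `combine`, and `min_leftover` are hypothetical, and several choices are assumptions: offsets are character positions with an exclusive end, "enough left over" means at least `min_leftover` characters on both sides, a type tagged by only one span keeps its score unchanged, and no separate threshold for a "substantial" overlap is modelled.

```python
from dataclasses import dataclass


@dataclass
class Span:
    start: int              # inclusive character offset
    end: int                # exclusive character offset
    tags: dict[str, float]  # NER type -> score


def merge_tags(a: dict[str, float], b: dict[str, float]) -> dict[str, float]:
    """Average the scores of types present in both; keep the others as-is (assumption)."""
    merged = dict(a)
    for ner_type, score in b.items():
        if ner_type in merged:
            merged[ner_type] = (merged[ner_type] + score) / 2
        else:
            merged[ner_type] = score
    return merged


def combine(a: Span, b: Span, min_leftover: int = 3) -> list[Span]:
    """Merge or split two spans; min_leftover is a hypothetical threshold."""
    min_leftover = max(min_leftover, 1)  # never emit an empty side piece
    lo, hi = max(a.start, b.start), min(a.end, b.end)
    if lo >= hi:
        # No overlap: nothing to resolve.
        return sorted([a, b], key=lambda s: s.start)

    start, end = min(a.start, b.start), max(a.end, b.end)
    left, right = lo - start, end - hi  # text left over on each side
    both = merge_tags(a.tags, b.tags)

    same_types = set(a.tags) == set(b.tags)
    if same_types or left < min_leftover or right < min_leftover:
        # Same type set, or too little left over: one merged span.
        return [Span(start, end, both)]

    # Enough left over on both sides: split into three parts.
    # Each side piece keeps the tags of the span that covers it; this
    # handles partial overlap and containment alike.
    left_owner = a if a.start == start else b
    right_owner = a if a.end == end else b
    return [
        Span(start, lo, dict(left_owner.tags)),
        Span(lo, hi, both),
        Span(hi, end, dict(right_owner.tags)),
    ]


if __name__ == "__main__":
    # Partial overlap with plenty left over on both sides: three parts.
    for part in combine(Span(0, 10, {"PERSON": 0.75}), Span(6, 16, {"ORG": 0.5})):
        print(part)
```

Handling more than two spans would need a sweep over the sorted span list that applies `combine` repeatedly; that is left out here, since the comment above only describes the pairwise rules.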