Open hiroshinoji opened 8 years ago
Currently, CabochaAnnotator replaces the old annotations (chunks and dependencies) if they exist.
Another option:
In any case, each annotation should have an attribute recording which annotator produced it, e.g.:
<tokens annotators="juman">...</tokens>
<tokens annotators="knp">...</tokens>
I've changed this behavior of cabocha in d17b7511251c21be2f9be0de812227d4375a6b97 so that the old annotation is kept, because the annotator name (cabocha) is now recorded on every element.
It may be better to support an option that decides whether to keep or replace the old annotation, as in -knp.replaceJumanTokens.
In general, keeping annotations of the same type from different annotators seems to complicate the lower-level processing a bit, so replacing the old annotation might be the better default.
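To make the trade-off concrete, here is a minimal sketch of such a keep-or-replace switch. It is written in Python with the standard `xml.etree.ElementTree` module purely for illustration; the function name `add_annotation`, the `replace_old` flag, and the `<sentence>`/`<chunks>` layout are assumptions for this example, not the project's actual API.

```python
import xml.etree.ElementTree as ET


def add_annotation(sentence, new_elem, replace_old=True):
    """Attach new_elem to sentence (hypothetical helper).

    replace_old=True removes existing children with the same tag first.
    replace_old=False keeps them, so downstream code has to tell the
    annotations apart by their 'annotators' attribute.
    """
    if replace_old:
        # findall returns a list, so removing while looping is safe.
        for old in sentence.findall(new_elem.tag):
            sentence.remove(old)
    sentence.append(new_elem)
    return sentence


# Hypothetical input: a sentence already chunked by another annotator.
sentence = ET.fromstring('<sentence><chunks annotators="knp"/></sentence>')
new_chunks = ET.Element("chunks", {"annotators": "cabocha"})

add_annotation(sentence, new_chunks, replace_old=True)
print([c.get("annotators") for c in sentence.findall("chunks")])
```

With `replace_old=False` the sentence would end up holding both `<chunks>` elements, which is the case where later stages need to filter on the `annotators` attribute.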
Currently, if we apply two annotators that annotate the same element, both annotations are added to the result. Stanford CoreNLP instead overrides the old annotation. Following this, I implemented a method that checks whether the same elements already exist when adding XML elements. Such duplicates occur, e.g., when running a joint POS-and-tree parser after applying a POS tagger.
I plan to push this modification, but I am also wondering whether this overriding method is the best way to resolve conflicts. It may be better to output warnings as well, but that can be future work.
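As a rough illustration of overriding with a warning, the sketch below checks for an existing element of the same type before adding a new one and reports each override. It is Python for illustration only; `add_or_override`, the annotator names `tagger` and `joint_parser`, and the element layout are hypothetical and not taken from the actual implementation.

```python
import warnings
import xml.etree.ElementTree as ET


def add_or_override(parent, new_elem):
    """Add new_elem, overriding same-tag children and warning about each one."""
    for old in parent.findall(new_elem.tag):
        warnings.warn(
            f"<{old.tag}> from '{old.get('annotators')}' is overridden "
            f"by '{new_elem.get('annotators')}'"
        )
        parent.remove(old)
    parent.append(new_elem)
    return parent


# Hypothetical case: a POS tagger ran first, then a joint POS-and-tree parser.
sentence = ET.fromstring('<sentence><tokens annotators="tagger"/></sentence>')
add_or_override(sentence, ET.Element("tokens", {"annotators": "joint_parser"}))
add_or_override(sentence, ET.Element("parse", {"annotators": "joint_parser"}))
print(ET.tostring(sentence, encoding="unicode"))
```

Only the `<tokens>` element triggers a warning here, since `<parse>` did not exist before. Whether a warning, an error, or a silent override is the right policy is exactly the open question above.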