ncbo / ncbo_annotator

Automatically processes a piece of text, annotates it with relevant ontology concepts, and returns the annotations.
http://bioportal.bioontology.org/annotator

Missing ancestors in hierarchy from annotator? #18

Open maksle opened 2 years ago

maksle commented 2 years ago

The paths_to_root endpoint for MEDDRA/10042945 (https://data.bioontology.org/ontologies/MEDDRA/classes/http%3A%2F%2Fpurl.bioontology.org%2Fontology%2FMEDDRA%2F10042945/paths_to_root) returns 3 paths to the root, each with 3 ancestors.

When I annotate the text "Systemic lupus erythematosus" with expand_class_hierarchy=true, restricted to MEDDRA (https://data.bioontology.org/annotator?text=Systemic%20lupus%20erythematosus&class_hierarchy_max_level=25&expand_class_hierarchy=true&ontologies=MEDDRA), the first item in the result is the same class, MEDDRA/10042945. Its hierarchy contains 5 items with distances [1, 1, 1, 2, 3], but I expected 9 items with distances [1, 2, 3, 1, 2, 3, 1, 2, 3], one for each ancestor on each of the 3 paths returned by paths_to_root.
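The comparison can be sketched in Python as below. This is only an illustration under stated assumptions: that each path from paths_to_root is a list of class objects ordered root-first and ending with the class itself, that each class object carries an "@id" key, and that each annotation has a "hierarchy" list whose items carry a "distance" field. The sample data uses made-up identifiers (ex:C, ex:P1, ...) and is not real MEDDRA output; only the observed distances [1, 1, 1, 2, 3] are taken from the result described above.

```python
from collections import Counter
from typing import Dict, List


def expected_distances(paths: List[List[dict]], class_id: str) -> List[int]:
    """Distances to every ancestor on every path (assumes root-first paths)."""
    distances = []
    for path in paths:
        # Drop the class itself; what remains are its ancestors, root first.
        ancestors = [c["@id"] for c in path if c["@id"] != class_id]
        # The nearest ancestor is last in a root-first path, so count backwards.
        for distance, _ in enumerate(reversed(ancestors), start=1):
            distances.append(distance)
    return distances


def observed_distances(annotation: dict) -> List[int]:
    """Distances reported in one annotation's expanded hierarchy."""
    return [item["distance"] for item in annotation.get("hierarchy", [])]


def missing_distances(expected: List[int], observed: List[int]) -> Dict[int, int]:
    """How many ancestors are missing at each distance."""
    return dict(Counter(expected) - Counter(observed))


# Hypothetical stand-in for the paths_to_root response: 3 paths, 3 ancestors each.
sample_paths = [
    [{"@id": f"ex:R{i}"}, {"@id": f"ex:G{i}"}, {"@id": f"ex:P{i}"}, {"@id": "ex:C"}]
    for i in (1, 2, 3)
]

# Hypothetical stand-in for the first annotation, using the distances seen above.
sample_annotation = {
    "annotatedClass": {"@id": "ex:C"},
    "hierarchy": [{"distance": d} for d in (1, 1, 1, 2, 3)],
}

expected = expected_distances(sample_paths, "ex:C")
observed = observed_distances(sample_annotation)
print(expected, observed, missing_distances(expected, observed))
```

With this sample data, two ancestors are missing at distance 2 and two at distance 3, which matches the gap between the 9 expected items and the 5 returned.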

maksle commented 2 years ago

It appears that only the last parent in the list is expanded, while the others are not: https://github.com/ncbo/ncbo_annotator/blob/master/lib/ncbo_annotator.rb#L541. The order of the list of parents is itself arbitrary, because the query uses a distinct clause with no sort (https://github.com/ncbo/ncbo_annotator/blob/master/lib/ncbo_annotator.rb#L728), so which parent ends up being expanded is effectively arbitrary as well.

I'm just not sure whether this is done on purpose for some reason.
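To make the suspected behavior concrete, here is a toy model in Python. It is not the code in ncbo_annotator.rb, only a sketch of the two strategies as I understand them; the function names, the parents_of map, and the class identifiers are all made up for illustration. The toy hierarchy mirrors the shape of the case above: one class with 3 parents, each parent leading to its own grandparent and root.

```python
from typing import Dict, List, Tuple

# Hypothetical parent map: class -> list of direct parents.
parents_of: Dict[str, List[str]] = {
    "C": ["P1", "P2", "P3"],
    "P1": ["G1"], "P2": ["G2"], "P3": ["G3"],
    "G1": ["R1"], "G2": ["R2"], "G3": ["R3"],
}


def expand_last_parent_only(
    parents_of: Dict[str, List[str]], cls: str, max_level: int
) -> List[Tuple[str, int]]:
    """Record all parents at each level, but only climb through the last one."""
    result = []
    current = cls
    for level in range(1, max_level + 1):
        parents = parents_of.get(current, [])
        if not parents:
            break
        result.extend((p, level) for p in parents)
        current = parents[-1]  # only this parent is expanded further
    return result


def expand_all_parents(
    parents_of: Dict[str, List[str]], cls: str, max_level: int
) -> List[Tuple[str, int]]:
    """Breadth-first climb through every parent, visiting each ancestor once."""
    result = []
    seen = set()
    frontier = [cls]
    for level in range(1, max_level + 1):
        next_frontier = []
        for node in frontier:
            for parent in parents_of.get(node, []):
                if parent not in seen:
                    seen.add(parent)
                    result.append((parent, level))
                    next_frontier.append(parent)
        if not next_frontier:
            break
        frontier = next_frontier
    return result


last_only = [d for _, d in expand_last_parent_only(parents_of, "C", 25)]
all_parents = [d for _, d in expand_all_parents(parents_of, "C", 25)]
print(last_only)
print(all_parents)
```

In this model the first strategy yields the same distance pattern the annotator returned, [1, 1, 1, 2, 3], while the second reaches all 9 ancestors. Reordering the parents of "C" changes which branch the first strategy follows, which is why the unsorted parent list matters.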