monarch-initiative / HpoCaseAnnotator

Next-generation Biocuration App for annotating cases and PhenoPackets
https://hpocaseannotator.readthedocs.io/en/latest/index.html
BSD 3-Clause "New" or "Revised" License

Exception in thread "JavaFX Application Thread" java.lang.StringIndexOutOfBoundsException: String index out of range: -20 #70

Closed: pnrobinson closed this issue 4 years ago

pnrobinson commented 4 years ago

at org.monarchinitiative.hpo_case_annotator.gui.controllers.DiseaseCaseDataController.hpoTextMiningButtonAction(DiseaseCaseDataController.java:337)

This is happening with many different texts, e.g.,

Besides the clavicle and skull dysplasia, short stature, scoliosis, enamel hypoplasia, delayed eruption of deciduous teeth, low nasal bridge, delayed mineralization of pubic bone, broad femoral head with short femoral neck, hypoplastic iliac wing, syringomyelia and special faces were also observed in CCD children.
Furthermore, hypertelorism was observed in all CCD children, except Family_A_II1. Supernumerary teeth, retention cysts and long second metacarpal were observed in all CCD children, except Family_A_II1 and Family_B_I

ielis commented 4 years ago

Hi @pnrobinson, when I use the query text:

Besides the clavicle and skull dysplasia, short stature, scoliosis, enamel hypoplasia, delayed eruption of deciduous teeth, low nasal bridge, delayed mineralization of pubic bone, broad femoral head with short femoral neck, hypoplastic iliac wing, syringomyelia and special faces were also observed in CCD children.
Furthermore, hypertelorism was observed in all CCD children, except Family_A_II1. Supernumerary teeth, retention cysts and long second metacarpal were observed in all CCD children, except Family_A_II1 and Family_B_I.

these terms are identified:

begin  end  Label
42     56   Short stature
57     67   Scoliosis
68     86   Hypoplasia of dental enamel
87     123  Delayed eruption of primary teeth
87     103  Delayed eruption of teeth
124    141  Depressed nasal bridge
...    ...  ...

The exception is thrown when the code tries to highlight Delayed eruption of teeth (87-103), which overlaps with the previous term, Delayed eruption of primary teeth (87-123).
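To make the failure mode concrete, here is a minimal, hypothetical sketch of a highlighter that walks the mined terms in order and copies the plain text between them with `String.substring`. It is not HpoCaseAnnotator's actual code at `DiseaseCaseDataController.java:337`; the names `MinedTerm` and `NaiveHighlighter` are illustrative, and records require Java 16+. The sketch assumes terms never overlap, so when a term starts before the end of the previous one, `substring(cursor, begin)` receives a begin index larger than its end index and Java throws `StringIndexOutOfBoundsException`, the same exception type as in the report.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mined term: half-open character span [begin, end) plus its HPO label. */
record MinedTerm(int begin, int end, String label) {}

final class NaiveHighlighter {
    /**
     * Splits text into alternating plain and "[highlighted]" segments.
     * Assumes terms are sorted by begin and never overlap; that assumption is the bug.
     */
    static List<String> segments(String text, List<MinedTerm> terms) {
        List<String> out = new ArrayList<>();
        int cursor = 0; // end offset of the last highlighted term
        for (MinedTerm t : terms) {
            // If t starts before cursor (an overlap), this call gets begin > end
            // and String.substring throws StringIndexOutOfBoundsException.
            out.add(text.substring(cursor, t.begin()));
            out.add("[" + text.substring(t.begin(), t.end()) + "]");
            cursor = t.end();
        }
        out.add(text.substring(cursor)); // trailing plain text after the last term
        return out;
    }
}
```

With two nested spans such as `[0, 35)` and `[0, 16)` over "delayed eruption of deciduous teeth", the second iteration calls `substring(35, 0)` and throws; with disjoint spans the loop works as intended.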

I think we have 2 issues here:

pnrobinson commented 4 years ago

I think we can ask SciGraph to return only the most specific term; that would be the desired behavior. The overlap issue is more difficult. Might it be easiest just to merge the overlapping terms and show both HPOs? Or to break the span up and highlight the first and last words separately?

ielis commented 4 years ago

I agree that if two terms in an ancestor/descendant relationship are mined from the same text chunk, then the most specific term should be selected. I hope that's what we can ask SciGraph to do.
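As a rough illustration of the "keep the most specific term" rule, the sketch below filters client-side: a mined term is dropped when another mined term whose span overlaps it is one of its descendants. The ontology is stood in for by a small, hand-written child-to-parents map of labels; in practice the hierarchy would come from HPO itself, and whether SciGraph can apply this filter server-side is not assumed here. `MinedTerm`, `MostSpecific`, and the map layout are hypothetical names chosen for the example (Java 16+ for records).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical mined term: half-open character span [begin, end) plus its HPO label. */
record MinedTerm(int begin, int end, String label) {
    boolean overlaps(MinedTerm other) {
        return begin < other.end && other.begin < end;
    }
}

final class MostSpecific {
    /** True if `ancestor` is reachable from `term` by following child-to-parent links. */
    static boolean isAncestor(String ancestor, String term, Map<String, Set<String>> parents) {
        Deque<String> todo = new ArrayDeque<>(parents.getOrDefault(term, Set.of()));
        Set<String> seen = new HashSet<>();
        while (!todo.isEmpty()) {
            String p = todo.pop();
            if (p.equals(ancestor)) {
                return true;
            }
            if (seen.add(p)) { // visit each ancestor once, even in a DAG
                todo.addAll(parents.getOrDefault(p, Set.of()));
            }
        }
        return false;
    }

    /** Drops every term that is an ancestor of another term mined from an overlapping span. */
    static List<MinedTerm> keepMostSpecific(List<MinedTerm> terms, Map<String, Set<String>> parents) {
        List<MinedTerm> kept = new ArrayList<>();
        for (MinedTerm t : terms) {
            boolean hasMoreSpecific = terms.stream()
                    .anyMatch(u -> u != t && u.overlaps(t) && isAncestor(t.label(), u.label(), parents));
            if (!hasMoreSpecific) {
                kept.add(t);
            }
        }
        return kept;
    }
}
```

Assuming, for the example, that Delayed eruption of primary teeth is a child of Delayed eruption of teeth, the pair mined at 87-123 and 87-103 collapses to the primary-teeth term, while unrelated terms such as Scoliosis are untouched.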

I also think that we should merge the overlapping terms and show both HPOs. All is clear, then; I'll open an appropriate issue in HpoTextMining and address it there.
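A minimal sketch of the "merge the overlapping terms and show both HPOs" approach: sort the mined spans, fuse any span that starts before the current region ends, and carry every label along so the UI can list all of them for one highlighted region. This is an illustration under stated assumptions, not the HpoTextMining implementation; `MinedTerm`, `Highlight`, and `OverlapMerger` are hypothetical names, spans are treated as half-open `[begin, end)`, and merely adjacent spans are kept separate (Java 16+ for records).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical mined term: half-open character span [begin, end) plus its HPO label. */
record MinedTerm(int begin, int end, String label) {}

/** One highlighted region of the text and every HPO label mined inside it. */
record Highlight(int begin, int end, List<String> labels) {}

final class OverlapMerger {
    static List<Highlight> merge(List<MinedTerm> terms) {
        List<MinedTerm> sorted = new ArrayList<>(terms);
        sorted.sort(Comparator.comparingInt(MinedTerm::begin).thenComparingInt(MinedTerm::end));

        List<Highlight> out = new ArrayList<>();
        int begin = -1;
        int end = -1;
        List<String> labels = new ArrayList<>(); // labels of the region being built
        for (MinedTerm t : sorted) {
            if (!labels.isEmpty() && t.begin() < end) {
                // Overlaps the current region: extend it and keep both labels.
                end = Math.max(end, t.end());
                labels.add(t.label());
            } else {
                // Disjoint: close the current region (if any) and start a new one.
                if (!labels.isEmpty()) {
                    out.add(new Highlight(begin, end, List.copyOf(labels)));
                }
                begin = t.begin();
                end = t.end();
                labels = new ArrayList<>(List.of(t.label()));
            }
        }
        if (!labels.isEmpty()) {
            out.add(new Highlight(begin, end, List.copyOf(labels)));
        }
        return out;
    }
}
```

Because the merged regions never overlap, a loop that copies the plain text between highlights never sees a begin index larger than its end index. For the example above, spans 87-103 and 87-123 become a single region 87-123 carrying both Delayed eruption of teeth and Delayed eruption of primary teeth.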