monarch-initiative / HpoCaseAnnotator

Next-generation Biocuration App for annotating cases and PhenoPackets
https://hpocaseannotator.readthedocs.io/en/latest/index.html
BSD 3-Clause "New" or "Revised" License

String index out of range: -7 #62

Open pnrobinson opened 5 years ago

pnrobinson commented 5 years ago

With Hpo Case Annotator v1.0.13

I get

Exception in thread "JavaFX Application Thread" java.lang.StringIndexOutOfBoundsException: String index out of range: -7
    at java.lang.String.substring(String.java:1967)
    at org.monarchinitiative.hpotextmining.gui.controller.Present.colorizeHTML4ciGraph(Present.java:226)
    at org.monarchinitiative.hpotextmining.gui.controller.Present.setResults(Present.java:441)
    at org.monarchinitiative.hpotextmining.gui.controller.HpoTextMining.lambda$new$0(HpoTextMining.java:88)
    at org.monarchinitiative.hpotextmining.gui.controller.Configure.lambda$analyzeButtonClicked$0(Configure.java:89)
    at com.sun.javafx.event.CompositeEventHandler.dispatchBubblingEvent(CompositeEventHandler.java:86)
(....)

This is the text:

Febrile seizures    HP:0002373HPOs: Dysarthria  HP:0001260HPOs: Loss of ability to walk HP:0006957HPOs: Myoglobinuria   HP:0002913HPOs: Focal-onset seizure HP:0007359HPOs: Apnea   HP:0002104HPOs: Elevated serum creatine kinase  HP:0003236HPOs: Hyperammonemia  HP:0001987HPOs: Hypoglycemia    HP:0001943HPOs: Myopathic facies    HP:0002058HPOs: Microcephaly    HP:0000252HPOs: Hyperactive deep tendon reflexes    HP:0006801HPOs: Babinski sign   HP:0003487HPOs: Exotropia   HP:0000577HPOs: Developmental regression    HP:0002376HPOs: Elevated serum creatine kinase  HP:0003236HPOs: Intellectual disability HP:0001249HPOs: Rhabdomyolysis  HP:0003201HPOs: Microcephaly    HP:0000252HPOs Free Text: Urine myoglobin of 94 ng/ml (normal range 10–65 ng/ml), serum creatine phosphokinase (CPK) of 205,000 U/l (normal range 75–230 U/l), elevated aspartate aminotransferase (AST) of 1,618 U/l (normal range 15–50 U/l), alanine aminotransferase (ALT) of 571 U/l (normal range 10–25 U/l), ammonia of 122 μmol/l (normal range 22–48 μmol/l), and hypoglycemia (blood glucose 30 mg/dl; normal range 70–110 mg/dl)
Not HPOs: Arrhythmia    HP:0011675Not HPOs Free Text: -
Variants: NM_152906.6:c.460G>A (p.Gly154Arg)
ClinVar ID: 208823
pnrobinson commented 5 years ago

This seems to happen because the end of this term is 304 but the start is 311, so the substring length is 304 - 311 = -7, which is the -7 in the string index error. Now we need to find where that comes from.

pnrobinson commented 5 years ago

The minimal string that causes this error is "HPOs: Myopathic facies". The text miner gets back two hits from the API, Myopathic facies and Myopathy. The Myopathy hit is shorter than Myopathic facies, so on the second pass through the loop (Present.java) we have start=22 but end=15 (from Myopathy). This line (226) causes the error:

query.substring(start, term.getEnd()),
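The failing call can be reproduced outside the application with a minimal sketch. The offsets (start=22, end=15) and the query string come from the comment above; the class and method names are hypothetical and are not part of HpoTextMining.

```java
public class SubstringRepro {

    /**
     * Calls query.substring(start, end) and returns the exception message,
     * or null if the call succeeds. Hypothetical helper for illustration only.
     */
    static String trySubstring(String query, int start, int end) {
        try {
            query.substring(start, end);
            return null;
        } catch (StringIndexOutOfBoundsException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String query = "HPOs: Myopathic facies";
        // After the longer hit has been processed, start sits at its end (22).
        // The shorter hit ends at 15, so start > end and substring throws.
        System.out.println(trySubstring(query, 22, 15));
    }
}
```

The exact wording of the message depends on the JDK version, so the sketch only reports it rather than assuming a particular text.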
pnrobinson commented 5 years ago

@ielis If I add the line with the ***** the error seems to be solved -- can you confirm that this is OK?

start = Math.min(start, term.getBegin());  // ***** added line
// THIS IS LINE 221 in Present.java in HpoTextMining
htmlBuilder.append(
        // highlighted text
        String.format(HIGHLIGHTED_TEMPLATE,
                term.getTerm().getId().getValue(),
                query.substring(start, term.getEnd()),
                // tooltip text -> HPO id & label
                String.format(TOOLTIP_TEMPLATE, term.getTerm().getId().getValue(), term.getTerm().getName())));
offset = term.getEnd();
}
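To see what the added guard does in isolation, here is a simplified sketch of a highlighting loop. It is not the code from Present.java: the Hit class, the `<b>` markup, and the way start is derived from offset before the guard are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class HighlightSketch {

    /** Hypothetical stand-in for a text-mining hit: [begin, end) offsets into the query. */
    static final class Hit {
        final int begin;
        final int end;

        Hit(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    static String highlight(String query, List<Hit> hits) {
        StringBuilder html = new StringBuilder();
        int offset = 0; // end of the previously processed hit
        for (Hit hit : hits) {
            int start = Math.max(offset, hit.begin); // assumed pre-fix behaviour
            start = Math.min(start, hit.begin);      // guard proposed in the thread
            if (offset < start) {
                html.append(query, offset, start);   // plain text before the hit
            }
            html.append("<b>").append(query.substring(start, hit.end)).append("</b>");
            offset = hit.end;
        }
        if (offset < query.length()) {
            html.append(query.substring(offset));    // trailing plain text
        }
        return html.toString();
    }

    public static void main(String[] args) {
        // Offsets assumed for the two overlapping hits in "HPOs: Myopathic facies".
        List<Hit> hits = Arrays.asList(new Hit(6, 22), new Hit(6, 15));
        System.out.println(highlight("HPOs: Myopathic facies", hits));
    }
}
```

In this sketch the guard removes the exception, but the shorter overlapping hit is still rendered, so "Myopathic" appears twice in the output. Whether the real Present.java behaves the same way is not shown in the thread.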
ielis commented 5 years ago

@pnrobinson - looks good, please merge it into the develop branch. I'll write some tests to make sure this won't happen in future. Thanks!

pnrobinson commented 5 years ago

Thanks -- would you mind if I made a release of HpoCaseAnnotator with this update? I need this for a new comp without Java 8

pnrobinson commented 5 years ago

Actually, we can also add the --longestOnly option -- that would take care of the bug, probably. https://github.com/SciGraph/SciGraph/issues/272#issuecomment-513911261
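The idea behind keeping only the longest match can also be sketched on the client side. This is not SciGraph's implementation of the option. It is a hypothetical filter that drops any hit fully contained in a longer one, with the Hit class and the offsets assumed for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LongestOnlySketch {

    /** Hypothetical hit: a label plus [begin, end) offsets into the query text. */
    static final class Hit {
        final String label;
        final int begin;
        final int end;

        Hit(String label, int begin, int end) {
            this.label = label;
            this.begin = begin;
            this.end = end;
        }

        int length() {
            return end - begin;
        }
    }

    /** Keeps only hits that are not contained within another, strictly longer hit. */
    static List<Hit> longestOnly(List<Hit> hits) {
        List<Hit> kept = new ArrayList<>();
        for (Hit candidate : hits) {
            boolean covered = false;
            for (Hit other : hits) {
                boolean contains = other.begin <= candidate.begin && candidate.end <= other.end;
                if (contains && other.length() > candidate.length()) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("Myopathic facies", 6, 22),
                new Hit("Myopathy", 6, 15));
        for (Hit hit : longestOnly(hits)) {
            System.out.println(hit.label + " [" + hit.begin + ", " + hit.end + ")");
        }
    }
}
```

With only non-overlapping hits left, a highlighting loop never sees an end offset that lies before the current start.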