monarch-initiative / fenominal

Phenomenal text mining for disease and phenotype concepts
https://monarch-initiative.github.io/fenominal/
GNU General Public License v3.0

do not match across semicolon #40

Open pnrobinson opened 2 years ago

pnrobinson commented 2 years ago

Currently, the text "hypoplastic ear lobes with cup-shaped right ear; anteverted nares" matches "anteverted ear", even though the two words are separated by a semicolon.

pnrobinson commented 1 year ago

Fixed this in FmCoreDocument by replacing

    private final static Set<Character> sentenceEndPunctuation = Set.of('.', '!', '?');

with this

    private final static Set<Character> breakingPunctuation = Set.of(';', ':', '.', '!', '?');

This has the side effect of turning phrases like this

"hypoplastic ear lobes with cup-shaped right ear; anteverted nares";

into two SimpleSentences, but I do not think this is a problem for downstream software. @tudorgroza I could not find any better way of doing this, but maybe I am missing something obvious?
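
For illustration, here is a minimal sketch of the splitting behaviour described above. It is not the actual FmCoreDocument code: the class name `BreakingPunctuationSketch` and the helpers `split` and `addIfNotBlank` are hypothetical, and the only thing taken from the fix is the character set. It assumes Java 9 or later for `Set.of`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of fenominal.
final class BreakingPunctuationSketch {
    // Same characters as the breakingPunctuation set in the fix above.
    private static final Set<Character> BREAKING_PUNCTUATION =
            Set.of(';', ':', '.', '!', '?');

    // Splits text into trimmed chunks at every breaking character.
    // The breaking characters themselves are dropped.
    static List<String> split(String text) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (BREAKING_PUNCTUATION.contains(c)) {
                addIfNotBlank(chunks, current);
                current.setLength(0); // start a new chunk
            } else {
                current.append(c);
            }
        }
        addIfNotBlank(chunks, current); // trailing text without final punctuation
        return chunks;
    }

    // Adds the trimmed buffer content unless it is empty or whitespace only.
    private static void addIfNotBlank(List<String> chunks, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            chunks.add(s);
        }
    }

    public static void main(String[] args) {
        String text = "hypoplastic ear lobes with cup-shaped right ear; anteverted nares";
        for (String chunk : split(text)) {
            System.out.println(chunk);
        }
    }
}
```

With the semicolon in the set, the example phrase yields two chunks, so a concept matcher that works chunk by chunk can no longer pair "anteverted" with "ear". With the old set of '.', '!' and '?' only, the same input would stay in one chunk.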