Open pnrobinson opened 2 years ago
Fixed this in FmCoreDocument by replacing
private final static Set<Character> sentenceEndPunctuation = Set.of('.', '!', '?');
with this
private final static Set<Character> breakingPunctuation = Set.of(';', ':', '.', '!', '?');
This has the side effect of turning phrases like this
"hypoplastic ear lobes with cup-shaped right ear; anteverted nares";
into two SimpleSentences, but I do not think this is a problem for downstream software. @tudorgroza I could not find any better way of doing this, but maybe I am missing something obvious?
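For anyone following along, here is a minimal sketch of what splitting on the wider punctuation set does to the example phrase. It is not the FmCoreDocument code: the class `BreakingPunctuationSketch` and the `split` method are hypothetical names made up for illustration, and the sketch returns plain strings rather than SimpleSentence objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the actual FmCoreDocument implementation.
public class BreakingPunctuationSketch {
    // Same character set as the replacement field quoted above.
    private static final Set<Character> BREAKING_PUNCTUATION = Set.of(';', ':', '.', '!', '?');

    /** Splits text into trimmed, non-empty chunks at each breaking punctuation character. */
    static List<String> split(String text) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (BREAKING_PUNCTUATION.contains(c)) {
                // Close the current chunk at the punctuation mark and start a new one.
                addIfNotBlank(chunks, current);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // Keep any trailing text that is not followed by punctuation.
        addIfNotBlank(chunks, current);
        return chunks;
    }

    private static void addIfNotBlank(List<String> chunks, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            chunks.add(s);
        }
    }

    public static void main(String[] args) {
        split("hypoplastic ear lobes with cup-shaped right ear; anteverted nares")
                .forEach(System.out::println);
    }
}
```

With the semicolon treated as a break, "ear" and "anteverted" land in different chunks, so a matcher that works one chunk at a time can no longer combine them.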
Currently, the phrase
hypoplastic ear lobes with cup-shaped right ear; anteverted nares
matches "anteverted ear", even though the two words sit on opposite sides of the semicolon.