Status: Open. Opened by graybeal 3 years ago.
It seems to me that '6-10', '11-15', and '1-5' all qualify as tokens of at least 3 characters, while '6' and '> 15' do not ('> 15' is two tokens separated by a space, and each token is only 1 or 2 characters long).
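To make the length rule concrete, here is a minimal sketch. It assumes plain whitespace tokenization and a minimum token length of 3; the function name `indexable_tokens` and the constant are hypothetical, and the real indexer may tokenize differently.

```python
# Hypothetical illustration of the ">= 3 characters" token rule.
# Assumption: tokens are produced by splitting on whitespace only.
MIN_TOKEN_LENGTH = 3


def indexable_tokens(text, min_len=MIN_TOKEN_LENGTH):
    """Return the whitespace-separated tokens that are at least min_len long."""
    return [tok for tok in text.split() if len(tok) >= min_len]


if __name__ == "__main__":
    for label in ["6-10", "11-15", "1-5", "6", "> 15"]:
        # Labels that yield an empty list would have nothing to index.
        print(repr(label), "->", indexable_tokens(label))
```

Under these assumptions, '6' and '> 15' produce no indexable tokens at all, which would match the behavior described above.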
I can't speak to the details of the second set of strings. We'll see if anyone else on the team or on this list can speak to them.
Regarding the third case, what is likely happening is that the stop words 'for' and 'with' break up the longer string. This is somewhat analogous to cases like '> 15', where the space means the tokens on either side are ignored.
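A rough sketch of how stop words could break up a label, assuming stop words are dropped before matching and act as segment boundaries. The stop-word set below contains only the two words named in this thread, and `split_on_stop_words` is a hypothetical helper, not the annotator's actual code.

```python
# Hypothetical illustration: stop words act as boundaries, so the full
# phrase never survives as a single unit that could be matched.
STOP_WORDS = {"for", "with"}  # assumption: only the two words named above


def split_on_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase, split on whitespace, and cut the token list at stop words."""
    segments, current = [], []
    for tok in text.lower().split():
        if tok in stop_words:
            if current:
                segments.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    for label in ["Admitted with acute respiratory failure",
                  "admitted for heart failure"]:
        print(repr(label), "->", split_on_stop_words(label))
```

If the indexer behaves like this, each label is reduced to separate fragments such as "admitted" and "heart failure", so the complete class label is never matched as a whole.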
I'm wondering whether the string patterns should include stop words and tokens shorter than 3 characters. Could these be indexed separately, i.e., could all strings longer than 3 characters be indexed as a whole even if they contain stop words or spaces? This might have to be a separate process if stop words are handled as the first step.
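One possible reading of this suggestion, sketched under assumptions: keep a separate index of whole, normalized labels (stop words and spaces included) and look up word n-grams from the text against it, preferring the longest match at each position. All names here (`build_phrase_index`, `find_phrases`, `max_words`) are hypothetical and only illustrate the idea.

```python
# Hypothetical sketch of a separate whole-phrase index.
# Assumption: labels are matched case-insensitively after collapsing whitespace.

def normalize(s):
    """Lowercase and collapse runs of whitespace to single spaces."""
    return " ".join(s.lower().split())


def build_phrase_index(labels, min_chars=3):
    """Index every whole label of at least min_chars characters, stop words included."""
    return {normalize(l) for l in labels if len(normalize(l)) >= min_chars}


def find_phrases(text, index, max_words=6):
    """At each word position, report the longest n-gram found in the index."""
    words = normalize(text).split()
    hits = []
    for start in range(len(words)):
        longest_end = min(len(words), start + max_words)
        for end in range(longest_end, start, -1):
            candidate = " ".join(words[start:end])
            if candidate in index:
                hits.append(candidate)
                break
    return hits


if __name__ == "__main__":
    index = build_phrase_index(
        ["> 15", "3 days", "3 days ago", "admitted for heart failure"]
    )
    print(find_phrases("Patient admitted for heart failure 3 days ago", index))
    print(find_phrases("value > 15", index))
```

Because the lookup uses the whole label, a string like '> 15' or a label containing 'for' can still be found, and the longest-match preference picks "3 days ago" over "3 days" at the same position. Whether this fits into the existing pipeline depends on when stop words are removed.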
In RCIT_A1 (private ontology), with screenshots omitted (sorry):
(1). For the classes ending with "L/min", only 3 out of 6 are annotated. Why can half of them be annotated while the rest cannot?
(2). Concepts like "3 days ago", "3 days later", and "three days ago" (a synonym of "3 days ago") are not recognized; instead, they are wrongly annotated with another class, "3 days". Since these concepts are at least 3 characters long, they should satisfy the indexing rule you mentioned.
(3). "Admitted with acute respiratory failure" and "admitted for heart failure" are not annotated.
see also #206