Closed reckart closed 5 years ago
Don't we normalize whitespace somewhere?
Not at the level of the document text that is in the CAS. The CAS contains whatever whitespace the original document contained (and which made it through the DKPro Core Reader). We do a bit of normalization elsewhere, e.g. when sending text to brat.
Note, for other recommenders, this shouldn't be a problem because they operate on Tokens and Tokens normally don't include whitespace. But the StringMatchingRecommender operates directly on the CAS document text.
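To illustrate why token-based recommenders are unaffected, here is a minimal sketch in Python. It is not INCEpTION or DKPro Core code: `tokenize` is a hypothetical whitespace tokenizer that stands in for a real segmenter, and it only shows that the token sequences are the same even though the raw strings differ.

```python
import re

def tokenize(text):
    """Hypothetical tokenizer: return (begin, end, covered_text) for each
    run of non-whitespace characters. Tokens never include whitespace."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(r"\S+", text)]

space_tokens = [t[2] for t in tokenize("John Smith")]
tab_tokens = [t[2] for t in tokenize("John\tSmith")]

# The raw strings differ, but the token sequences are identical.
print("John Smith" == "John\tSmith")  # False
print(space_tokens == tab_tokens)      # True
```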
What happens if we introduce the sentence-level recommender from #590? Would we get the bad whitespace from the sentence?
Well, normally a recommender would operate on tokens, not on the base text. But for the StringMatchingRecommender, it is actually easier to apply the Trie directly to the base text instead of first constructing a string from the tokens.
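One way to keep matching on the base text while ignoring the kind of whitespace could look roughly like the sketch below. It is written in Python for brevity and is only an assumption about a possible approach, not what the StringMatchingRecommender does. The helper names `normalize_whitespace` and `find_all` are hypothetical, and a plain substring search stands in for the Trie. Each whitespace character is replaced by exactly one space, so the normalized text has the same length as the original and the offsets found in it remain valid for the CAS document text.

```python
def normalize_whitespace(text):
    """Replace every whitespace character with a single space, one for one.

    The result has the same length as the input, so offsets found in the
    normalized text are valid in the original text as well.
    """
    return "".join(" " if ch.isspace() else ch for ch in text)

def find_all(text, phrase):
    """Return (begin, end) offsets of every occurrence of phrase in text,
    ignoring which whitespace character separates the words."""
    haystack = normalize_whitespace(text)
    needle = normalize_whitespace(phrase)
    spans = []
    start = haystack.find(needle)
    while start != -1:
        spans.append((start, start + len(needle)))
        start = haystack.find(needle, start + 1)
    return spans

document = "John Smith met John\tSmith."
print(find_all(document, "John Smith"))  # [(0, 10), (15, 25)]
```

Because the mapping is character for character, this sketch does not treat a run of several whitespace characters as equivalent to a single one; that would need an explicit offset map.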
@Rentier for a sentence-level recommender cf. OpenNlpDoccatRecommender
Describe the bug
If the StringMatchingRecommender learns that John Smith is a person, it won't predict that John\tSmith is a person as well.

To Reproduce
Steps to reproduce the behavior:
John Smith John\tSmith
and test.

Expected behavior
The kind of whitespace which separates tokens should not confuse the recommender.
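The reported behavior and the expected one can be reproduced outside INCEpTION with a small standalone Python sketch. It makes no claim about how the recommender is implemented: `whitespace_tolerant_pattern` is a hypothetical helper, and a regular expression stands in for the actual matching logic. The exact search finds only the space-separated mention, while the tolerant pattern accepts any whitespace between the words.

```python
import re

def whitespace_tolerant_pattern(phrase):
    """Build a regex that matches the words of phrase separated by any
    non-empty run of whitespace (hypothetical helper)."""
    tokens = phrase.split()
    return re.compile(r"\s+".join(re.escape(tok) for tok in tokens))

text = "John Smith John\tSmith"
learned = "John Smith"

# Exact matching: only the mention separated by a plain space is found.
exact_hits = [m.span() for m in re.finditer(re.escape(learned), text)]

# Whitespace-tolerant matching: both mentions are found.
tolerant_hits = [m.span() for m in whitespace_tolerant_pattern(learned).finditer(text)]

print(exact_hits)     # [(0, 10)]
print(tolerant_hits)  # [(0, 10), (11, 21)]
```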