Open wkoszek opened 8 years ago
Same as your other issue, this is related to the splitting regex:
We could try to include the dash in the list of allowed word characters. We may only need some preprocessing to remove it at the beginning and end, so fragments like "-word" or "word-" (which happen a lot, e.g. in German) are processed correctly.
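Here is a rough sketch of that idea in Python. It assumes the splitter is a simple character-class regex; the project's actual regex isn't shown in this thread, so `WORD_RE` and `split_words` are hypothetical names used only for illustration.

```python
import re

# Hypothetical pattern: treat the dash as a word character alongside
# letters, digits, and underscore (\w is Unicode-aware for str in Python 3).
WORD_RE = re.compile(r"[\w-]+")


def split_words(text):
    """Split text into words, keeping inner dashes but dropping outer ones."""
    words = []
    for token in WORD_RE.findall(text):
        # Strip dashes at the beginning and end, so "-word" and "word-"
        # both become "word".
        word = token.strip("-")
        # A token made only of dashes (e.g. "--") becomes empty; skip it.
        if word:
            words.append(word)
    return words


if __name__ == "__main__":
    print(split_words("green-yellow T-rex"))
    print(split_words("Vor- und Nachteile"))
```

With this, "green-yellow" and "T-rex" stay whole, while a dangling dash as in "Vor- und Nachteile" is dropped. Whether a dash between digits (ranges like "10-20") should also stay joined is a separate decision this sketch doesn't address.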
Seems like we're splitting words such as "green-yellow" or "T-rex". I wonder if there's a good solution to this.