Open n8 opened 10 years ago
it's is a contraction - for tokenisation contractions are often considered two words (because they are really) - this is the case in Stanford Core - http://stackoverflow.com/questions/14058399/stanford-corenlp-split-words-ignoring-apostrophe
One option, as suggested in the above link, would be to handle imploding enclitics in the implode method - in treat this would be in module Treat::Entities::Entity::Stringable
so - looks like the issue is with the current implode method on string able - although it attempts to handle enclitics then from what i can see in the current implementation then 'value' would already be blank, so calling strip! would make no difference - when the imploded parts are merged the space is still there (as it is outside the scope of the strip!)
here's a fixed version - modified the recursive call to pass the value string and operations are all performed on the string instead of multiple copies - but a disclaimer is that i only started looking at treat about 3 hours ago!
https://github.com/chris-at-thewebfellas/treat/commit/d9b912f24d7673863ca3ea7e59016f022923ac66
for the same code, this now gives:
It's about time.
Results in:
Should that to_s without the extra space between It and `s?