marktaw closed this 7 years ago
Update: I've also added functions to attach closing quotes to the previous section, and a function to turn Unicode quotes into straight quotes. Obviously the quote handling could be generalized to cover Unicode quotes directly, but I need straight quotes rather than Unicode quotes anyway, so...
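For readers following along, here is a rough sketch of what those two helpers could look like. The names `straightenQuotes` and `attachClosingQuotes` are made up for this illustration and are not the functions in the linked branch or in php-sentence; the list of quote characters is also an assumption and is not exhaustive.

```php
<?php
// Hypothetical helper (not the library's cleanupUnicode): map common
// Unicode quotation marks to their straight ASCII equivalents.
function straightenQuotes(string $text): string
{
    $map = [
        "\u{201C}" => '"', // left double quotation mark
        "\u{201D}" => '"', // right double quotation mark
        "\u{201E}" => '"', // double low-9 quotation mark
        "\u{2018}" => "'", // left single quotation mark
        "\u{2019}" => "'", // right single quotation mark (also used as apostrophe)
        "\u{201A}" => "'", // single low-9 quotation mark
    ];

    return strtr($text, $map);
}

// Hypothetical helper: if a fragment consists only of a straight closing
// quote, glue it onto the end of the previous fragment instead of leaving
// it as a fragment of its own.
function attachClosingQuotes(array $fragments): array
{
    $result = [];

    foreach ($fragments as $fragment) {
        $trimmed = trim($fragment);
        $isLoneQuote = ($trimmed === '"' || $trimmed === "'");

        if ($result !== [] && $isLoneQuote) {
            $last = count($result) - 1;
            $result[$last] = rtrim($result[$last]) . $trimmed;
        } else {
            $result[] = $fragment;
        }
    }

    return $result;
}
```

Running `straightenQuotes` first means `attachClosingQuotes` only has to recognize the two straight quote characters, which is the trade-off described above.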
Thanks for the fix. This greatly improves the accuracy.
I've added test cases (based on your examples, but anonymized) for this fix and also for the cleanupUnicode addition. Additionally, this fix allowed me to reinstate some "incomplete" tests, so again, thanks!
I may take a look at cleanupUnicode in the future, as split() currently returns the cleaned-up version rather than the text with its original quotes, but split() is really an afterthought for this project; count() is the one that needs to be correct.
It's now available from Composer as v1.0.3
Thank you for this code.
I've changed the logic in abbreviationMerge so that capitalized abbreviations stay with the next fragment, not the previous one.
E.g.
Now splits neatly into
Where previously it split into
My revisions are in
https://github.com/vanderlee/php-sentence/compare/master...marktaw:master
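To illustrate the abbreviationMerge change described above, here is a minimal sketch of the idea. It is not the code from the linked branch: the function name `mergeCapitalizedAbbreviations`, the abbreviation list, and the assumption that the text has already been cut into fragments after each period are all invented for this example, and the sample sentence is made up rather than taken from the original examples.

```php
<?php
// Hypothetical sketch, not the actual abbreviationMerge from php-sentence.
// A fragment whose last word is a known, capitalized abbreviation (e.g. "Dr.")
// is held back and prepended to the next fragment.
function mergeCapitalizedAbbreviations(array $fragments, array $abbreviations): array
{
    $merged = [];
    $carry = '';

    foreach ($fragments as $fragment) {
        // Prepend anything held back from the previous iteration.
        $fragment = $carry . $fragment;
        $carry = '';

        // Last whitespace-separated word of the fragment, e.g. "Dr."
        $words = preg_split('/\s+/', trim($fragment));
        $lastWord = end($words);

        $isAbbreviation = in_array($lastWord, $abbreviations, true);
        $isCapitalized = $lastWord !== '' && ctype_upper($lastWord[0]);

        if ($isAbbreviation && $isCapitalized) {
            // Keep the abbreviation with whatever comes next.
            $carry = $fragment;
        } else {
            $merged[] = $fragment;
        }
    }

    // A trailing abbreviation with nothing after it becomes its own fragment.
    if ($carry !== '') {
        $merged[] = $carry;
    }

    return $merged;
}

$fragments = ['We met at noon.', ' Dr.', ' Jones spoke.'];
$result = mergeCapitalizedAbbreviations($fragments, ['Dr.', 'Mr.', 'etc.']);
// $result is ['We met at noon.', ' Dr. Jones spoke.']
```

In this sketch lowercase abbreviations such as "etc." are left where they are; only the capitalized case from the description is handled.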