matkoniecz / quick-beautiful

Programming exercises - tutorial material for work with a beginner. These exercises should be simple and give properly impressive results.
GNU General Public License v3.0
1 star, 2 forks

document trailing punctuation issues #35

Open matkoniecz opened 5 years ago

matkoniecz commented 5 years ago

https://stackoverflow.com/a/17951315/4130619

Nice, but some English words truly contain trailing punctuation. For example, the trailing dots in e.g. and Mrs., and the trailing apostrophe in the possessive frogs' (as in frogs' legs) are part of the word, but will be stripped by this algorithm. Handling abbreviations correctly can be roughly achieved by detecting dot-separated initialisms plus using a dictionary of special cases (like Mr., Mrs.). Distinguishing possessive apostrophes from single quotes is dramatically harder, since it requires parsing the grammar of the sentence in which the word is contained.
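
The abbreviation half of that comment can be sketched roughly as below. This is only an illustration, not the code from the linked answer: the function name `strip_trailing_punctuation`, the `KNOWN_ABBREVIATIONS` set and the `INITIALISM` pattern are all made up for this example, and the special-case list is deliberately tiny. It does not attempt to solve the possessive apostrophe problem, and it only looks at trailing punctuation.

```python
import re
import string

# Hypothetical special-case dictionary (assumption); a real list would be much longer.
KNOWN_ABBREVIATIONS = {"mr.", "mrs.", "ms.", "dr.", "etc."}

# Dot-separated initialisms such as "e.g." or "U.S.A.":
# a single letter followed by a dot, repeated at least twice.
INITIALISM = re.compile(r"(?:[A-Za-z]\.){2,}")


def strip_trailing_punctuation(word):
    """Strip trailing punctuation, but stop once a known abbreviation
    or a dot-separated initialism is all that remains."""
    while word and word[-1] in string.punctuation:
        if word.lower() in KNOWN_ABBREVIATIONS or INITIALISM.fullmatch(word):
            break
        word = word[:-1]
    return word


words = "Mrs. Smith said, e.g., hello.".split()
print([strip_trailing_punctuation(w) for w in words])
# ['Mrs.', 'Smith', 'said', 'e.g.', 'hello']

# Known limitation: the possessive apostrophe is still stripped.
print(strip_trailing_punctuation("frogs'"))
# frogs
```

The last call shows the case the comment calls dramatically harder: without looking at the surrounding sentence, `frogs'` cannot be told apart from a word followed by a closing single quote.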