larsga / whazzup

Automatically exported from code.google.com/p/whazzup

Fix up tokenizer #15

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
The tokenizer sometimes produces tokens with a trailing apostrophe, such as "berlusconi'", which isn't
right. We also need to look for other bad tokenizations.

Original issue reported on code.google.com by lar...@gmail.com on 22 Feb 2011 at 8:47
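
One possible way to avoid tokens like "berlusconi'" is to let the token pattern accept apostrophes only between letters or digits, so a leading or trailing apostrophe can never be part of a match. This is a minimal sketch, not the project's actual tokenizer: `TOKEN_RE` and `tokenize` are hypothetical names, and it assumes tokens are lowercased words taken from plain text with ASCII apostrophes.

```python
import re

# Hypothetical token pattern (an assumption, not whazzup's real one):
# runs of letters/digits, optionally joined by internal apostrophes
# (e.g. "don't"). An apostrophe at either edge of a word never matches.
TOKEN_RE = re.compile(r"[^\W_]+(?:'[^\W_]+)*")


def tokenize(text):
    """Lowercase the text and return word tokens without edge apostrophes."""
    return TOKEN_RE.findall(text.lower())
```

With this pattern, `tokenize("Berlusconi' resigned")` yields `['berlusconi', 'resigned']`, while a contraction such as "don't" stays a single token. Typographic apostrophes (’) are not handled here and would need to be normalized first.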

GoogleCodeExporter commented 8 years ago
"Ukraine's" becomes "ukraine", but "Ukraine" becomes "ukrain", so the two forms of the same word end up as different tokens.

Original comment by lar...@gmail.com on 25 Feb 2011 at 1:24
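
The mismatch between "Ukraine's" and "Ukraine" suggests the possessive form and the bare form go through different normalization paths. A sketch of one way to make them agree is to strip the possessive suffix first and then run every token through the same stemmer. All names here (`strip_possessive`, `normalize`, `toy_stem`) are hypothetical, and `toy_stem` is only a stand-in for whatever stemmer the project uses.

```python
def strip_possessive(token):
    """Remove a trailing "'s" or bare "'" so possessives match the base word."""
    if token.endswith("'s"):
        return token[:-2]
    return token.rstrip("'")


def normalize(token, stem):
    """Lowercase, strip the possessive, then apply the caller-supplied stemmer."""
    return stem(strip_possessive(token.lower()))


# Stand-in stemmer for illustration only (an assumption): drops one trailing "e".
def toy_stem(word):
    return word[:-1] if word.endswith("e") else word
```

Under these assumptions, `normalize("Ukraine's", toy_stem)` and `normalize("Ukraine", toy_stem)` return the same string. Whether the stem itself should be "ukraine" or "ukrain" is a separate decision; the point of the sketch is only that both forms come out identical.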