tmog closed this 10 years ago
Would be nice to have a test for the punctuation handling.
Also, it seems like we should put something for unrecognized characters. Otherwise normalization of titles that are made entirely of unrecognized characters (CJK languages, for example) will always produce an empty normalization.
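A minimal sketch of the concern above, not the project's actual normalizer: `normalize_drop` and `normalize_with_fallback` are hypothetical names, and the hex fallback is only one possible "something" to emit for unrecognized characters.

```python
import unicodedata


def normalize_drop(title):
    """Hypothetical normalizer: keep ASCII letters and digits, drop the rest."""
    decomposed = unicodedata.normalize("NFKD", title)
    return "".join(
        ch.lower() for ch in decomposed if ch.isascii() and ch.isalnum()
    )


def normalize_with_fallback(title):
    """Same idea, but emit the hex code point of unrecognized characters."""
    out = []
    for ch in unicodedata.normalize("NFKD", title):
        if ch.isascii() and ch.isalnum():
            out.append(ch.lower())
        elif ch.isascii() or ch.isspace() or unicodedata.combining(ch):
            # ASCII punctuation, whitespace and stripped accents are dropped.
            continue
        else:
            # Assumed fallback: lowercase hex of the code point.
            out.append(format(ord(ch), "x"))
    return "".join(out)
```

With the first variant a title made entirely of CJK characters normalizes to an empty string; with the second it yields a non-empty, if unreadable, identifier.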
@davisagli You are right. Maybe the existing solution is better.
@bosim as far as I understand, this means we can close this one, so I do. If I'm wrong, please reopen.
...lizer (things like em dash). Also do not add hex value for the chars we do not handle. The en dash is a good example of why this is bad (especially this year) - it has a hex value of 2013. ;-)
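A rough sketch of that approach, assuming a small hand-written table (`PUNCTUATION` and `normalize_punctuation` are hypothetical names, not the actual code in this change): punctuation we handle maps to a plain hyphen, and anything else outside ASCII is dropped instead of being replaced by its hex value.

```python
# Hypothetical table of handled punctuation: U+2013 (en dash), U+2014 (em dash).
PUNCTUATION = {0x2013: "-", 0x2014: "-"}


def normalize_punctuation(text):
    """Map handled punctuation to '-', keep ASCII, drop everything else."""
    out = []
    for ch in text:
        code = ord(ch)
        if code in PUNCTUATION:
            out.append(PUNCTUATION[code])
        elif code < 128:
            out.append(ch)
        # Unhandled non-ASCII characters are dropped; no hex value is added.
    return "".join(out)
```

For a title such as "2012", U+2013, "2013", this gives `2012-2013`, whereas a hex fallback would splice a second `2013` into the result.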