thienbui / language-detection

Automatically exported from code.google.com/p/language-detection

Expand IJ and ij ligatures #45

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
In the normalize() method of 
https://code.google.com/p/language-detection/source/browse/src/com/cybozu/labs/langdetect/util/NGram.java, 
please expand the 'Ĳ' (U+0132) and 'ĳ' (U+0133) ligatures to 'I'+'J' and 'i'+'j'.

These ligatures are sometimes used in Dutch, but the convention is to write the 
two separate letters IJ and ij. The reasons are that the ligatures are hard to 
enter on a keyboard and that many fonts render the two-letter sequences 
visually identical to the ligatures.
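
The requested expansion could be sketched roughly as follows. This is only a minimal illustration, not the library's code: the class name `IJLigatureExpander` and the method `expand` are hypothetical, and the sketch works on a whole string as a pre-processing step rather than inside normalize() itself.

```java
// Hypothetical helper, not part of language-detection.
final class IJLigatureExpander {

    private IJLigatureExpander() {
        // static utility, no instances
    }

    /**
     * Returns a copy of the input in which U+0132 is replaced by "IJ"
     * and U+0133 is replaced by "ij". All other characters are copied
     * unchanged.
     */
    static String expand(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 4);
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            switch (ch) {
                case '\u0132': // LATIN CAPITAL LIGATURE IJ
                    sb.append("IJ");
                    break;
                case '\u0133': // LATIN SMALL LIGATURE IJ
                    sb.append("ij");
                    break;
                default:
                    sb.append(ch);
            }
        }
        return sb.toString();
    }
}
```

With this sketch, `IJLigatureExpander.expand("\u0132sselmeer")` yields `"IJsselmeer"`, so text using the ligature and text using two separate letters would produce the same n-grams.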

Original issue reported on code.google.com by pander.m...@gmail.com on 17 Nov 2012 at 8:39