In the normalize() method of
https://code.google.com/p/language-detection/source/browse/src/com/cybozu/labs/langdetect/util/NGram.java
please expand the 'IJ' (U+0132) and 'ij' (U+0133) ligatures to 'I'+'J' and
'i'+'j'.
These ligatures are sometimes used in Dutch, but the convention is to write the
two separate letters IJ and ij. The reasons are that the ligatures are hard to
enter on a keyboard and that many fonts render the two-letter sequences
identically to the ligatures.
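
A minimal sketch of the requested expansion, written as a standalone helper. The class name IjLigatureExpander and the method expand() are hypothetical and not part of language-detection. Nothing is assumed here about the signature of NGram.normalize(); if that method maps one char to one char, a one-to-two expansion like this would probably have to run on the text before that step.

```java
// Hypothetical helper, not part of the language-detection library.
public final class IjLigatureExpander {
    private IjLigatureExpander() {}

    /**
     * Returns a copy of text in which U+0132 is replaced by "IJ"
     * and U+0133 is replaced by "ij". All other characters are kept as-is.
     */
    public static String expand(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 4);
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            switch (ch) {
                case '\u0132': // LATIN CAPITAL LIGATURE IJ
                    sb.append("IJ");
                    break;
                case '\u0133': // LATIN SMALL LIGATURE IJ
                    sb.append("ij");
                    break;
                default:
                    sb.append(ch);
            }
        }
        return sb.toString();
    }
}
```

With this helper, the ligature spelling and the two-letter spelling of a Dutch word such as "ijs" produce the same character sequence, so they would yield the same n-grams.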
Original issue reported on code.google.com by pander.m...@gmail.com on 17 Nov 2012 at 8:39