When incoming data is in e.g. UTF-8 and outgoing data has to be in e.g. ASCII
or ISO-8859-1 (UNOC!), this can be problematic. EDI mostly contains codes and
numeric data, but addresses and free text can contain data 'as entered by the user'.
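To illustrate the problem, a small sketch (the helper can_encode and the sample address are made up for this example, they are not part of bots):

```python
def can_encode(text, charset):
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

# Hypothetical user-entered address: contains é, ë and š.
address = "Caf\u00e9 Zo\u00eb, Ko\u0161ice"

print(can_encode(address, "utf-8"))       # True
print(can_encode(address, "iso-8859-1"))  # False: š is not in latin1
print(can_encode(address, "ascii"))       # False
```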
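Added 2 functions in transform.py for this: dropdiacritics2ascii and
dropdiacritics2latin.
Input: unicode, output: unicode. The output is suitable for encoding as ascii or latin1.
Characters with diacritics that do not fit the target charset are converted to
their base character, e.g. é -> e for ascii.
This works for most cases.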
Notes:
- 'Other' characters are dropped, e.g. all of ðæÆÐØßø.
- Dutch ĳ (one char!, U+0133) -> ij (2 chars). Did not see this kind of
expansion with other characters, e.g. German ü -> u.
- For dropdiacritics2latin: ö -> ö (ö is in latin1/iso-8859-1).
- dropdiacritics2latin works for all latin/iso-8859 variants.
Note that unicode/utf-8 contains a lot of characters; not all of them have been
tested.
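The behaviour described in the notes can be approximated with the standard unicodedata module. This is a minimal sketch and not the actual code in transform.py; the name drop_diacritics and its charset parameter are hypothetical. It assumes that characters already present in the target charset are kept as they are, and that everything else is decomposed (NFKD) and stripped of whatever still does not fit.

```python
import unicodedata

def drop_diacritics(text, charset="ascii"):
    """Return text reduced to characters that exist in charset.

    Hypothetical helper, not the bots implementation.
    """
    out = []
    for ch in text:
        try:
            ch.encode(charset)      # already representable: keep unchanged
            out.append(ch)
            continue
        except UnicodeEncodeError:
            pass
        # Split into base character(s) plus combining marks, then drop
        # every piece the target charset cannot represent.
        decomposed = unicodedata.normalize("NFKD", ch)
        out.append(decomposed.encode(charset, "ignore").decode(charset))
    return "".join(out)

print(drop_diacritics("Zo\u00eb M\u00fcller"))                   # Zoe Muller
print(drop_diacritics("\u0133sselmeer"))                         # ijsselmeer
print(drop_diacritics("S\u00f8ren Wei\u00df"))                   # Sren Wei
print(drop_diacritics("K\u00f6ln Ko\u0161ice", "iso-8859-1"))    # Köln Kosice
```

In this sketch characters such as ø and ß are dropped for ascii because they have no Unicode decomposition. With charset iso-8859-1 they are kept, since they are part of latin1; whether dropdiacritics2latin does the same is not stated above.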
Original issue reported on code.google.com by hjebb...@gmail.com on 16 Apr 2015 at 9:30