fhanzlik opened 4 months ago
Sorry, indeed `flat` does not work as before. I'm not quite sure what the answer is; it seems to be complicated.
However, I can offer a workaround in the meantime:
```
echo "růžička"|recode -f u8..iso-8859-1-translit,iso-8859-1..flat
ruzicka
```
The second step, `iso-8859-1..flat`, is needed because accented characters that can be represented in ISO-8859-1 are still present after the first step. For example:
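```
echo "érůžička"|recode -f u8..iso-8859-1-translit
�ruzicka
```
Here the é survives as the Latin-1 byte 0xE9, which a UTF-8 terminal displays as �. With the second step added:
```
echo "érůžička"|recode -f u8..iso-8859-1-translit,iso-8859-1..flat
eruzicka
```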
I think the solution to this bug is to make a converter from UTF-8 to ASCII-BS (rather than from Latin-1 to ASCII-BS, as at present). This would avoid the need for the `-translit` step without adding extra magic. (In Recode 3.6, transliteration is always tried if non-transliterated conversion fails, which means that Recode's behaviour can change according to its input.)
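The proposal above amounts to stripping accents directly from Unicode text instead of first squeezing it into Latin-1. As a rough illustration only (this is not how recode implements `flat` or `ASCII-BS`), the Python sketch below decomposes each character with Unicode canonical decomposition (NFD) and drops the combining accent marks. The function name `flatten_utf8` and the choice to replace leftover non-ASCII characters with `?` are assumptions made for the example.

```python
import unicodedata


def flatten_utf8(text: str, replacement: str = "?") -> str:
    """Hypothetical sketch: strip diacritics straight from Unicode text.

    NFD splits e.g. 'ů' into 'u' + U+030A COMBINING RING ABOVE. The
    combining marks are dropped; any remaining non-ASCII character
    (such as 'ß' or 'ł', which have no decomposition) is replaced.
    """
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # accent mark: drop it
        out.append(ch if ch.isascii() else replacement)
    return "".join(out)


print(flatten_utf8("érůžička"))  # eruzicka
print(flatten_utf8("růžička"))   # ruzicka
```

Because it works on code points rather than Latin-1 bytes, the result does not depend on whether a given accented letter happens to exist in Latin-1, which is the inconsistency the two-step workaround has to paper over.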
Hi Thomas, thanks for your interest in this issue, and yes, your solution works well!
There is a much easier workaround: use `ASCII-translit` instead of `flat`:
```
echo "érůžička"|recode -f u8..ascii-translit
eruzicka
```
Thanks, Thomas; I'm now using this conversion.
I'll keep this open as a placeholder, because something needs to happen with `flat`; I'm just not sure what yet.
What worked in older (Pinard-era) recode versions no longer works:
```
echo "růžička"|recode -f u8..flat
rika
```
(instead of the correct result, "ruzicka").
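The `rika` output is consistent with the non-Latin-1 letters being dropped on the way to `flat`. Suppose, as the remark about the Latin-1 to ASCII-BS converter suggests, that the chain passes through Latin-1 and that `-f` silently discards whatever Latin-1 cannot represent; then ů, ž and č vanish while é survives. The Python sketch below models only that hypothesis. `through_latin1_forced` is a hypothetical name, and this is not recode's code.

```python
def through_latin1_forced(text: str) -> str:
    """Hypothetical model of a forced UTF-8 -> Latin-1 step:
    characters outside Latin-1 are dropped, not transliterated."""
    return text.encode("latin-1", errors="ignore").decode("latin-1")


print(through_latin1_forced("růžička"))   # rika
print(through_latin1_forced("érůžička"))  # érika
```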