rcrowley / mustache.sh

Mustache in POSIX shell

Another Encoding problem #3

Open: malk opened this issue 11 years ago

malk commented 11 years ago

Per my last patch, mustache.sh should be encoding-agnostic. Alas, the newline-detection trickery done with sed fails when an accented character in a non-Unicode encoding comes just before a newline.

For example, assuming ISO-8859-1 input,

echo 'A=é
b=c' | mustache

gives us

A=éb=c
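To reproduce this from a UTF-8 terminal without converting files, printf can emit the ISO-8859-1 byte directly: `\351` is the octal escape for 0xE9, which is "é" in ISO-8859-1. A minimal sketch; `latin1_input` is a hypothetical helper name, and the last step only runs if a `mustache` command is on PATH.

```shell
#!/bin/sh
# Hypothetical helper: the issue's two-line input, encoded as ISO-8859-1.
# \351 (octal) is byte 0xE9, i.e. "é" in ISO-8859-1.
latin1_input() {
    printf 'A=\351\nb=c\n'
}

# Show the raw bytes: a single e9 byte, not the UTF-8 pair c3 a9.
latin1_input | od -An -tx1

# Feed it to mustache.sh only if it is installed.
if command -v mustache >/dev/null 2>&1; then
    latin1_input | mustache
fi
```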
malk commented 11 years ago

The bug does not occur with Unicode (UTF-8) input.

malk commented 11 years ago

The bug comes from this sed invocation:

sed -r "
        s/./&\\n/g
        s/\\\\/\\\\\\\\/g
    "

as illustrated by running (with ISO-8859-1 input):

echo 'A=é
b=c' | sed -r "
        s/./&\\n/g
        s/\\\\/\\\\\\\\/g
    "
rcrowley commented 11 years ago

I poked around a bit with iconv(1) (character sets are mysterious beasts, and my shell loves UTF-8), and I think this confirms the issue. Note the extra 0a after é (c3 a9) in the UTF-8 version:

$ printf 'A=é\nB=c\n' | iconv -f UTF-8 -t UTF-8 | sed -r "
> s/./&\\n/g
> s/\\\\/\\\\\\\\/g
> " | hd
00000000  41 0a 3d 0a c3 a9 0a 0a  42 0a 3d 0a 63 0a 0a     |A.=.....B.=.c..|
0000000f
$ printf 'A=é\nB=c\n' | iconv -f UTF-8 -t ISO_8859-1 | sed -r "
s/./&\\n/g
s/\\\\/\\\\\\\\/g
" | hd
00000000  41 0a 3d 0a e9 0a 42 0a  3d 0a 63 0a 0a           |A.=...B.=.c..|
0000000d
$

I am not sure how to fix this, unfortunately.
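Judging from the hex dumps, the splitting step puts every character on its own line, so an original newline shows up as an empty line. A plausible explanation for the failure, offered as an assumption rather than a confirmed diagnosis: in a UTF-8 locale the lone byte 0xE9 is not a valid character, and GNU sed's `.` does not match invalid multibyte sequences, so `s/./&\n/g` never appends a newline after it and the output is one newline short at exactly that spot. The sketch below checks this by running the same substitution under a UTF-8 locale and under the C locale, where every byte is one character. `split_hex` is a hypothetical helper, and `C.UTF-8` is assumed to be installed (if it is missing, sed stays in the C locale and both runs look the same).

```shell
#!/bin/sh
# Hypothetical helper: apply the issue's splitting substitution under the
# locale given as $1 and print the result as hex bytes.
split_hex() {
    LC_ALL=$1 sed 's/./&\n/g' | od -An -tx1
}

# "A=" + byte 0xE9 ("é" in ISO-8859-1) + newline.
# If the diagnosis is right, this run lacks the 0a right after e9.
printf 'A=\351\n' | split_hex C.UTF-8

# In the C locale '.' matches any single byte, so e9 gets its own 0a.
printf 'A=\351\n' | split_hex C
```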

malk commented 11 years ago

Ostensibly it is a sed bug. To file a bug report against sed I'd have to join their mailing list (not happening).

My current workaround is to replace the first sed invocation with perl:

perl -pe 's/([^\n])/$1\n/sg' | sed -r "s/\\\\/\\\\\\\\/g" | _mustache

but that change means my mustache.sh is not really .sh anymore.

Horrible, but it works :/ I'll just live with that.
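One possible pure-shell alternative to the perl step, offered as an untested assumption: GNU sed's `.` does not match bytes that are invalid in the current locale, so running the splitting sed under `LC_ALL=C` should make it match every byte, the way perl's `[^\n]` does on its byte-oriented input. In this sketch `split_bytes` is a hypothetical name; like the original, it relies on GNU sed accepting `\n` in the replacement, and it assumes `_mustache` rejoins the bytes of a UTF-8 character when they arrive on separate lines, which this thread does not confirm.

```shell
#!/bin/sh
# Hypothetical replacement for the perl step. Under LC_ALL=C, '.' matches any
# single byte, so each byte (including 0xE9) is followed by a newline and the
# original line break survives as an empty line. Backslashes are doubled as in
# the original second sed. Assumption: multibyte UTF-8 characters end up one
# byte per line, and _mustache is expected to put them back together.
split_bytes() {
    LC_ALL=C sed '
        s/./&\n/g
        s/\\/\\\\/g
    '
}

# Intended use (unverified): split_bytes | _mustache
# Prints the bytes 41 0a 3d 0a e9 0a 0a 62 0a 3d 0a 63 0a 0a
printf 'A=\351\nb=c\n' | split_bytes | od -An -tx1
```

For this input it should produce the same bytes as the perl one-liner; whether the later stages of mustache.sh cope with UTF-8 characters split across lines is the open question.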