Closed by sblondeel 3 years ago
Sorry, I can't help with 3.6. I hope Debian will upgrade soon!
To build the latest sources, try turning off parallel make flags. (At least, I have MAKEFLAGS set to something like -j 4 by default, and recode fails to build; unsetting it fixes that.)
With recode 3.6 I get the same results as you.
With recode 3.7.8:
$ perl -e 'print "\x8f\n"' | recode windows-1252..html
ž
$ perl -e 'print "\x9e\n"' | recode windows-1252..html
recode: Ambiguous output in step `ISO-10646-UCS-2..HTML_4.0'
$ perl -e 'print "\x8f\n"' | recode windows-1252..utf-8
recode: Invalid input in step `CP1252..UTF-8'
$ perl -e 'print "\x9e\n"' | recode windows-1252..utf-8
ž
This is a bit odd, as it seems to do what you say is correct when the output is utf-8, but it also does what you say is wrong when the output is HTML.
So it seems that recode 3.7.8 does the right thing for UTF-8 output, but the wrong thing for HTML output. I shall investigate.
I believe that the correct translation in some cases in recode 3.7.8 is due to its default use of iconv (which has the correct encoding for CP1252, presumably). The built-in encoding seems to be wrong. I am loth to change anything in the tables without careful checks, so I'll double-check first!
Thanks for your quick response. I understand your reluctance to tweak the tables. I am a bit surprised at the inconsistency depending on the backend; maybe there are other discrepancies to detect?
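One way to look for other discrepancies is to diff a candidate table against a reference, byte by byte. The following is only a sketch: it uses Python's built-in cp1252 codec as the reference (an assumption; it is neither recode's table nor iconv's), and `swapped_example` is a hypothetical stand-in that mimics the reported symptom rather than recode's actual data.

```python
from typing import Dict, List, Optional, Tuple

Table = Dict[int, Optional[str]]


def reference_cp1252() -> Table:
    """Build a byte -> character map from Python's cp1252 codec.

    Bytes the codec treats as undefined are stored as None.
    """
    table: Table = {}
    for byte in range(256):
        try:
            table[byte] = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            table[byte] = None
    return table


def diff_tables(candidate: Table, reference: Table) -> List[Tuple[int, Optional[str], Optional[str]]]:
    """Return (byte, candidate_char, reference_char) for every mismatch."""
    return [
        (byte, candidate.get(byte), reference[byte])
        for byte in range(256)
        if candidate.get(byte) != reference[byte]
    ]


def swapped_example() -> Table:
    """Hypothetical table with 0x8f and 0x9e exchanged, mimicking the report."""
    table = reference_cp1252()
    table[0x8F], table[0x9E] = table[0x9E], table[0x8F]
    return table


if __name__ == "__main__":
    for byte, got, expected in diff_tables(swapped_example(), reference_cp1252()):
        print(f"0x{byte:02x}: candidate={got!r} reference={expected!r}")
```

To check a real table, the candidate map would have to be extracted from the recode sources, which this sketch does not attempt.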
Meanwhile, I tried to build bleeding-edge recode with your tip, without success.
sblondeel@debian10:/tmp/recode$ MAKEFLAGS='' make
sblondeel@debian10:/tmp/recode$ make -j 1
Both fail at the same step as before. I don't see MAKEFLAGS set in the Makefile, or any -j hanging around... This is, as hinted by the prompt, on Debian 10 (10.9). The fact that it is running under Oracle VirtualBox should not matter?
Package versions on Debian seem to be held up by the pending release of Debian 11. recode 3.7 is not in the pipeline yet:
https://packages.debian.org/search?keywords=recode&searchon=names&suite=all&section=all
Regards,
Having checked the sources you mention, I agree there's an error in recode's CP1252 table, and I'll fix it.
For building, I suspect you're missing msgfmt: look here:
rm -f be.gmo && : -c --statistics --verbose -o be.gmo be.po
The : command should be msgfmt.
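A quick way to confirm this kind of diagnosis before re-running the build is to check whether the tool is on the PATH. This is a minimal sketch, not part of recode's build system; `missing_tools` is a name made up for illustration, and the only tool checked is msgfmt (part of GNU gettext), as named above.

```python
import shutil
from typing import Iterable, List


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the tool names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(["msgfmt"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("msgfmt is available")
```

If msgfmt is reported as missing, installing the distribution's gettext package and re-running configure would be the natural next step.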
3.7.9 released with fix.
Observed on recode 3.6 (Debian stable).
NB: I tried to reproduce this with bleeding edge recode but compilation fails at this step:
Wikipedia and various other online resources think U+017E is at byte 0x9e:
https://en.wikipedia.org/wiki/Windows-1252#Character_set
However, recode 3.6 thinks this character is at byte 0x8f and byte 0x9e is invalid:
This document
https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
found on the IANA website:
https://www.iana.org/assignments/charset-reg/windows-1252
has the following regarding those two bytes:
so I would be tempted to believe recode 3.6 is wrong on this.
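As an independent cross-check, the two bytes can be decoded with another CP1252 implementation. The sketch below uses Python's built-in cp1252 codec, which is an assumption about what counts as a reference here, not the Unicode mapping file itself; `decode_cp1252_byte` is a helper name invented for this example.

```python
from typing import Optional


def decode_cp1252_byte(value: int) -> Optional[str]:
    """Decode one byte as CP1252; return None if the codec rejects it."""
    try:
        return bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    for value in (0x8F, 0x9E):
        char = decode_cp1252_byte(value)
        if char is None:
            print(f"0x{value:02x}: undefined")
        else:
            print(f"0x{value:02x}: U+{ord(char):04X}")
```

With this codec, 0x9e decodes to U+017E and 0x8f is rejected as undefined, which matches the Wikipedia table rather than recode 3.6.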
Regards,