mquinson / po4a

Maintain the translations of your documentation with ease (PO for anything)
http://po4a.org/
GNU General Public License v2.0

UTF-8 "\xF3" does not map to Unicode at /usr/share/perl5/vendor_perl/Locale/Po4a/TransTractor.pm line 583 #477

Closed sergiomb2 closed 4 months ago

sergiomb2 commented 4 months ago

debhelper-13.11.6-3.fc41 FTBFS: UTF-8 "\xF3" does not map to Unicode at /usr/share/perl5/vendor_perl/Locale/Po4a/TransTractor.pm line 583

This failure is probably triggered by upgrading po4a from 0.69-5.fc40 to 0.70-2.fc41

https://bugzilla.redhat.com/show_bug.cgi?id=2266008

You can check the builds at https://koschei.fedoraproject.org/package/debhelper , particularly https://koschei.fedoraproject.org/build/17470881

mquinson commented 4 months ago

I don't see this char in the source code. Help is welcome, please.

Fat-Zer commented 4 months ago

> I don't see this char in the source code. Help is welcome, please.

It's in the Spanish addenda, which are in the ISO-8859-15 charset. It's the ó in Traducción.

It seems debhelper correctly specifies the addendum charset for Spanish in its po4a config, but po4a tries to read the addendum as UTF-8 anyway... so it's probably a po4a bug.
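To make the mismatch concrete, here is a minimal sketch in Python (po4a itself is Perl, so this only reproduces the byte-level problem, not po4a's code). The helper name `try_decode` is made up for the illustration.

```python
# Illustration only: reproduces the charset mismatch, not po4a's Perl code.
# In ISO-8859-15 the letter ó is stored as the single byte 0xF3.
raw = "Traducción".encode("iso-8859-15")


def try_decode(data: bytes, charset: str) -> str:
    """Return the decoded text, or a short description of the failure."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError as err:
        bad = data[err.start]
        return f"cannot decode as {charset}: byte 0x{bad:02X} at offset {err.start}"


as_latin9 = try_decode(raw, "iso-8859-15")  # the charset the addendum declares
as_utf8 = try_decode(raw, "utf-8")          # the charset po4a apparently assumed

print(as_latin9)
print(as_utf8)
```

The same bytes decode cleanly with the declared charset and fail on 0xF3 when treated as UTF-8, which matches the `"\xF3" does not map to Unicode` message in the title.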

Besides that, po4a should probably print a less cryptic error message, at least naming the offending file, and maybe continue with the rest of the documents instead of crashing outright...

To reproduce:

git clone https://salsa.debian.org/debian/debhelper.git
cd debhelper
po4a man/po4a/po4a.cfg

PS: po4a-translate works fine:

po4a-translate -f pod -m dh_auto_build -l man/es/dh_auto_build.pod -p man/po4a/po/es.po -A ISO-8859-15 -a man/po4a/add3.es

mquinson commented 4 months ago

Ok, I think I got it. I must have forgotten to use PerlIO on the filehandle that is in charge of the addendum. I'll look at it later on.

Many, many thanks @Fat-Zer for this great diagnosis. You really rock.

mquinson commented 4 months ago

Ok, I nailed this down, I guess. The main issue was the po4a.cfg file, which is not nice to us.

[po4a_alias:pod] pod opt_fr:"-L ISO-8859-15 -A UTF-8"
[po4a_alias:pod] pod opt_es:"-L UTF-8 -A ISO-8859-15"
[po4a_alias:pod] pod opt_de:"-L ISO-8859-15 -A UTF-8"
[po4a_alias:pod] pod opt_pt:"-L UTF-8 -A UTF-8"
[po4a_alias:pod] pod opt_ja:"-L UTF-8 -A UTF-8"

See how it redefines the pod alias several times. I think this was already broken in the past (only the last definition is kept), but before the introduction of PerlIO the -A ISO-8859-15 had no effect anyway, for some reason involving Perl trying to magically guess what to do.
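The "only the last definition is kept" behaviour can be modelled with a small Python sketch. This is a hypothetical, simplified parser written for illustration; the regular expression, the function name `parse_aliases` and the dictionary layout are assumptions and do not reflect po4a's actual Perl implementation or the change made in the fix.

```python
import re

# The alias lines quoted from debhelper's po4a.cfg.
CFG = '''\
[po4a_alias:pod] pod opt_fr:"-L ISO-8859-15 -A UTF-8"
[po4a_alias:pod] pod opt_es:"-L UTF-8 -A ISO-8859-15"
[po4a_alias:pod] pod opt_de:"-L ISO-8859-15 -A UTF-8"
[po4a_alias:pod] pod opt_pt:"-L UTF-8 -A UTF-8"
[po4a_alias:pod] pod opt_ja:"-L UTF-8 -A UTF-8"
'''

# Hypothetical line pattern: [po4a_alias:NAME] MODULE OPTIONS
ALIAS_LINE = re.compile(r'^\[po4a_alias:(\w+)\]\s+(\w+)\s+(.*)$')


def parse_aliases(text):
    """Store one entry per alias name, so a later line replaces an earlier one."""
    aliases = {}
    for line in text.splitlines():
        match = ALIAS_LINE.match(line)
        if match:
            name, module, options = match.groups()
            aliases[name] = (module, options)  # overwrites any previous "pod"
    return aliases


aliases = parse_aliases(CFG)
print(aliases["pod"])
```

With one slot per alias name, only the `opt_ja` options survive, so the Spanish `-A ISO-8859-15` never reaches the code that reads the addendum.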

I fixed this in commit 300208e197ce4e8a7692ed835f093abb8c07007b, but debhelper still fails with this error:

Malformed encoding while writing char '–' to file /tmp/debhelper/man/de/debhelper.pod with charset ISO-8859-15: "\x{2013}" does not map to iso-8859-15 at 

This is because of the -L iso-8859-15 in the config file, which shouldn't be there since most languages in the list use UTF-8 chars that cannot be represented in Latin1. It used to work because all those -L parameters were wrongly discarded because of the previous bug. I guess that if you move the opt_fr line after the opt_ja one, it will fail with the released version too, because in that case the request to write French files in Latin1 is no longer ignored, and it fails because the French translation contains …, the char representing 3 dots, which cannot be represented in Latin1.
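A quick way to see which characters trip the writer is to try encoding them one at a time. The sketch below is Python, not po4a code; the helper name `unencodable` and the sample string are made up for the example. It checks the en dash (U+2013) from the error message and the three-dot character (U+2026) against ISO-8859-15.

```python
def unencodable(text, charset):
    """List the distinct characters of `text` that `charset` cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


# Made-up sample: "Traducción", an en dash (U+2013) and an ellipsis (U+2026).
sample = "Traducci\u00f3n \u2013 etc\u2026"

latin9_problems = unencodable(sample, "iso-8859-15")
utf8_problems = unencodable(sample, "utf-8")

print([f"U+{ord(ch):04X}" for ch in latin9_problems])
print(utf8_problems)
```

The accented ó is fine in ISO-8859-15, but the dash and the ellipsis are not, while UTF-8 can represent all of them. That is consistent with dropping the `-L ISO-8859-15` options from the config.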

I think that my code fixes this bug, but I leave it open for further discussion in case someone has a better idea.

mquinson commented 4 months ago

Ok, nobody spoke up, so I'm closing the bug. Feel free to reopen if needed.