mquinson / po4a

Maintain the translations of your documentation with ease (PO for anything)
http://po4a.org/
GNU General Public License v2.0

Problem after PerlIO overhaul in v0.70 #505

Closed eevan78 closed 3 weeks ago

eevan78 commented 3 weeks ago

Hello, I've been using po4a successfully, but since the changes introduced in v0.70 the translation is no longer generated. I suspect that some encoding is not detected properly. I should point out that I'm using po4a on Windows, with just one change: to make it work, I had to comment out line 252 in Common.pm, so that it looks like this:

BEGIN {
    if ( eval { require Locale::gettext } ) {
        import Locale::gettext;
        require POSIX;
        # POSIX::setlocale( &POSIX::LC_MESSAGES, '' );
    } else {

I use a small script that I call with the name of the file I want to generate the translation for. It calls the po4a-translate like this:

perl D:\Projects\po4a\po4a-translate -f text -o nobullets -o neverwrap -k 0 -M utf-8 -m d:/Projects/doc-prevod/original/TXT/%1.txt -p d:/Projects/doc-prevod/prevod/PO/%1.txt.po -l d:/Projects/doc-prevod/prevod/SRX/%1.srx

The process in v0.72 fails with:

po4a-translate is deprecated. The unified po4a(1) program is more convenient and less error prone. Once configured, `po4a --no-update` can be used as a drop-in replacement to `po4a-translate`.
Malformed encoding while writing to file d:/Projects/doc-prevod/prevod/SRX/builtin.srx with charset UTF-8: "\x{fffd}" does not map to UTF-8 at d:\Projects\po4a\lib/Locale/Po4a/TransTractor.pm line 533.
Close with partial character at d:\Projects\po4a\lib/Locale/Po4a/TransTractor.pm line 551.

Generated file breaks at the second character in line 21:

*builtin.txt*   За Vim верзију 9.1.  Последња измена: 06 јун 2024

        VIM РЕФЕРЕНТНО УПУТСТВО    написао Брам Моленар

Уграђене функције           *builtin-functions* *уграђене-функције*

Напомена: израчунавање израза може да се искључи приликом компајлирања програма
и тада нису доступне уграђене функције. Погледајте |+eval| и
|без-eval-могућности|.

Ако желите функције груписане по ономе за шта се користе, погледајте
|листа-функција|

1. Преглед              |уграђене-функције-листа|
2. Детаљи               |уграђене-функције-детаљи|
3. Листа могућности         |листа-могућности|
4. Подударање шаблона у Стрингу     |стринг-подударање|

=

If I try with po4a (not po4a-translate...):

perl D:\Projects\po4a\po4a --no-update --translate-only d:/Projects/doc-prevod/original/TXT/%1.txt D:/Projects/doc-prevod/po4a.cfg

I get the following with v0.72:

Split mode, creating a temporary POT:Disabling --translate-only option, it is not supported in split mode

"\x{fffd}" does not map to UTF-8 at d:\Projects\po4a\lib/Locale/Po4a/Po.pm line 616.
Close with partial character at d:\Projects\po4a\lib/Locale/Po4a/Po.pm line 616.

If I then run the first command with v0.69, without touching any of the files, everything works fine.

Here is the beginning of the generated translated file:

*builtin.txt*   За Vim верзију 9.1.  Последња измена: 06 јун 2024

        VIM РЕФЕРЕНТНО УПУТСТВО    написао Брам Моленар

Уграђене функције           *builtin-functions* *уграђене-функције*

Напомена: израчунавање израза може да се искључи приликом компајлирања програма
и тада нису доступне уграђене функције. Погледајте |+eval| и
|без-eval-могућности|.

Ако желите функције груписане по ономе за шта се користе, погледајте
|листа-функција|

1. Преглед              |уграђене-функције-листа|
2. Детаљи               |уграђене-функције-детаљи|
3. Листа могућности         |листа-могућности|
4. Подударање шаблона у Стрингу     |стринг-подударање|

================================================================================
1. Преглед             *builtin-function-list* *уграђене-функције-листа*

Употребите CTRL-] на имену функције да скочите на пуно објашњење.

УПОТРЕБА            РЕЗУЛТАТ    ОПИС    ~

abs({изр})          Покретни или Број  Апсолутна вредност {изр}
acos({изр})         Покр    аркус косинус {изр}
add({објекат}, {изр})     Листа/Блоб    додаје {изр} у {објекат}
and({изр}, {изр})       Број    побитно И
append({брлин}, {текст})    Број    додаје {текст} испод линије {брлин}
appendbufline({изр}, {брлин}, {текст})
                Број    додаје {текст} испод линије {брлин}
                    у бафер {изр}

I'm not sure what other information you need to locate the problem, but just let me know and I will provide it.

mquinson commented 3 weeks ago

Hello @eevan78, sorry for the inconvenience.

I guess that the files are not encoded in UTF-8. There are two easy fixes. The first one is to recode the files to UTF-8, but the most convenient approach is probably to use the --master-charset flag to specify the encoding to use.

Please tell me whether it helps.

As for the Windows fix related to POSIX, does it help to append unless $^O eq 'MSWin32' to the statement on the line you usually comment out? If so, I'll include this in the release so that you don't have to modify your version locally.
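One way to test the guess that the input files are not valid UTF-8 is to decode them strictly and report where decoding first fails. The following is a minimal sketch in Python rather than Perl, so it says nothing about how po4a itself reads files; the helper name `first_invalid_utf8_offset` is hypothetical, and the sample bytes are built in memory from a phrase in the report.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None if valid."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that could not be decoded.
        return err.start
    return None


# A valid sample: every Cyrillic letter here takes two bytes in UTF-8.
valid = "Уграђене функције".encode("utf-8")

# Simulated damage: a stray 0xFF byte inserted on a character boundary.
broken = valid[:4] + b"\xff" + valid[4:]

print(first_invalid_utf8_offset(valid))   # None
print(first_invalid_utf8_offset(broken))  # 4
```

To check a real file, the bytes would come from `open(path, "rb").read()` instead of the in-memory samples. A result of `None` for every input file would suggest that the problem lies in how the files are read or written rather than in their content.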

mquinson commented 3 weeks ago

I just committed 0717616bce2a281a3b7a2348f0d1d9abc9f1b892 to improve the error message and hint at specifying the charset on the command line. HTH

eevan78 commented 3 weeks ago

I have experimented with the charset switches. Unfortunately, it still doesn't work. I'm not sure where the problem is, because all my files are UTF-8 encoded, and the very same files cause no problems with v0.69. My config file has the following directive:

[po4a_alias:vimtxt] text opt:"-M utf-8 -k 0 -L utf-8 -o neverwrap -o nobullets"

When I call po4a with perl D:\Projects\po4a\po4a --no-update --master-charset utf-8 --localized-charset utf-8 D:/Projects/doc-prevod/po4a.cfg, I get:

Split mode, creating a temporary POT: (43890 entries)
Malformed encoding while writing char '?' to file d:/Projects/doc-prevod/prevod/SRX/help.srx with charset utf-8:
"\x{fffd}" does not map to UTF-8 at d:\Projects\po4a\lib/Locale/Po4a/TransTractor.pm line 542.
If utf-8 is not the expected charset, you need to configure the right one with with --localized-charset or other similar flags.
Close with partial character at d:\Projects\po4a\lib/Locale/Po4a/TransTractor.pm line 556.

help.txt is the first file stated in po4a.cfg.

I'm getting the following message with po4a-translate when called with perl D:\Projects\po4a\po4a-translate -f text -o nobullets -o neverwrap -k 0 -M utf-8 -m d:/Projects/doc-prevod/original/TXT/%1.txt -p d:/Projects/doc-prevod/prevod/PO/%1.txt.po -L utf-8 -l d:/Projects/doc-prevod/prevod/SRX/%1.srx:

po4a-translate is deprecated. The unified po4a(1) program is more convenient and less error prone. Once configured, `po4a --no-update` can be used as a drop-in replacement to `po4a-translate`.
Malformed encoding while writing char '?' to file d:/Projects/doc-prevod/prevod/SRX/builtin.srx with charset utf-8:
"\x{fffd}" does not map to UTF-8 at d:\Projects\po4a\lib/Locale/Po4a/TransTractor.pm line 542.
If utf-8 is not the expected charset, you need to configure the right one with with --localized-charset or other similar flags.
Close with partial character at d:\Projects\po4a\lib/Locale/Po4a/TransTractor.pm line 556.

The POSIX problem is solved by adding unless $^O eq 'MSWin32'; with that in place there is no need to comment out the line. Thanks for the guidance!


UPDATE: I have tried the same command in WSL (Ubuntu 22.04.4), and it worked. So this seems to be a Windows-related issue...

eevan@LPBGM0320:~/perl5/perlbrew$ PERLLIB=~/git-checkouts/po4a/lib ~/git-checkouts/po4a/po4a --no-update --master-charset utf-8 --localized-charset utf-8 ~/Projects/doc-prevod/po4a.cfg
Split режим, креира се привремени POT фајл: (43890 ставки)
/home/eevan/Projects/doc-prevod/prevod/SRX/help.srx је преведен 100% (34 стрингова).
/home/eevan/Projects/doc-prevod/prevod/SRX/usr_toc.srx је преведен 100% (63 стрингова).
/home/eevan/Projects/doc-prevod/prevod/SRX/arabic.srx је преведен 100% (75 стрингова).
/home/eevan/Projects/doc-prevod/prevod/SRX/autocmd.srx је преведен 100% (226 стрингова).
/home/eevan/Projects/doc-prevod/prevod/SRX/builtin.srx је преведено 73.83% (1154 од 1563 стрингова).
/home/eevan/Projects/doc-prevod/prevod/SRX/change.srx је преведено 99.38% (323 од 325 стрингова).
/home/eevan/Projects/doc-prevod/prevod/SRX/channel.srx је преведено 92.11% (292 од 317 стрингова).
/home/eevan/Projects/doc-prevod/prevod/SRX/cmdline.srx је преведен 100% (168 стрингова).
⁝
eevan@LPBGM0320:~/perl5/perlbrew$

It could be something to do with locale detection and setting, since I'm not able to set the locale on Windows. You can see that po4a automatically shows localized messages in Ubuntu, but I could not make that work in Windows. Maybe that is also why it messes up the charsets...

mquinson commented 3 weeks ago

That's really weird. Some error messages come from Po.pm:616, which is where we write the msgstr in the PO file. So, what do your PO files look like?

Also, which version of po4a are you using? I changed some stuff last night, so we need to be crystal clear about which version is used, please.

mquinson commented 3 weeks ago

Note that the error message is about the character U+FFFD failing to be written. According to https://www.fileformat.info/info/unicode/char/fffd/index.htm this could indicate an error while reading the file (which brings us back to the encoding of the input files).

If you manage to confirm that 0.72 works on a previous WSL, then I think we can close this bug. If not, it will give us more info I hope.

eevan78 commented 3 weeks ago

Sorry, I didn't mention it explicitly, I thought that you would see that from the error messages.

In the last comment, I used the current HEAD of the repository, including your commit from last night.

Here is the PO file. It is quite big, so I zipped it. It was created with an older version of po4a. builtin.txt.po.zip

eevan78 commented 3 weeks ago

> If you manage to confirm that 0.72 works on a previous WSL, then I think we can close this bug. If not, it will give us more info I hope.

Yes, I confirm that v0.72 (and the current HEAD) work fine with these files in WSL.

The Unicode replacement character U+FFFD is usually substituted when a byte sequence has no valid representation (for example, an invalid UTF-8 sequence), so that the rest of the file can still be read after the error...
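The substitution described here can be reproduced in isolation. This is a minimal sketch in Python, not in Perl, and it only illustrates the general decoding behaviour; it is an assumption that something comparable happens in the Windows setup from this report. A phrase from the translated file is encoded to UTF-8, cut in the middle of a two-byte character, and decoded again with replacement enabled.

```python
# Sample text taken from the first line of the translated file.
text = "За Vim"
data = text.encode("utf-8")

# "З" occupies two bytes, so keeping three bytes leaves only the
# first half of the following character "а".
truncated = data[:3]

# With errors="replace" the incomplete sequence becomes U+FFFD
# instead of raising an exception.
decoded = truncated.decode("utf-8", errors="replace")
print(repr(decoded))  # 'З\ufffd'

# With strict decoding the same bytes are rejected outright.
try:
    truncated.decode("utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
print(strict_failed)  # True
```

Once the replacement character is in the decoded text, the original bytes are gone, which is why a U+FFFD in the output points back to the reading step.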

mquinson commented 3 weeks ago

OK, then. I'm closing this issue, which does not seem to be caused by po4a after all. Feel free to reopen it if needed.

Thanks for your help debugging it.

eevan78 commented 3 weeks ago

The problem is related to Strawberry Perl and has nothing to do with the po4a code. I should have investigated more before opening the issue. Unfortunately, it is still not solved; there seem to be issues with UTF-8: https://github.com/StrawberryPerl/Perl-Dist-Strawberry/issues/150