msys2 / MSYS2-packages

Package scripts for MSYS2.
https://packages.msys2.org
BSD 3-Clause "New" or "Revised" License

The ``msgcat`` binary is broken #3581

Open egeakman opened 1 year ago

egeakman commented 1 year ago

Description / Steps to reproduce the issue

Running msgcat -o file.po file.po produces syntax errors in .po files.

It turns

#: library/stdtypes.rst:419 library/stdtypes.rst:1171
#: library/stdtypes.rst:2397 library/stdtypes.rst:3615
msgid "\\(4)"
msgstr "\\(4)"

to

#: library/stdtypes.rst:419 library/stdtypes.rst:1171
#: library/stdtypes.rst:2397
 library/stdtypes.rst:3615
msgid "\\(4)"
msgstr "\\(4)"

which is syntactically wrong. On some files, however, the output stays correct. The example above is from python/python-docs-tr (library/stdtypes.po).
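
To see why the second form is rejected, here is a rough Python sketch that flags lines which do not start with anything a PO reader would expect. The regular expression is a simplified approximation chosen for illustration, not the full gettext grammar, and find_suspect_lines is a hypothetical helper name.

```python
import re

# Simplified approximation of valid PO line starts (an assumption for this
# sketch, not the complete gettext grammar): comments, keywords, or a
# continued quoted string.
VALID_LINE = re.compile(r'^(#|msgctxt\s|msgid\s|msgid_plural\s|msgstr(\[\d+\])?\s|")')


def find_suspect_lines(po_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that do not look like PO syntax."""
    suspects = []
    for number, line in enumerate(po_text.split("\n"), start=1):
        if line.strip() == "":
            continue  # blank lines only separate entries
        if not VALID_LINE.match(line):
            suspects.append((number, line))
    return suspects


# The broken output from the report above.
broken = (
    "#: library/stdtypes.rst:419 library/stdtypes.rst:1171\n"
    "#: library/stdtypes.rst:2397\n"
    " library/stdtypes.rst:3615\n"
    'msgid "\\\\(4)"\n'
    'msgstr "\\\\(4)"\n'
)
print(find_suspect_lines(broken))  # [(3, ' library/stdtypes.rst:3615')]
```

The third line is flagged because it is neither a comment nor a keyword line, so a parser has no way to attach it to the entry.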

It also adds trailing whitespace to the empty comment line above the first msgid-msgstr pair.

# Python Documentation Turkish Translation
# Copyright (C) 2001-2023, Python Software Foundation
# This file is distributed under the same license as the Python package.
-#
+# 
msgid ""
msgstr ""

Other msgcat binaries, however, don't produce any of these errors.

Expected behavior

Wrapping the .po file without corrupting it.

Actual behavior

The file is corrupted, which causes syntax errors.

Verification

Windows Version

MSYS_NT-10.0-22621

Are you willing to submit a PR?

I don't think I can

Biswa96 commented 1 year ago

I cannot reproduce the issue. Here is my output:

$ cd python-docs-tr/library/
$ msgcat -o stdtypes.po stdtypes.po
$ git status .
On branch 3.11
Your branch is up to date with 'origin/3.11'.

nothing to commit, working tree clean

Biswa96 commented 1 year ago

Oops, the issue can be reproduced with msys msgcat but not with mingw msgcat.

egeakman commented 1 year ago

> Oops, the issue can be reproduced with msys msgcat but not with mingw msgcat.

I can also reproduce the issue in Git Bash, which uses MINGW64_NT-10.0-22621. I can reproduce it on MINGW64_NT-10.0-22621 and MINGW32_NT-10.0-22621 as well.

egeakman commented 1 year ago

@Biswa96 any news? Is there anything I can do?

Biswa96 commented 1 year ago

I cannot reproduce the syntax error with msys2's gettext. The git-for-windows project uses a different version. Would you like to check with an msys2 installation?

egeakman commented 1 year ago

> I cannot reproduce the syntax error with msys2's gettext. The git-for-windows project uses a different version. Would you like to check with an msys2 installation?

I checked with MSYS2 MSYS and can still reproduce it.

Do we have the same versions installed @Biswa96? I believe I have the latest.

Biswa96 commented 1 year ago

Mine is this

x@y MSYS ~
$ which msgcat
/usr/bin/msgcat

$ msgcat --version
msgcat (GNU gettext-tools) 0.21
Copyright (C) 2001-2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Bruno Haible.

Biswa96 commented 1 year ago

In the original post, you mentioned using a command like msgcat -o stdtypes.po stdtypes.po, but it does not produce any .rst file.

egeakman commented 1 year ago

> In the original post, you mentioned using a command like msgcat -o stdtypes.po stdtypes.po, but it does not produce any .rst file.

msgcat doesn't need to generate an .rst file. It can also rewrap translation files (.po files). Here is an example: https://github.com/python/python-docs-tr/pull/115/files

egeakman commented 1 year ago

> Mine is this
>
> x@y MSYS ~
> $ which msgcat
> /usr/bin/msgcat
>
> $ msgcat --version
> msgcat (GNU gettext-tools) 0.21
> Copyright (C) 2001-2020 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Written by Bruno Haible.

I have the same. Can't you reproduce the error with this version?

Biswa96 commented 1 year ago

Nope. Is it possible to reproduce the issue in GitHub Actions CI with the setup-msys2 action?

egeakman commented 1 year ago

> Nope. Is it possible to reproduce the issue in GitHub Actions CI with the setup-msys2 action?

I will try and get back.

egeakman commented 1 year ago

The action produces syntax errors as well. https://github.com/python/python-docs-tr/pull/123/commits/4753bd00b150c14c9f612fd4b770a125c57af1fd#diff-1216391b7875503e82705b1e5c5a5fb807389433edd0074c771d36a5696ce6b2R774

Here is the workflow file: https://github.com/python/python-docs-tr/blob/line-length/.github/workflows/msys.yaml

Biswa96 commented 1 year ago

I still cannot reproduce it, even with a clean install 😭 I will try to investigate. Here is the stdtypes.diff.txt file produced after running that msgcat command on my system. None of the lines starting with #: are changed.

Biswa96 commented 1 year ago

BTW, would you like to check if running dos2unix library/stdtypes.po after the msgcat command changes anything?

egeakman commented 1 year ago

> BTW, would you like to check if running dos2unix library/stdtypes.po after the msgcat command changes anything?

Didn't work. Does it change the line ending?

Biswa96 commented 1 year ago

> Does it change the line ending?

Correct. In the link you provided, I see some ^M strings. Those generally appear when a file contains CRLF line endings. For example, here is a sample text file shown first with CRLF line endings and then with LF line endings:

$ cat -v test.txt
1st line^M
2nd line^M
$ cat -v test.txt
1st line
2nd line
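
The same check can be sketched in Python by counting line-ending bytes directly, which is roughly what the ^M markers in cat -v make visible. The function name count_line_endings and the sample byte strings are made up for illustration.

```python
def count_line_endings(data: bytes) -> dict[str, int]:
    """Count CRLF endings, bare LF endings, and CR bytes not followed by LF."""
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf    # LF not preceded by CR
    stray_cr = data.count(b"\r") - crlf   # CR not followed by LF
    return {"crlf": crlf, "lf": lf_only, "stray_cr": stray_cr}


windows_style = b"1st line\r\n2nd line\r\n"
unix_style = b"1st line\n2nd line\n"

print(count_line_endings(windows_style))  # {'crlf': 2, 'lf': 0, 'stray_cr': 0}
print(count_line_endings(unix_style))     # {'crlf': 0, 'lf': 2, 'stray_cr': 0}
```

A non-zero stray_cr count would point to carriage returns sitting in the middle of a line rather than at its end.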

egeakman commented 1 year ago

Looking at it more closely, I noticed that msgcat removes ^M from the lines that don't start with #:. Here is the diff after running msgcat:

-#: library/stdtypes.rst:419 library/stdtypes.rst:1171 library/stdtypes.rst:2397^M
-#: library/stdtypes.rst:3615^M
-msgid "\\(4)"^M
-msgstr "\\(4)"^M
+#: library/stdtypes.rst:419 library/stdtypes.rst:1171
+#: library/stdtypes.rst:2397^M library/stdtypes.rst:3615^M
+msgid "\\(4)"
+msgstr "\\(4)"

dos2unix then removes the rest, including those at the end of lines starting with #:, but it can't strip the ones left in the middle of a line.
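
A small Python sketch of that difference, assuming a conversion that only drops a CR sitting directly before an LF (an approximation of the default dos2unix behavior, not its actual implementation). Both function names are hypothetical.

```python
def convert_crlf_only(data: bytes) -> bytes:
    """Drop CR only where it is immediately followed by LF."""
    return data.replace(b"\r\n", b"\n")


def strip_all_cr(data: bytes) -> bytes:
    """Drop every CR byte, including ones left in the middle of a line."""
    return data.replace(b"\r", b"")


# The rewrapped reference line from the diff above, with ^M written as \r.
after_msgcat = b"#: library/stdtypes.rst:2397\r library/stdtypes.rst:3615\r\n"

print(convert_crlf_only(after_msgcat))
# b'#: library/stdtypes.rst:2397\r library/stdtypes.rst:3615\n'
print(strip_all_cr(after_msgcat))
# b'#: library/stdtypes.rst:2397 library/stdtypes.rst:3615\n'
```

Stripping every CR repairs an already damaged file, but it treats the symptom; converting the line endings before running msgcat avoids the damage in the first place.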

egeakman commented 1 year ago

Running WSL msgcat on Windows .po files also corrupts the files when they have ^M (CRLF) line endings. Running dos2unix before msgcat avoids the problem, but I think msgcat should be able to handle the line endings itself. The binary built from vslavik/gettext-tools-windows doesn't produce any syntax errors at all, without needing dos2unix.
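
The convert-then-wrap workaround could be scripted along these lines. This is only a sketch: it assumes msgcat is on PATH, it reuses the in-place msgcat -o FILE FILE invocation from this thread, and the helper names normalize_to_lf, msgcat_command, and wrap_po_file are invented for the example.

```python
import subprocess
from pathlib import Path


def normalize_to_lf(path: Path) -> bool:
    """Rewrite the file with LF endings; return True if anything changed."""
    original = path.read_bytes()
    converted = original.replace(b"\r\n", b"\n")
    if converted != original:
        path.write_bytes(converted)
        return True
    return False


def msgcat_command(path: Path) -> list[str]:
    """Build the in-place invocation used in this thread."""
    return ["msgcat", "-o", str(path), str(path)]


def wrap_po_file(path: Path) -> None:
    """Convert CRLF to LF first, then let msgcat rewrap the file."""
    normalize_to_lf(path)
    subprocess.run(msgcat_command(path), check=True)


# Example call (requires msgcat to be installed):
# wrap_po_file(Path("library/stdtypes.po"))
```

Working on bytes rather than decoded text keeps the sketch independent of the file's encoding.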

Biswa96 commented 1 year ago

I am not sure, but I think this behavior is expected. The msys2 CI runs on windows-latest, while the tests run on ubuntu-latest. To force LF endings, you can use a .gitattributes file or a command like git config --global core.autocrlf false.
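
As a sketch of the .gitattributes route, a single rule like the one below asks Git to check out matching files with LF endings on every platform. The *.po pattern is an assumption about which files the repository wants to pin; adjust it to the actual layout.

```
# .gitattributes at the repository root
# Treat .po files as text and always use LF in the working tree.
*.po text eol=lf
```

Files that were already checked out with CRLF keep those endings until they are checked out again, so a fresh clone or re-checkout may be needed after adding the rule.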