mquinson / po4a

Maintain the translations of your documentation with ease (PO for anything)
http://po4a.org/
GNU General Public License v2.0

Why does neverwrap add newlines to plain text entries? #359

Open ghostwords opened 2 years ago

ghostwords commented 2 years ago

Hello! I'm using po4a to facilitate translating source Markdown files in Weblate. After a bunch of trial and error, I seem to have discovered the combination of flags to disable all wrapping of messages.

However, setting the neverwrap flag seems to have introduced a curious side effect where all "Plain text" type entries got updated to include a newline character (\n) at the end of the string. (YAML Front Matter types weren't affected.) Besides having to redo translations for these "new" messages in Weblate, the extra newlines mean that unordered lists in translations are now split up into separate paragraphs, producing undesirable white space. Compare this entry to its Chinese version. [ UPDATE: I applied a workaround ]

I suppose I can work around the lists issue with another script to remove the extra newlines from any lists in the generated Markdown files. Why does "neverwrap" add newlines though? This was unexpected; all I wanted was to disable content wrapping. Thanks for any assistance!
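A minimal sketch of the kind of post-processing script mentioned above, in case it is useful to others. It assumes that unordered list items start with `-`, `*` or `+` followed by a space, and it only removes blank lines that sit directly between two such item lines. The name `tighten_lists` is made up for illustration and is not part of po4a.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of po4a): drop blank lines that sit directly
# between two Markdown list items, so a "loose" list becomes "tight" again.
# Assumption: unordered items start with -, * or + followed by a space.
tighten_lists() {
  awk '
    function is_item(s) { return s ~ /^[ \t]*[-*+] / }
    {
      if ($0 ~ /^[ \t]*$/) {   # blank line: hold it back for now
        pending++
        next
      }
      # Re-emit held blank lines unless they separate two list items.
      if (pending && !(prev_item && is_item($0))) {
        for (i = 0; i < pending; i++) print ""
      }
      pending = 0
      print
      prev_item = is_item($0)
    }
    END { for (i = 0; i < pending; i++) print "" }
  '
}
```

It reads Markdown on stdin and writes the result to stdout, so it can be run over each generated file after po4a finishes. Blank lines after multi-line list items are left alone, which is the conservative choice.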

PeterNjeim commented 2 years ago

Yes, I had the same issue. I believe adding the nobullets option in combination with neverwrap solves it.

PeterNjeim commented 2 years ago

Well, it solves the newlines-in-lists issue, but the newline character at the end of paragraphs remains. This is not a problem for me since I just started my project, so no translations need to be redone in Weblate. I've checked, and the source and output for English are exactly the same, so there are no unintended side effects. I guess you'll unfortunately have to add a newline to most translations in Weblate.

ghostwords commented 2 years ago

Thank you! Looks like nobullets removed the unwanted line breaks from lists. Lists are no longer specially typed, though, and translators are now responsible for correctly preserving the list markers, but I think that's OK.

ghostwords commented 2 years ago

My immediate issue has been worked around, but there might still be a problem here with neverwrap producing unexpected linebreaks (that break existing translations and make all list entries into paragraphs).

PeterNjeim commented 2 years ago

I agree, I've also never understood why there was even wrapping to begin with. Text shouldn't be tampered with, I'll let my text editor or web interface wrap the lines for me, wrapping should never be applied to a file itself. Wrapping should be disabled by default (actually it should be removed completely) from this program, and also a rewrite from Perl to something else (Rust in my opinion). I had trouble building this program since Ubuntu 20.04 came with a version from 2018. I had to copy and paste from the logs in one of the GitHub Actions runs on this repo to build it since the bundled build instructions were too simple and didn't actually help too much.

What I'm saying is this repo should pretty much get a full rewrite, as well as proper documentation instead of reading the man pages, which are fine but don't let you know about gotchas like wrapping and their unintended side effects. I'd personally be more than willing to help with this effort.

ghostwords commented 2 years ago

I installed by checking out this repo and then adding a shortcut script to ~/.local/bin/po4a to make "po4a" correctly run po4a from the git checkout:

#!/usr/bin/env bash
# Run po4a from the git checkout, pointing Perl at the checkout's lib directory.
PERLLIB=/PATH/TO/YOUR/PO4A/CHECKOUT/lib /PATH/TO/YOUR/PO4A/CHECKOUT/po4a "$@"

(Replace the two instances of /PATH/TO/YOUR/PO4A/CHECKOUT/ with where you checked out this repo.)

PeterNjeim commented 2 years ago

I barely even understood what that section of the readme meant, another reason the docs should be "enhanced" for noobs. Thanks for letting me know about how to use it properly

mquinson commented 2 years ago

> I agree, I've also never understood why there was even wrapping to begin with. Text shouldn't be tampered with, I'll let my text editor or web interface wrap the lines for me, wrapping should never be applied to a file itself. Wrapping should be disabled by default (actually it should be removed completely) from this program,

This is a valid opinion. Until you try to read a diff generated by git :) I think I'll make nowrap the default so that the next people avoid this trap.

> and also a rewrite from Perl to something else (Rust in my opinion). I had trouble building this program since Ubuntu 20.04 came with a version from 2018.

Good idea, please be my guest. If you rewrite it in Python, there is even a $250 prize to win. But nobody has produced anything other than rants on this topic for almost 3 years, so po4a is still around :shrug:

> What I'm saying is this repo should pretty much get a full rewrite, as well as proper documentation instead of reading the man pages, which are fine but don't let you know about gotchas like wrapping and their unintended side effects. I'd personally be more than willing to help with this effort.

Patches are always welcome, of course. Documentation patches should be easy.

I wrote this software more than 20 years ago, handed over the maintenance for over a decade, and hoped for years that it would get superseded by something better (which shouldn't be hard, given that it's no more than a bunch of Perl scripts that I wrote to avoid writing my thesis, almost in the previous century). But still, I never saw any replacement for po4a out there. What a pity. I'd love to assist you guys in rewriting this software in any other language; I'd feel really relieved.

ghostwords commented 2 years ago

Hi @mquinson, just to be clear, I'm grateful po4a is available, is still useful and that we're all here helping each other in figuring out how to use it. Thank you!

mquinson commented 2 years ago

> I barely even understood what that section of the readme meant, another reason the docs should be "enhanced" for noobs. Thanks for letting me know about how to use it properly

Remember: patches, not rants, please. Maybe you can find some material for your new section in the doc here: https://github.com/mquinson/po4a/blob/master/CONTRIBUTING.md#testing-your-changes Thanks in advance.

PeterNjeim commented 2 years ago

>> I agree, I've also never understood why there was even wrapping to begin with. Text shouldn't be tampered with, I'll let my text editor or web interface wrap the lines for me, wrapping should never be applied to a file itself. Wrapping should be disabled by default (actually it should be removed completely) from this program,

> This is a valid opinion. Until you try to read a diff generated by git :)

Specifically on this point: I use Visual Studio Code, which handles wrapping in git diffs very well. You can see it in action here: https://github.com/mquinson/po4a/issues/358#issuecomment-1081024768. It adds blank, diagonally striped lines to match the vertical height of the other file. In the comment I linked, I had wrapping disabled, but I can also enable wrapping within VS Code and it still produces a similar effect.

Also, you're right that I was ranting, and I apologize for being so negative. I can get carried away sometimes.

ghostwords commented 2 years ago

Perhaps wrapping is an artifact of helping people make sense of changes years ago when we didn't have within-the-line change highlighting in GitHub or Visual Studio or git diff --color-words. The tradeoff is different these days.
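A small sketch of what word-level diffing looks like on a re-wrapped paragraph, assuming git is installed. The sample sentences and the function name `demo_word_diff` are made up for illustration. `--word-diff=plain` is used instead of `--color-words` only so that the change markers are visible without terminal colors.

```shell
#!/usr/bin/env bash
# Illustration only: compare a one-line paragraph with a re-wrapped copy
# in which a single word was changed ("stops" became "blocks").
demo_word_diff() {
  local dir
  dir="$(mktemp -d)"
  printf '%s\n' 'Privacy Badger is a browser extension that stops trackers.' > "$dir/old.md"
  printf '%s\n' 'Privacy Badger is a browser' 'extension that blocks trackers.' > "$dir/new.md"
  # --no-index compares two plain files outside of any repository.
  # git exits with status 1 when the files differ, so the status is ignored here.
  git diff --no-index --word-diff=plain "$dir/old.md" "$dir/new.md" || true
  rm -rf "$dir"
}
```

Calling `demo_word_diff` marks only the changed word, even though the line break moved, which is the tradeoff described above.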

PeterNjeim commented 2 years ago

Indeed, 20 years is a long, long time. I remember Dolphin Emulator (18 years old) having to completely rewrite its UI in Qt because it had been using a limited library (wxWidgets). It took more than 4 years to reach feature parity. They also had to rewrite a huge portion of their graphics emulation logic: https://dolphin-emu.org/blog/2019/04/01/the-new-era-of-video-backends. And that's what I'd consider a "popular" project, with lots of, I guess you could say, motivation to improve it.

mquinson commented 2 years ago

As for the neverwrap parameter, it seems to me that it's already enabled by default. Looking at the NEWS file, this seems to have been the case since v0.58, the release that I wrote during the first covid lockdown to restart the development of the project since it was still used. Can you guys come up with a MWE where activating neverwrap actually changes anything? A simple test project on GitHub or elsewhere that I could check out and play with would be a perfect MWE.

As for the documentation, could you please tell me which paragraphs need to be rewritten? Again, during the 0.58 release cycle, I greatly simplified the documentation and reduced its length to increase its usefulness. I'm glad to improve it further if you tell me how, or at least what should be changed. Right now, I tend to think that it's not just gibberish from my head, since other people managed to translate it into half a dozen languages ;)

As for the rewrite, I'm slowly getting the impression that po4a is simply good enough and does its job, killing the motivation of the contenders. I would truly love to assist someone rewriting it in Rust, as this is a language in which I could be interested. I was providing assistance to someone trying to rewrite po4a in Python for a better integration with Weblate, but this guy disappeared after a few weeks. That would not be the first legacy project that I rewrite in another language, and this is the reason why I restarted the dev in v0.58 by redoing all tests, to gain confidence in the tests and be able to change the code in confidence. If someone wants to rewrite po4a, then the code and I are ready for the journey. In the meantime, help to maintain the beast is always welcome.

jnavila commented 2 years ago

> this is the reason why I restarted the dev in v0.58 by redoing all tests, to gain confidence in the tests and be able to change the code in confidence

That helps a lot for refactorings.

Looking at it with a better knowledge now, I can tell that it is slightly more than a bunch of Perl scripts.

mquinson commented 2 years ago

Thanks @jnavila :)

And you are right. I think the reason why no serious contender has appeared over the years is that nobody approaches the problem with modularity in mind. Pet projects are short-lived because they target one community only and often neglect testing and software quality assurance. In po4a, the coding effort mandated by the manpage community, to give credit to translators through an addendum, can be reused by other communities at low cost. The work on wrapping/nowrapping that originally took place for asciidoc can be reinvested in markdown. The work on escaping complex inline formatting that took place for Groff/manpages can be reused in texinfo. The same goes for the idea of placeholders, which is currently only used in XML but is generic enough to be reused for another format, and for the idea of external filters and potin files, which was invented for SGML and is now generic.

There is a lot to do to push the project further, and not only regarding the documentation (protecting inline formatting or revalidating it after the translator's work, simplifying the config for projects with very many files, supporting formats where all translations are placed in the same file, such as Desktop or debconf, and many others), but this must all be a community effort. Free software must be a community effort (unless it's funded from elsewhere, and I'm not looking for such funding myself).

@PeterNjeim, come up with this doc patch that you were begging for, and you'll get the pride of a useful contribution to a free software community ;)

PS: if you guys want to chat about po4a, there is an IRC channel (#po4a on OFTC) and a Discord channel (https://discord.gg/5kJYaj57Tc)

ghostwords commented 2 years ago

> Can you guys come up with a MWE where activating neverwrap actually changes anything?

Sorry, this isn't minimal, but I hope it's clear enough. Let me know if it's not. This is meant to show that neverwrap adds "nowrap" comments and newlines to plain text entries in PO files, which then removes wrapping from generated Markdown files. Message wrapping in PO files was taken care of by other options, not neverwrap.

  1. git clone https://github.com/EFForg/privacybadger-website.git
  2. Edit po/po4a.conf removing opt:"--option neverwrap"
  3. Run po4a po/po4a.conf
  4. Observe that the PO files in the po folder lost the newlines I opened this issue about, as well as the #, nowrap comments
  5. All the Markdown files just got deleted because the translations no longer match (no more newlines). Edit po/faqs.es.po to restore one of the translations so that we can look at the generated Markdown. Let's take the plain text entry for content/en/faqs/What-is-Privacy-Badger.md and remove the trailing "\n" from the msgstr. Also remove the fuzzy comment (#, fuzzy) and the fuzzy msgid (#| msgid "Privacy Badger is a browser extension ...).
  6. Rerun po4a po/po4a.conf
  7. You should now be able to run git diff content/es/faqs/What-is-Privacy-Badger.md. Observe the Markdown is now wrapped.

I'm using the v0.66 tag of po4a from a git checkout.
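Steps 1 to 4 above can be roughly scripted as follows. This is only a sketch: `drop_neverwrap` and `reproduce` are hypothetical names, the sed expressions assume the option is spelled exactly `--option neverwrap` in po/po4a.conf, and step 5 (editing po/faqs.es.po by hand) is left manual.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove the neverwrap option from a po4a config on stdin.
# Assumption: the option appears literally as "--option neverwrap".
drop_neverwrap() {
  sed -e 's/ *opt:"--option neverwrap"//' \
      -e 's/ *--option neverwrap//'
}

# Hypothetical driver for steps 1 to 4; runs in a subshell so that
# "cd" and "set -eu" do not leak into the calling shell.
reproduce() (
  set -eu
  git clone https://github.com/EFForg/privacybadger-website.git
  cd privacybadger-website
  # Step 2: drop the neverwrap option from the config.
  drop_neverwrap < po/po4a.conf > po/po4a.conf.new
  mv po/po4a.conf.new po/po4a.conf
  # Step 3: regenerate the PO files and translations.
  po4a po/po4a.conf
  # Step 4: inspect what changed in the PO files.
  git diff --stat -- po
  echo "Now edit po/faqs.es.po as described in step 5, then rerun:"
  echo "  po4a po/po4a.conf && git diff content/es/faqs/What-is-Privacy-Badger.md"
)
```

Nothing runs until `reproduce` is called, and it needs network access plus a working po4a on the PATH.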

ghostwords commented 2 years ago

If it helps to improve Po4a docs, I added Po4a installation instructions to my project's README (look for "To install Po4a").

PeterNjeim commented 2 years ago

po4a is indeed good enough, and I'm using it to great effect. I think the issue I have is with gettext itself. For example, in my markdown I have some html, which po4a will include in a po file. Strings like </div> obviously shouldn't be translated, so I just leave them blank (I use the "read-only" flag in Weblate). Then when I try to generate the translated markdown files, some languages don't make the 80% mark since those untranslated divs aren't counted lol. So for now I just copy and paste those untranslatable strings for every language. Apparently the gettext spec doesn't have an option for untranslatable strings, at least from what I've seen from a Stack Overflow post.

It's not that a rewrite would actually improve the program itself; rather, it would make it more likely that others contribute in the future (for example, I don't know Perl at all; to me it was just a regex language I kept hearing about whenever I looked up how to use sed in a complex way, and I used cpan for the first time a few days ago).

mquinson commented 1 year ago

Hello,

There are several parts to this issue. About the documentation issues, I just rewrote the "QUICK START" section at the end of the po4a(1) manpage. https://github.com/mquinson/po4a/blob/master/po4a#L602 Please tell me if it would have helped you when you discovered po4a, and how I could improve it any further.

Thanks for your interest, Mt