mquinson / po4a

Maintain the translations of your documentation with ease (PO for anything)
http://po4a.org/
GNU General Public License v2.0

AsciiDoc: Sentences joined by a double space in a para - sentence per line #464

Closed git-pear closed 1 month ago

git-pear commented 5 months ago

Hello,

I have noticed that if an AsciiDoc text paragraph is styled as 'sentence per line', the resulting .po(t) file contains double spaces between the sentences of the paragraph.

If the paragraph in asciidoc is not styled as 'sentence per line' (all the para's sentences are on one line), the .po file is "normal", without double spaces.

Is this conversion of an AsciiDoc 'sentence per line' paragraph into a 'double spaced' .pot/.po intentional, and if so, what is the reason behind it?

Thank you very much for looking into this matter.

Josef Hruska

git-pear commented 5 months ago

This can be used as a sample asciidoc text:

= Test of a sentence per line para

== Para not styled as 'sentence per line'

Not sentence per line para. This para is written such that all its sentences are placed on one line only. After po4a processing, there is just a single space between the sentences of the para.

== Para styled as 'sentence per line'

Sentence per line para.
This para is written such that all its sentences are placed each one on its own line.
After po4a processing, there are double spaces between the sentences of the para.

jnavila commented 4 months ago

What I found is that this is mostly because each line break is replaced with a space, and you can have a dangling space at the end of the line.

I can make po4a tidy the input string when it is not "wrap", but that logic will upset a lot of existing translations.
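As an illustration only (this is not po4a's code, which is written in Perl), the effect described above can be sketched in Python. The function names `join_naive` and `join_tidy` are hypothetical, and the sketch assumes the first source line ends with a trailing space:

```python
def join_naive(lines):
    """Join paragraph lines, replacing each line break with one space."""
    return " ".join(lines)


def join_tidy(lines):
    """Strip whitespace around each line before joining (the 'tidy' idea)."""
    return " ".join(line.strip() for line in lines if line.strip())


# A 'sentence per line' paragraph; note the trailing space on the first line.
para = [
    "Sentence per line para. ",
    "Each sentence sits on its own line.",
]

print(repr(join_naive(para)))  # trailing space + joining space = double space
print(repr(join_tidy(para)))   # single space between the sentences
```

The tidy variant changes the extracted string, which is why applying it unconditionally would mark existing translations as fuzzy.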

git-pear commented 4 months ago

> I can make po4a tidy the input string when it is not "wrap", but that logic will upset a lot of existing translations.

Thank you for looking into this. I don't think it is worth upsetting others over this.

jnavila commented 4 months ago

Maybe I can add an option, so that the default behavior is retained but you can still get a cleaned up po file. Let's try this.

git-pear commented 4 months ago

Then thank you very much indeed. I did not think about this 'configuration' possibility.

mquinson commented 4 months ago

@jnavila I think that an option for that would complicate the use of the software for little gain, unless we find a simplification opportunity such as a "legacy mode" where we never do such fixes, and a "modern mode" where we do all of them (i.e., this one and the future comparable ones). But a specific option for this specific bug that we cannot fix without upsetting users seems like a bad idea.

Maybe legacy needs to be a scalar related to the date instead of a boolean, so that people can choose the level of legacy they want in the future, so as not to force anyone to either embrace bugs older than their project by jumping into legacy mode or get rid of the bugs they are used to.

Still somewhat unsure here
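The date-based "legacy level" idea above could be sketched as follows. This is purely illustrative: the registry `FIXES`, the fix names, the dates, and the function `enabled_fixes` are all made up for the example and do not correspond to any existing po4a option.

```python
from datetime import date

# Hypothetical registry: each behaviour-changing fix and the date it was introduced.
FIXES = {
    "tidy_nowrap_spaces": date(2024, 1, 15),  # invented name and date
    "some_future_fix": date(2025, 6, 1),      # invented name and date
}


def enabled_fixes(compat_date):
    """Return the fixes introduced on or before compat_date, sorted by name."""
    return sorted(name for name, since in FIXES.items() if since <= compat_date)


# A project pinned to mid-2024 gets the first fix but not the later one.
print(enabled_fixes(date(2024, 6, 1)))
# A project pinned before any fix keeps the full legacy behaviour.
print(enabled_fixes(date(2023, 1, 1)))
```

With such a scheme, fixes are enabled strictly in chronological order, so a project cannot pick a later fix while skipping an earlier one.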

git-pear commented 4 months ago

> Still somewhat unsure here

Ok, this is not urgent IMHO, take your time to think it all through.

I was puzzled by the double spaces mainly because they are, let's say, highlighted in the Weblate editor I use for translating documentation. Inexperienced translators may tend to carry the double spaces over into their translation(s), which is not actually necessary.

jnavila commented 4 months ago

> I was puzzled by the double spaces mainly because they are, let's say, highlighted in weblate editor I use for translating documentation.

This also bugged me, and I feel that something needs to be done. While translating the git manpages, I spotted some places where the authors use two spaces after a final dot. This is totally useless with asciidoc, because the processor deduplicates them anyway. And Weblate, which is unaware of asciidoc, tends to be very picky about maintaining double spaces in translations. So this is obviously something I'd like to tackle.

> Maybe legacy needs to be a scalar related to the date instead of a boolean, so that people can choose the level of legacy they want in the future, so as not to force anyone to either embrace bugs older than their project by jumping into legacy mode or get rid of the bugs they are used to.

Of course, I don't want to upset already existing po-files; the default will be to keep all spaces. Each time I "fix" something in the management of po-files for asciidoc, this unfortunately comes with fuzzied entries in existing files if you apply the fix. The scalar option would be a good idea, but then you have an indirection between a command-line option and the actual thing being fixed. I don't think that this is a good idea, because these changes are not additive. For instance, 'tablecells' can make sense for some files, but you may not want to enable it on others.

I don't think there are going to be a lot more options to develop (I hope!).

mquinson commented 1 month ago

IIUC, this is fixed by https://github.com/mquinson/po4a/pull/481

Please do not hesitate to reopen if some issue remains.