Open joelnitta opened 1 year ago
I didn't know that fenced code blocks can also be written with colons (:). So far, only backquotes (`) and tildes (~) are supported. I think that the following patch solves your issue, but I'd like to have a full example to integrate into the test suite before I commit it to our git.
--- a/lib/Locale/Po4a/Text.pm
+++ b/lib/Locale/Po4a/Text.pm
@@ -714,7 +714,7 @@ sub parse_markdown {
$self->pushline( $line . "\n" );
$paragraph = "";
$end_of_paragraph = 1;
- } elsif ( $line =~ /^([ ]{0,3})(([~`])\3{2,})(\s*)([^`]*)\s*$/ ) {
+ } elsif ( $line =~ /^([ ]{0,3})(([~`:])\3{2,})(\s*)([^`]*)\s*$/ ) {
my $fence_space_before = $1;
my $fence = $2;
my $fencechar = $3;
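To make the effect of this one-character change concrete, here is a minimal Python sketch. It is not po4a's Perl code, only a port of the two patterns from the diff, and the names `OLD_FENCE`, `NEW_FENCE`, and `is_fence` are made up for this illustration.

```python
import re

# Python ports of the two Perl patterns from the diff (illustration only).
OLD_FENCE = re.compile(r"^([ ]{0,3})(([~`])\3{2,})(\s*)([^`]*)\s*$")
NEW_FENCE = re.compile(r"^([ ]{0,3})(([~`:])\3{2,})(\s*)([^`]*)\s*$")


def is_fence(pattern, line):
    """Return the fence string (e.g. ':::::') if the line is a fence, else None."""
    m = pattern.match(line)
    return m.group(2) if m else None


if __name__ == "__main__":
    samples = ["```r", "~~~", "::::: {#special .sidebar}", ":::", ":: not a fence"]
    for line in samples:
        # Columns: the line, result with the old pattern, result with the new one.
        print(repr(line), is_fence(OLD_FENCE, line), is_fence(NEW_FENCE, line))
```

Only the colon lines change behavior: the old pattern rejects them, while the new one treats any run of three or more colons as a fence. Note that the pattern alone cannot tell an opening fence from a closing one, which matters once divs are nested.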
@joelnitta it would be really good if you could propose an extension to https://raw.githubusercontent.com/mquinson/po4a/master/t/fmt/txt-markdown/PandocFencedCodeBlocks.md testing "your" variant of fenced blocks, please. Just tell me what text chunk should be added, and I'll integrate it properly into our test suite.
Thanks @mquinson! I hadn't thought of this as a fenced code block, but rather as a markdown version of HTML divs (as described in the pandoc manual). But I suppose they are similar. The one thing that may differ is that pandoc `fenced_divs` can be nested, and I don't know if that applies to code blocks. So `po4a` would need to be able to account for that (again, my work-around was going to be to just not translate them, but if they were actually recognized and handled appropriately that would be even better).
I think borrowing from the pandoc manual should be fine for testing. Here are two examples.
First one is non-nested.
::::: {#special .sidebar}
Here is a paragraph.
And another.
:::::
Second one is nested.
::: Warning ::::::
This is a warning.
::: Danger
This is a warning within a warning.
:::
::::::::::::::::::
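For anyone following along, the nesting rule can be sketched in a few lines of Python. This assumes the rule as described in the pandoc manual: an opening fence has at least three colons plus attributes, a closing fence has at least three colons and nothing else, and the fence lengths do not need to match. The names `OPEN_RE`, `CLOSE_RE`, and `div_events` are hypothetical, and this is not how po4a implements it.

```python
import re

# Opening fence: 3+ colons, then attributes (a bare word or {...}),
# optionally followed by a trailing run of colons.
OPEN_RE = re.compile(r"^\s{0,3}:{3,}\s*(\{[^}]*\}|[^\s:{}]+)\s*:*\s*$")
# Closing fence: 3+ colons and nothing else.
CLOSE_RE = re.compile(r"^\s{0,3}:{3,}\s*$")


def div_events(lines):
    """Return (depth, kind, line) tuples; kind is 'open', 'close' or 'text'."""
    depth = 0
    events = []
    for line in lines:
        if CLOSE_RE.match(line) and depth > 0:
            depth -= 1
            events.append((depth, "close", line))
        elif OPEN_RE.match(line):
            events.append((depth, "open", line))
            depth += 1
        else:
            events.append((depth, "text", line))
    return events


if __name__ == "__main__":
    nested = [
        "::: Warning ::::::",
        "This is a warning.",
        "",
        "::: Danger",
        "This is a warning within a warning.",
        ":::",
        "::::::::::::::::::",
    ]
    for depth, kind, line in div_events(nested):
        print(depth, kind, line)
```

Because opening fences are recognized by their attributes rather than by their length, a simple depth counter is enough to pair each closing fence with the innermost open div.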
Ok, I think it's fixed now. The fact that it can be nested made the patch more complex than I thought. Thanks for reporting.
Thanks @mquinson for your help with this.
Sorry to make this request after you have already closed the issue, but I hope you might consider some other ways to handle this situation.
The problem with this approach IMHO is that if there is a large amount of content within a fenced div, it all shows up as a single `msgid`. I think smaller `msgid`s (generally one markdown paragraph at a time) are preferable. Also, this means that the translator may have to deal with more raw code (e.g., line breaks (`\n`)) that would otherwise not show up in the PO file.
For my project I plan to crowdsource the translation part (i.e. the localization), so I want translators to be exposed to a minimum amount of code.
This is an example of what happens using the current approach.
Original text:
::::::::::::::::::::::::::::::::::::: challenge
## Challenge 1: Can you do it?
What is the output of this command?
```r
paste("This", "new", "lesson", "looks", "good")
```
:::::::::::::::::::::::: solution
## Output
```output
[1] "This new lesson looks good"
```
:::::::::::::::::::::::::::::::::
## Challenge 2: how do you nest solutions within challenge blocks?
:::::::::::::::::::::::: solution
You can add a line with at least three colons and a `solution` tag.
:::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::::::::::::::
PO file (header excluded):
msgid ""
"\n"
"## Challenge 1: Can you do it?\n"
"\n"
"What is the output of this command?\n"
"\n"
"```r\n"
"paste(\"This\", \"new\", \"lesson\", \"looks\", \"good\")\n"
"```\n"
"\n"
":::::::::::::::::::::::: solution \n"
"\n"
"## Output\n"
" \n"
"```output\n"
"[1] \"This new lesson looks good\"\n"
"```\n"
"\n"
":::::::::::::::::::::::::::::::::\n"
"\n"
"## Challenge 2: how do you nest solutions within challenge blocks?\n"
"\n"
":::::::::::::::::::::::: solution \n"
"\n"
"You can add a line with at least three colons and a `solution` tag.\n"
"\n"
":::::::::::::::::::::::::::::::::\n"
"\n"
msgstr ""
For comparison, this is the PO file generated before the patch:
msgid "::::::::::::::::::::::::::::::::::::: challenge"
msgstr ""

msgid "Challenge 1: Can you do it?"
msgstr ""

msgid "What is the output of this command?"
msgstr ""

msgid "paste(\"This\", \"new\", \"lesson\", \"looks\", \"good\")\n"
msgstr ""

msgid ":::::::::::::::::::::::: solution"
msgstr ""

msgid "Output"
msgstr ""

msgid "[1] \"This new lesson looks good\"\n"
msgstr ""

msgid ":::::::::::::::::::::::::::::::::"
msgstr ""
I think having more `msgid` blocks will be significantly easier for translators.
Ok, then. Let's reopen this bug. What we will need is an option to alternate between fenced-div=verbatim (as I did) and fenced-div=translate (as you propose).
I still think that we need both because the translate behavior may lead to some subtle difficulties when a nested div is inlined. In that case, the translators may want to change the location of the nested div in the enclosing sentence.
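To make the difference between the two proposed modes concrete, here is a rough Python sketch. The values `verbatim` and `translate` come from the proposal in this thread and are not existing po4a options; the function `segment` and its regexes are hypothetical and only approximate how a parser might cut the text into translation units.

```python
import re

FENCE_RE = re.compile(r"^:{3,}")       # any fenced-div line (opening or closing)
CLOSE_RE = re.compile(r"^:{3,}\s*$")   # closing fence: colons only


def segment(text, mode):
    """Split markdown into translation units.

    mode="verbatim": each outermost fenced div becomes one unit.
    mode="translate": fence lines and paragraphs become separate units.
    """
    units, buf, depth = [], [], 0

    def flush():
        if buf:
            units.append("\n".join(buf))
            buf.clear()

    for line in text.splitlines():
        is_fence = bool(FENCE_RE.match(line))
        if mode == "verbatim":
            if is_fence:
                closing = bool(CLOSE_RE.match(line)) and depth > 0
                if not closing and depth == 0:
                    flush()            # an outermost div starts here
                buf.append(line)
                depth += -1 if closing else 1
                if depth == 0:
                    flush()            # the outermost div just closed
            elif depth > 0 or line.strip():
                buf.append(line)       # blank lines inside a div are kept
            else:
                flush()                # blank line outside any div
        else:  # "translate"
            if is_fence:
                flush()
                units.append(line)     # the fence line is its own unit
            elif line.strip():
                buf.append(line)
            else:
                flush()
    flush()
    return units


if __name__ == "__main__":
    sample = (
        "Intro.\n\n::: Warning ::::::\nThis is a warning.\n\n"
        "::: Danger\nThis is a warning within a warning.\n:::\n::::::::::::::::::\n"
    )
    for mode in ("verbatim", "translate"):
        print(mode, len(segment(sample, mode)), "units")
```

In verbatim mode the whole outer div, nested div included, lands in one unit with embedded newlines; in translate mode each paragraph and each fence line is separate, which is what keeps the `msgid`s small.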
Thanks!
A few ideas... in the latter case (fenced-div=translate), if the fenced div line will show up as a `msgid`, perhaps include a translator note that it does not need to be translated? Another option may be my original work-around of not including fenced divs in the PO file at all (possibly related to #77).
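Both ideas can be sketched as follows, assuming the units have already been split one paragraph or fence line at a time. The function `to_po`, its `fences` parameter, and the wording of the note are hypothetical; the only PO feature relied on is the `#.` extracted-comment prefix.

```python
import re

FENCE_RE = re.compile(r"^:{3,}")  # a fenced-div line (opening or closing)


def po_escape(s):
    """Escape backslashes, double quotes and newlines for a PO string."""
    return s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_po(units, fences="note"):
    """Render units as PO entries.

    fences="note": keep fence lines, flagged with an extracted comment.
    fences="skip": leave fence lines out of the PO file entirely.
    """
    entries = []
    for unit in units:
        entry = f'msgid "{po_escape(unit)}"\nmsgstr ""'
        if FENCE_RE.match(unit):
            if fences == "skip":
                continue
            entry = "#. Fenced div marker: copy as-is, do not translate\n" + entry
        entries.append(entry)
    return "\n\n".join(entries)


if __name__ == "__main__":
    units = [":::::::::::::::::::::::: solution", 'Say "hi".', ":::"]
    print(to_po(units, fences="note"))
    print("-----")
    print(to_po(units, fences="skip"))
```

With `fences="skip"`, whatever rebuilds the translated document would have to copy the fence lines straight from the original, since they never reach the PO file.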
@mquinson just checking in... is there anything I can do to help with this? (without knowing perl... sorry...)
This would be a great feature to have, especially because of the heavy use of fenced divs by Quarto, which is rapidly gaining popularity as a cross-language authoring system.
Hi @mquinson unless I'm missing something obvious, I think this should be re-opened because it does not provide an option to choose treating fenced divs as either "verbatim" or "translate".
As mentioned above, the current implementation results in unnecessary markdown formatting (especially line breaks, `\n`) showing up in the PO file.
Thanks!
I forgot everything about this issue since then, sorry. Feel free to reopen it if it's appropriate, then.
Thanks for the re-open. Please let me know if there's anything I can clarify.
Actually, I'll go ahead and clarify a bit now:
Ideal behavior would be if the parsed text in the PO file accounted for all markdown formatting between fenced divs (detection of `type: Title ##`, etc) as well as the divs themselves. But if that is too difficult, the option to ignore fenced divs as a work-around so that any markdown formatting between them gets properly detected would be OK too.
There seem to be some possibly related issues (#291, #357, #359), but I couldn't find anything describing exactly what I'm encountering, so I am filing a new one.
I am translating markdown with input like this (let's call this file `test-long-line.md`):
Note that the line with many colons ending with `instructor` is a pandoc fenced div and needs to remain on one line, and should not be translated.
I generate the PO file with `po4a-updatepo -f text -m test-long-line.md -p test-long-line.po -o markdown --wrap-po newlines`, then edit it to look as follows (call this `test-long-line.po`):
When I translate from the PO file, the `instructor` part gets put on a new line, even though I want to avoid this behavior.
Command:
Output:
I also tried with `nobullets` as suggested in #359, but that did not work.
Thanks!
po4a dev version (4cc0afd96fbb4f2d6674f8259a7a9f7e900942d8) running in docker container joelnitta/po4a:latest