Closed Demian101 closed 7 months ago
Hi @Demian101, thanks for reporting this! I love LaTeX, so I would like to see this working :smile:
First, let me add a test which demonstrates how the backslashes are handled: #108.
This shows that the round-tripping might be surprising: when you enter \ in your document, the translation sees \\ since this is an equivalent way of entering a backslash. So if the mdbook-katex preprocessor doesn't understand this, you end up with a problem.
I haven't looked at how mdbook-katex works yet, but perhaps you could start by looking in their issues and documentation to see if they talk about how they handle backslashes?
Thanks, I appreciate your reply!
So, as I understand it: regarding backslashes, converting a single \ to \\ is standard behavior, so
$$ \\\\begin{array}{|c|c|c|c|c|} \\\\hline 1 & x_1 & x_2 & x_3 & out \\\\ \\"
"\\hline 0 & 1 & 1 & 0 & 0 \\\\ \\\\hline \\\\end{array} $$
is a correct (or at least Markdown-spec-compliant) way of processing it.
The core question is: mdbook-katex needs to adapt to this form (above) and render it correctly.
I understand this is what you meant in your reply. Am I on point? 🤣
Yes, I believe you got it correctly.
The problem is that multiple different Markdown documents can give the same result. The two paragraphs here are identical after parsing the Markdown:
\x
\\x
Check it in the commonmark.js playground by clicking the AST tab, which says
<paragraph>
<text>\</text>
<text>x</text>
</paragraph>
<paragraph>
<text>\</text>
<text>x</text>
</paragraph>
for this example.
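The same equivalence can be checked without the playground. The sketch below is a simplified stand-in for a CommonMark parser, not the code the translation tooling actually uses: the helper name `unescape_backslashes` is made up for illustration, and it implements only the spec's backslash-escape rule (a backslash escapes the next character only when that character is ASCII punctuation), ignoring code spans and every other construct.

```python
import string


def unescape_backslashes(markdown: str) -> str:
    """Resolve CommonMark backslash escapes in plain inline text (simplified)."""
    out = []
    i = 0
    while i < len(markdown):
        ch = markdown[i]
        # A backslash only acts as an escape before ASCII punctuation.
        if ch == "\\" and i + 1 < len(markdown) and markdown[i + 1] in string.punctuation:
            out.append(markdown[i + 1])
            i += 2
        else:
            # Any other backslash (for example before a letter) is literal.
            out.append(ch)
            i += 1
    return "".join(out)


# Both source forms collapse to the same two-character text.
print(unescape_backslashes(r"\x"))
print(unescape_backslashes(r"\\x"))
```

Under the same rule, a LaTeX line break written as `\\` in the Markdown source becomes a single logical backslash, which is the case discussed for the array example above.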
When we parse either paragraph for translation, we get a Rust string with r"\x" (backslash-x, 2 bytes), but when we turn it back into Markdown, we end up with r"\\x" (backslash-backslash-x, 3 bytes). We then save that to the PO file, which triggers another round of escaping so that you end up with \\\\x (5 bytes) in the PO file.
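The two rounds of escaping can be reproduced in a few lines. This is only a sketch: `to_markdown` and `to_po_string` are hypothetical helpers that model the behaviour described here (escape every backslash when writing Markdown, then apply C-style string escaping when writing the PO file), not the real functions in either crate.

```python
def to_markdown(text: str) -> str:
    """Assumed serializer behaviour: escape every backslash."""
    return text.replace("\\", "\\\\")


def to_po_string(text: str) -> str:
    """PO strings use C-style escapes, so backslashes and quotes are escaped."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


parsed = r"\x"                 # what the Markdown parser hands us
markdown = to_markdown(parsed)  # what gets written back as Markdown
po = to_po_string(markdown)     # what ends up between quotes in the PO file

for label, value in [("parsed", parsed), ("markdown", markdown), ("po", po)]:
    # The strings are pure ASCII, so the character count equals the byte count.
    print(f"{label:9} {value:6} {len(value)} bytes")
```

The lengths come out as 2, 3 and 5, matching the byte counts given above.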
Now, you should not edit the PO file directly: use a PO editor instead. There are several online ones or you can install Poedit locally. When it displays the PO file, it will unescape it and show you \\x. But it's still annoying and confusing that there are "extra" backslashes like this.
I think fixing this would actually require a change in pulldown-cmark-to-cmark, which is the crate we use to turn the Markdown AST into Markdown text. The fix would be to use \x instead of \\x when this is the same according to the CommonMark spec. It's not always possible, though! Your input above shows such an example: when you have \\ in a Markdown file, you're actually entering a single logical \ because you're escaping the backslash.
It's all a bit ambiguous and I would be curious to hear how mdbook-katex deals with this. Thanks for creating the issue there, I'll go subscribe to it now.
I think fixing this would actually require a change in pulldown-cmark-to-cmark, which is the crate we use to turn the Markdown AST into Markdown text. The fix would be to use \x instead of \\x when this is the same according to the CommonMark spec. It's not always possible, though! Your input above shows such an example: when you have \\ in a Markdown file, you're actually entering a single logical \ because you're escaping the backslash.
I think I was wrong here: it should be fine that pulldown-cmark-to-cmark turns \x into \\x in the Markdown text: the next step in the process won't be able to tell. In particular, the final HTML output will only contain \x (2 bytes) since that is what \\x means in Markdown.
I created https://github.com/Byron/pulldown-cmark-to-cmark/issues/60 to describe the idea of emitting a simpler escaped form. Both would be correct, but translators will have an easier time working with $\sqrt{\frac{1}{x}}$ instead of $\\sqrt{\\frac{1}{x}}$.
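To make the "simpler escaped form" concrete, here is a rough sketch of the two strategies. Both function names are invented for this example and neither reflects how pulldown-cmark-to-cmark is implemented; the sketch also looks only at backslashes and assumes a backslash needs escaping just when it sits before ASCII punctuation, before a newline, or at the very end of the text.

```python
import string


def escape_always(text: str) -> str:
    """Escape every backslash, the behaviour described in this thread."""
    return text.replace("\\", "\\\\")


def escape_minimal(text: str) -> str:
    """Escape a backslash only where leaving it bare could change the meaning."""
    out = []
    for i, ch in enumerate(text):
        nxt = text[i + 1 : i + 2]  # empty string at the end of the text
        needs_escape = ch == "\\" and (
            nxt == "" or nxt == "\n" or nxt in string.punctuation
        )
        out.append("\\\\" if needs_escape else ch)
    return "".join(out)


math = r"$\sqrt{\frac{1}{x}}$"
print(escape_always(math))   # the form translators currently see
print(escape_minimal(math))  # unchanged, because every backslash precedes a letter
```

A logical backslash followed by punctuation, such as `\*`, still has to be written as `\\*` in either strategy, which is why the simpler form is not always available.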
I've now learnt that mdbook-katex uses the raw Markdown input: https://github.com/lzanini/mdbook-katex/issues/100#issuecomment-1780611579.
This suggests a different approach: @Demian101 can you try configuring mdbook-katex to run before both mdbook-xgettext and mdbook-gettext? I believe you can do this with this configuration in your book.toml:
[preprocessor.katex]
after = ["links"]
before = ["gettext"]
The goal is to
- run mdbook-katex first, both when you output HTML and when you extract messages with mdbook-xgettext. You should no longer see equations in your PO files: instead you might get the HTML that I believe mdbook-katex inserts.
- run mdbook-katex before you do the translation with mdbook-gettext.
I think this should work, but you will lose the ability to translate the math. Let me know what you find out.
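As an illustration of what the after/before keys ask for, the ordering constraints can be modelled as a small dependency graph. This is not mdbook's own code, only a sketch with Python's standard `graphlib` showing that the configuration above admits exactly one order for these three preprocessors.

```python
from graphlib import TopologicalSorter

# Each preprocessor maps to the set of preprocessors that must run before it.
must_run_after = {
    "links": set(),
    "katex": {"links"},    # after = ["links"]
    "gettext": {"katex"},  # before = ["gettext"], declared on katex
}

order = list(TopologicalSorter(must_run_after).static_order())
print(" -> ".join(order))
```

With the before = ["gettext"] line removed, the graph no longer relates katex and gettext, so nothing in this model forces katex to run first.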
Thanks a lot for your great support! I tried it, and here are my conclusions.
My book.toml:
[preprocessor.katex]
after = ["links"]
Just like before, the formula can't be rendered.
Tip: the $\color{brown}brown$ block already looks like a LaTeX formula that is ready to render; the only problem is that the \\ used for line breaks has been changed to \ (the red line I marked).
What I mean is: if we got the following form, could it be rendered successfully? I guess there may be some subtle problems hidden here 🤣
$\begin{array}{|c|c|c|c|c|} \hline 1 & x_1 & x_2 & x_3 & out \\ \hline 0 & 1 & 1 & 0 & 0 \\ \hline \end{array}
$
My book.toml (with before = ["gettext"] added):
[preprocessor.katex]
before = ["gettext"]  # <-- note this line
after = ["links"]
Amazing! The annoying LaTeX works now ~
But 🤣 there is a problem similar to the one above: every sentence containing inline KaTeX is not rendered.
I speculate that if there is inline LaTeX like $xx$ in a sentence, then the entire sentence is not processed by gettext (maybe?).
I built a minimal demo for you to try.
You can just run:
git clone https://github.com/Demian101/Demian101.github.io
cd Demian101.github.io
MDBOOK_BOOK__LANGUAGE=en mdbook serve -d book/en
In this demo, the en.po file and message.pot are almost empty, but rendering the formula still fails when the ./en folder is generated.
So I think something happens when gettext processes the raw .md ...
Or you can point me to the source code of gettext, and I will try to fix it. What exactly happens when MDBOOK_BOOK__LANGUAGE=en mdbook serve -d book/en runs?
You can try commenting out the after = ["gettext"] line in book.toml to see what happens.
Hey @Demian101, thanks for documenting this! I don't have much time to look at this myself, but I've asked around internally and perhaps someone else will find the time to work on it.
One idea: mdbook has a Markdown output format which you should try enabling. See Configuring Renderers. That ought to show you in more detail how things are transformed.
Hi @Demian101, I took a look at your repro repo. Not sure if I understood it correctly since you seem to have pushed some more commits after the last time you left a comment.
I tried to create a smaller POC for testing how stuff works and got it working at https://github.com/kdarkhan/mdbook-i18n-and-katex
The Github pages version is available here.
I believe your setup may have been broken because your PO files here were not regenerated after you changed the mdbook-gettext / mdbook-katex execution order.
For instance, I found this msgid which should not be there. I think you mentioned that for inline LaTeX, your translations stopped working. The reason could be that with katex executed earlier, the msgids were updated and no longer matched the older version you had.
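A toy model shows why stale PO files would produce exactly this symptom. Everything here is hypothetical: `fake_katex` merely wraps inline math in a placeholder span (the real preprocessor emits far more HTML), and `translate` stands in for a catalog lookup that falls back to the source text when no msgid matches.

```python
import re


def fake_katex(markdown: str) -> str:
    """Hypothetical stand-in for the katex step: wrap $...$ in placeholder HTML."""
    return re.sub(r"\$(.+?)\$", r'<span class="katex">\1</span>', markdown)


def translate(paragraph: str, catalog: dict) -> str:
    """Look the paragraph up as a msgid; fall back to the untranslated text."""
    return catalog.get(paragraph, paragraph)


source = "The area is $x^2$."
# Catalog extracted before the preprocessor order was changed.
old_catalog = {source: "La surface est $x^2$."}

# Old order: gettext sees the raw Markdown, so the msgid matches.
print(translate(source, old_catalog))

# New order: gettext sees the katex output, which is not in the old catalog.
print(translate(fake_katex(source), old_catalog))
```

Regenerating the catalog after changing the order would make the msgids match the katex output again.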
Based on my testing, when I run mdbook-gettext after mdbook-katex, gettext only sees generated MathJax nodes which are not original Latex.
The same latex table you had becomes a MathJax HTML in PO file.
If desired, Latex blocks could be skipped as I did here.
Let me know if I missed something.
Thanks @kdarkhan for looking into this!
@Demian101 I will resolve this bug. Feel free to reopen if you are still facing this issue.
I think this is an xgettext bug. When I generate the message.pot file, the formula from my source .md file ends up in message.pot with a large number of extra backslashes. As a result, mdbook-katex can no longer render it.