zoni / obsidian-export

Rust library and CLI to export an Obsidian vault to regular Markdown

Broken escaping of square brackets in math #14

Open mattchrlw opened 3 years ago

mattchrlw commented 3 years ago

Hi there, this program has been immensely useful for getting my Obsidian docs into LaTeX, so thank you 🙏.

I've noticed that certain blocks of mathematics (in LaTeX) are handled incorrectly, breaking the math environments. For example, if you include a left-bracket in a LaTeX environment (such as $[0, 2\pi)$), after obsidian-export is run it will be converted to $\[0, 2\pi)$ (which is invalid).

I assume this is something to do with [[Wikilink]] parsing, but I don't know enough Rust to investigate 😅 I think a possible solution would be to ignore any parsing or modification inside math blocks and instead treat them verbatim.


zoni commented 3 years ago

> I assume this is something to do with [[Wikilink]] parsing, but I don't know enough Rust to investigate 😅 I think a possible solution would be to ignore any parsing or modification inside math blocks and instead treat them verbatim.

It's unfortunately a little bit more complex than that.

I use pulldown-cmark to parse the Markdown document, which for this input:

# Heading 1

## Heading 2

### Heading 3

> Single line quote.

---

> Multi-paragraph quote, line 1.
>
> Multi-paragraph quote, line 2.
> Multi-paragraph quote, line 3.

Generates these events:

Start(Heading(1))
Text(Borrowed("Heading 1"))
End(Heading(1))
Start(Heading(2))
Text(Borrowed("Heading 2"))
End(Heading(2))
Start(Heading(3))
Text(Borrowed("Heading 3"))
End(Heading(3))
Start(BlockQuote)
Start(Paragraph)
Text(Borrowed("Single line quote."))
End(Paragraph)
End(BlockQuote)
Rule
Start(BlockQuote)
Start(Paragraph)
Text(Borrowed("Multi-paragraph quote, line 1."))
End(Paragraph)
Start(Paragraph)
Text(Borrowed("Multi-paragraph quote, line 2."))
SoftBreak
Text(Borrowed("Multi-paragraph quote, line 3."))
End(Paragraph)
End(BlockQuote)

Then I do some logic to replace [[WikiLinks]] correctly and feed the above stream of events to pulldown-cmark-to-cmark to convert this into Markdown text again.

There are some complexities here:

  1. pulldown-cmark doesn't understand LaTeX math blocks, so they are treated as regular text.
  2. pulldown-cmark-to-cmark escapes any lone [ character (so it becomes \[) as part of its text encoding, because an unescaped [ could be read as the start of a link.

What pulldown-cmark-to-cmark is doing is technically correct, though I think if it simply didn't do this, you'd get the correct behavior you're expecting.
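To make the two points above concrete, here is a minimal sketch using only the Rust standard library. It is not the code of pulldown-cmark-to-cmark; `escape_naive` and `escape_outside_math` are hypothetical helpers, and the math detection is a deliberate simplification that only handles single-`$` inline math.

```rust
/// Hypothetical, simplified stand-in for a serializer's text escaping:
/// every `[` is escaped so it cannot be read as the start of a link.
fn escape_naive(text: &str) -> String {
    text.replace('[', "\\[")
}

/// Same escaping, but anything between a pair of `$` delimiters is copied
/// verbatim. Assumption: only single-`$` inline math is recognised; `$$`
/// blocks and escaped `\$` are not handled in this sketch.
fn escape_outside_math(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut in_math = false;
    for ch in text.chars() {
        match ch {
            '$' => {
                // Toggle math mode on every delimiter.
                in_math = !in_math;
                out.push(ch);
            }
            '[' if !in_math => out.push_str("\\["),
            _ => out.push(ch),
        }
    }
    out
}
```

With the input `The interval $[0, 2\pi)$ and a lone [ bracket`, `escape_naive` escapes both brackets and so breaks the math, while `escape_outside_math` only escapes the lone bracket outside the `$...$` span.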

We might be able to convince the pulldown-cmark-to-cmark authors to make this behavior configurable. I don't feel like doing this right now, but I'll file an issue on their repository somewhere in the next couple of weeks and explain this use-case to see how they feel about that.

mattchrlw commented 3 years ago

Ahh, that's a shame. For what it's worth, this problem can easily be remedied after the fact with some search-and-replace, as the problematic character sequences are quite predictable ($\[ -> $[ etc.) But I can see how this can become quite complex, even if it is made configurable...

Thanks for investigating anyway.
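The search-and-replace workaround described above could look roughly like this as a post-processing step over the exported Markdown. This is a sketch under assumptions: `unescape_in_math` is a hypothetical helper, only single-`$` inline math is recognised, and the set of characters to un-escape (`[`, `]`, `*`) is a guess based on the cases reported in this issue.

```rust
/// Remove the backslash from `\[`, `\]` and `\*` when it appears inside a
/// `$...$` span, leaving all other text (and other TeX commands) untouched.
fn unescape_in_math(markdown: &str) -> String {
    let mut out = String::with_capacity(markdown.len());
    let mut in_math = false;
    let mut chars = markdown.chars().peekable();
    while let Some(ch) = chars.next() {
        match ch {
            '$' => {
                in_math = !in_math;
                out.push(ch);
            }
            '\\' if in_math => match chars.peek().copied() {
                // Keep a TeX line break `\\` intact as a pair.
                Some('\\') => {
                    out.push_str("\\\\");
                    chars.next();
                }
                // Drop the backslash the exporter added, keep the character.
                Some(next) if matches!(next, '[' | ']' | '*') => {
                    out.push(next);
                    chars.next();
                }
                // Any other backslash (for example `\pi`) is left alone.
                _ => out.push(ch),
            },
            _ => out.push(ch),
        }
    }
    out
}
```

Applied to `$\[0, 2\pi)$ and \[ outside`, this restores the math to `$[0, 2\pi)$` and leaves the escaped bracket outside the math span as it was.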

Masstronaut commented 2 years ago

I've just encountered a similar issue with escaped % characters (\%) in math sequences. In my case the reverse is happening - \% in my source notes becomes % in the output from obsidian-export. This is problematic as the % character is a comment in TeX syntax, so in an inline math block like $5\%$ that gets output as $5%$ the TeX parser sees $5 followed by the % token which marks the rest of the line - $ - as a comment. This breaks the syntax as the parser sees no closing $.

I've also tried using \\% in my input file, which obsidian-export outputs correctly as \\%, so it seems there is some special handling of \% going on.

Is this likely caused by the same upstream dependency, or do you think something else could be at play here?
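One plausible explanation, offered as an assumption rather than a confirmed diagnosis: CommonMark lets any ASCII punctuation character be backslash-escaped, so a parser that treats math as regular text reads `\%` as a literal `%` and `\\%` as a literal backslash followed by `%`. The sketch below mimics only that rule with a hypothetical helper, `resolve_backslash_escapes`; it is not the parser's actual code.

```rust
/// Resolve CommonMark-style backslash escapes: a backslash followed by an
/// ASCII punctuation character yields that character literally. A backslash
/// before anything else (for example a letter) is kept as-is.
fn resolve_backslash_escapes(source: &str) -> String {
    let mut out = String::with_capacity(source.len());
    let mut chars = source.chars().peekable();
    while let Some(ch) = chars.next() {
        if ch == '\\' {
            if let Some(next) = chars.peek().copied() {
                if next.is_ascii_punctuation() {
                    // The escape is consumed; only the punctuation survives.
                    out.push(next);
                    chars.next();
                    continue;
                }
            }
        }
        out.push(ch);
    }
    out
}
```

Under this rule `$5\%$` becomes `$5%$`, `$5\\%$` becomes `$5\%$`, and `2\pi` is unchanged because `p` is not punctuation. If the serializer then re-escapes the surviving backslash but sees no need to escape `%`, that would match the behaviour described above.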

JonasDoesThings commented 1 year ago

> [!NOTE] Test
> Test

also gets exported as

 >
 > \[!NOTE\] Test
 > Test

which breaks the callouts/admonitions when further processing the markdown in e.g. mkdocs :(

zoni commented 1 year ago

There's now a PR on pulldown-cmark to parse math blocks: https://github.com/raphlinus/pulldown-cmark/pull/622

If and when that is accepted and merged, and rendering support is also added to pulldown-cmark-to-cmark (which should be straightforward to implement) then the math rendering/escaping problems in this issue will be fixed.

maneesh29s commented 1 month ago

Today I faced the same issue while exporting LaTeX, where my [ and * were being replaced with \[ and \* respectively.

Today I saw that there have been some merges in pulldown-cmark, particularly #734 which closed the issue #622 mentioned by zoni above.

Can this issue be fixed now?

maneesh29s commented 1 month ago

After a little more investigation, here are my findings:

pulldown-cmark

The math mode has been integrated in the latest v0.11.0 release of pulldown-cmark. But there have been many breaking changes since v0.9.3 (current dependency of obsidian-export), for example in Event and Tag enums.

pulldown-cmark-to-cmark

The latest version of pulldown-cmark-to-cmark is v13.0.0, which still depends on pulldown-cmark 0.10.0. That means pulldown-cmark-to-cmark does not support math mode yet.

maneesh29s commented 5 days ago

Update:

pulldown-cmark-to-cmark has been updated to v15.0.1, which adds support for the math mode in pulldown-cmark.

The renovate bot has tried to update these dependencies (see #259 and #252), but both have failed pipelines.

zoni commented 4 days ago

pulldown-cmark 0.10 introduced quite a few breaking changes. It's not terribly difficult to get those upgrades done, it's just a chore I haven't had the energy/motivation for yet 😄 It's certainly on my list to action in the future though.