jgm / pandoc

Universal markup converter
https://pandoc.org

Two-Backticks-math (Julia-math) extension for CommonMark in Pandoc? #8572

Open kellertuer opened 1 year ago

kellertuer commented 1 year ago

First of all, thanks for all your work on Pandoc. It's really nice and easy to use :)

In the Julia documentation (and hence in most of the Julia community), the $...$ form of inline math is not used, since $ is used for string interpolation, which can cause strange effects, especially in docstrings. Instead, two backticks are used (see The Julia Docs); this syntax is also supported by CommonMark.jl.

In my case (combining Julia documentation with Quarto), the two backticks currently get reduced to single backticks. Of course I can (and currently do) use the -tex_math_dollars extension and $ in my Markdown files within Quarto. But if I could use two backticks, that would unify my writing style across my repository, which would be neat.

Would it be possible to have a -tex-math-2backticks extension? Otherwise, an extension that leaves the number of backticks in inline code as is would also suffice, of course.

jgm commented 1 year ago

That's a badly thought-out extension, because two backticks already have an established markdown/commonmark meaning (literal text), and in fact you sometimes need two backticks (when the literal text contains a single backtick). Not inclined to support.

Of course, you can always use a raw attribute to pass through anything literally. It would be a bit cumbersome:

` ``\JuliaLaTeX`` `{=commonmark}
kellertuer commented 1 year ago

Thanks for your feedback. You are right: two backticks already have a meaning in non-Julia Markdown, especially when the code itself contains backticks. On the other hand, Julia-flavoured Markdown has used this format for quite a while. So maybe all I need is an extension that keeps two backticks as is?

Currently, going Quarto → Jupyter → Pandoc → CommonMark, text like

I am math ``\beta^4``

becomes

I am math `\beta^4`

And while the first is nice Julia-style LaTeX (with the double meaning you mentioned), I would still love to keep it as is.

So my application (or rather, my reason for wanting this) is different, but a +keep-number-of-backticks-inline-as-is extension would also be fine with me (and sure, I would be misusing it slightly).

Sure, your solution works, but if the text is a little math-heavy (sorry, my software is a little “mathy”), it is really cumbersome to write.

jgm commented 1 year ago

There is no way to keep it as is. Once we parse ``\beta^4``, it is represented as a Code element in the AST, and this is rendered back using the minimum backticks needed (one in this case). Information about the concrete syntax isn't stored in the AST and isn't recoverable.
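To make the information loss concrete, here is a minimal Lua sketch of a simplified CommonMark code-span reader. `read_code_span` is a hypothetical helper, not pandoc's parser, and it ignores most of the spec (it only handles a string that starts with a code span). Like a Code element, it returns only the text, so the length of the delimiter is gone once parsing is done.

```lua
-- Hypothetical, simplified code-span reader (not pandoc's implementation).
-- Returns only the span's content, which is all a Code element keeps.
function read_code_span(s)
  local open = s:match('^`+')
  if not open then return nil end
  local n = #open                        -- length of the opening backtick run
  local pos = n + 1
  while true do
    local i, j = s:find('`+', pos)
    if not i then return nil end         -- no matching closing run
    if j - i + 1 == n then               -- closing run must have the same length
      local content = s:sub(n + 1, i - 1):gsub('\n', ' ')
      -- CommonMark strips one space from each side if both ends are
      -- spaces and the content is not made up entirely of spaces.
      if content:match('^ .* $') and content:find('[^ ]') then
        content = content:sub(2, -2)
      end
      return content
    end
    pos = j + 1                          -- run of a different length: keep looking
  end
end
```

Both `read_code_span('``\\beta^4``')` and `read_code_span('`\\beta^4`')` yield the same string `\beta^4`, so a writer that sees only that string cannot tell which form the author typed.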

kellertuer commented 1 year ago

Hm, so the Julia-math extension will not happen because of the double interpretation, and the other idea will not happen because Pandoc does not keep that information? That's a pity; such a minor problem will probably keep me from using Pandoc, then.

Is there really no way to say “OK, if you know what you are doing, all double backticks are interpreted as Julia-CommonMark math”? If not, well, I will have to see what to do. I had hoped (after all the work of getting into Quarto and such) that I had finally found a persistent way of writing my tutorials.

And no: with quite a lot of math in my tutorials (they basically explain how the mathematical theory works in the numerical implementation), the cumbersome way would really mean a lot more typing.

But if this is the conclusion, then I will have to look for something else, and this can be closed, I guess, sadly.

jgm commented 1 year ago

It's not impossible, and we can keep this open for a while. But I get grumpy about supporting badly thought-out ad-hoc markdown extensions.

If we did support it, we'd have to add a new extension, add something to the parser to handle it, and add complexity to the writer (since we could no longer use `` for inline code that contains a single backtick when this extension was enabled). That all adds complexity, and complexity leads to bugs.
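The extra writer complexity can be illustrated with a small Lua sketch. `code_span` and its `reserve_double` flag are hypothetical; the base rule (one backtick more than the longest backtick run in the content, padded with spaces when the content contains backticks) is my understanding of pandoc's Markdown writer, not a copy of its source. With the flag set, a two-backtick delimiter has to be bumped to three, because two would be read back as math.

```lua
-- Hypothetical helper: choose a backtick delimiter for an inline code span.
-- reserve_double = true models an extension where `` means math.
function code_span(content, reserve_double)
  local longest = 0
  for run in content:gmatch('`+') do
    if #run > longest then longest = #run end
  end
  -- One more backtick than any run inside the content cannot clash with it.
  local n = longest + 1
  if reserve_double and n == 2 then
    n = 3        -- two backticks are reserved; three still cannot clash
  end
  -- Pad with spaces so a leading or trailing backtick in the content
  -- does not merge with the delimiter (the reader strips one space per side).
  local pad = longest > 0 and ' ' or ''
  local fence = string.rep('`', n)
  return fence .. pad .. content .. pad .. fence
end
```

For content without backticks nothing changes, but `code_span('a`b', true)` has to emit three backticks where `code_span('a`b', false)` emits two, which is exactly the special case the writer would have to carry.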

kellertuer commented 1 year ago

I did not think much about this problem, but I like the style it leads to when writing Julia docs.

But I certainly see your problem with inline code that contains backticks (one would basically have to skip two backticks and write code containing a single backtick with three backticks). And I can understand that this is probably complicated to implement (I did not realise that when I opened the issue).

So I fully understand that this contributes to some grumpiness (but not at me, please; I did not invent it). Still, the world is what it is, even with badly thought-out extensions, so sure, let's leave this open; if it becomes possible, at least one person (me) would be happy.

Until then I will look for some workarounds, because apart from this issue I like what I have learned about Quarto and Pandoc over the last few weeks.

tarleb commented 1 year ago

It seems that the Julia folks are not the only people who decided to use double backticks as special syntax: as I learned today, Swift's DocC uses it to mark symbols that should be linked, similar to [`someSymbol`][] in pandoc Markdown.

kellertuer commented 1 year ago

Interesting. For that, I like the syntax of Julia's Documenter.jl, which uses [`someSymbolOrFunction`](@ref).

jgm commented 1 year ago

Hm. Well, maybe we need another extension then.

jgm commented 1 year ago

Oh, but Julia and Swift use it for entirely different things.

jgm commented 1 year ago

One approach to this would be a custom reader and writer. (See the documentation on the website.) The custom reader could simply replace `` with $ and then send the result through pandoc's markdown parser. The custom writer could simply override the rendering of Math, leaving everything to pandoc's markdown writer. These would be relatively short Lua scripts.
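The writer half of that suggestion might look roughly like the following Lua sketch, assuming pandoc 3.x's custom-writer interface (a global `Writer(doc, opts)` function). `julia_inline_math` is a hypothetical helper; only inline math is converted, and display math is left to the default writer. Emitting a raw `commonmark` inline is the same pass-through mechanism as the `{=commonmark}` raw attribute mentioned earlier in the thread.

```lua
-- Hypothetical helper: Julia-style source for one Math element.
-- Returns nil for display math so the default writer handles it.
-- Assumes the TeX itself contains no backticks.
function julia_inline_math(m)
  if m.mathtype == 'InlineMath' then
    return '``' .. m.text .. '``'
  end
  return nil
end

-- Custom writer entry point: replace inline Math with raw CommonMark,
-- then delegate everything else to pandoc's own commonmark writer.
function Writer(doc, opts)
  local filter = {
    Math = function(m)
      local src = julia_inline_math(m)
      if src then
        return pandoc.RawInline('commonmark', src)
      end
      -- returning nothing leaves the element unchanged
    end
  }
  return pandoc.write(doc:walk(filter), 'commonmark', opts)
end
```

Saved as, say, `juliamd-writer.lua` (a hypothetical file name), it would be selected with `pandoc --to=juliamd-writer.lua ...`.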

kellertuer commented 1 year ago

Thanks for the comments. I do not have any experience with Lua, but maybe someone will have time for this at some point.

For now it is not urgent either; I found a way to work. It's just that my Markdown files now partly use $ (the ones generated by Quarto) and partly `` (the ones from Documenter / the Julia docs). This makes no difference in the rendered HTML, only in the intermediate/committed Markdown code.

tarleb commented 1 year ago

I think this is roughly what jgm had in mind:

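```lua
-- Start from the default CommonMark extensions and enable $...$ math.
Extensions = pandoc.format.extensions 'commonmark'
Extensions.tex_math_dollars = true

function Reader (input, opts)
  local flavor = {
    format = 'commonmark',
    extensions = opts.extensions
  }
  -- Turn Julia-style ``math`` into $math$, then parse as CommonMark.
  local src = tostring(input):gsub('``', '$')
  return pandoc.read(src, flavor, opts)
end
```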

Usage: save it to a file juliamd.lua, then call pandoc with `pandoc --from=juliamd.lua ...`. The method is slightly crude in that it would lead to unexpected results if there is code that contains ``, but otherwise this should work. Not sure if there's a good way to integrate it into a Quarto workflow, though.

kellertuer commented 1 year ago

Thanks, I will have to check how I can tweak Quarto to use this, as soon as I find time. Probably a slightly safer way would be a regexp that only matches ``[math-text-here]`` code?
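A slightly safer variant of the `gsub` call in the reader script above could be sketched in plain Lua as follows. `julia_math_to_dollars` is a hypothetical helper: it only rewrites spans delimited by exactly two backticks whose content contains no backtick, and leaves longer backtick runs and code such as `` `x` `` alone. It still cannot tell math from ordinary double-backtick code, and it would also rewrite a span nested inside a raw attribute.

```lua
-- Hypothetical helper: convert ``tex`` to $tex$, but only when the span is
-- delimited by exactly two backticks and contains no backtick itself.
function julia_math_to_dollars(s)
  return (s:gsub('()``([^`]+)``()', function(i, body, j)
    local before = s:sub(i - 1, i - 1)   -- character before the opening ``
    local after = s:sub(j, j)            -- character after the closing ``
    if before == '`' or after == '`' then
      return nil                         -- part of a longer run: keep the match as is
    end
    return '$' .. body .. '$'
  end))
end
```

In the reader, `tostring(input):gsub('``', '$')` would then become `julia_math_to_dollars(tostring(input))`.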

kellertuer commented 1 year ago

I was not yet able to introduce this into Quarto, but it remains just an inconsistency in my (rendered) Documenter files, where the way math is written differs depending on where the file came from.