jgm / pandoc

Universal markup converter
https://pandoc.org

Handling of ```{lang} codeblocks, change from pandoc 2.19.2 to pandoc 3.1 #8645

Open cscheid opened 1 year ago

cscheid commented 1 year ago

The treatment of this type of code syntax has changed from pandoc 2 to 3:

    $ cat repro.md
    ```{python}
    x = 2
    ```

In pandoc 2.19.2:

    $ pandoc2 --version
    pandoc 2.19.2
    Compiled with pandoc-types 1.22.2.1, texmath 0.12.5.2, skylighting 0.13,
    citeproc 0.8.0.1, ipynb 0.2, hslua 2.2.1
    Scripting engine: Lua 5.4
    User data directory: /Users/cscheid/.local/share/pandoc
    Copyright (C) 2006-2022 John MacFarlane. Web: https://pandoc.org
    This is free software; see the source for copying conditions. There is no
    warranty, not even for merchantability or fitness for a particular purpose.
    $ pandoc2 -f markdown repro.md -t native
    [ CodeBlock ( "" , [ "{python}" ] , [] ) "x = 2" ]

In pandoc 3.1:

    $ pandoc3 --version
    pandoc 3.1
    Features: +server +lua
    Scripting engine: Lua 5.4
    User data directory: /Users/cscheid/.local/share/pandoc
    Copyright (C) 2006-2023 John MacFarlane. Web: https://pandoc.org
    This is free software; see the source for copying conditions. There is no
    warranty, not even for merchantability or fitness for a particular purpose.
    $ pandoc3 -f markdown repro.md -t native
    [ Para [ Code ( "" , [] , [] ) "{python} x = 2" ] ]

The change is particularly hard on quarto when there's a line break:

    $ cat repro2.md
    ```{python}

    x = 2
    ```
    $ pandoc3 -f markdown repro.md -t native
    [ Para [ Str "```{python}" ]
    , Para [ Str "x" , Space , Str "=" , Space , Str "2" , SoftBreak , Str "```" ]
    ]

We use the `{lang}` and `{{lang}}` syntax pretty extensively in [quarto](https://quarto.org). We could work around it by patching the input markdown around pandoc 3, but it would really be great if we didn't have to.
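For illustration, the kind of input patching mentioned above could look like the following Python sketch. It is not Quarto's actual code: `patch_braced_fences` and both regular expressions are hypothetical names introduced here, and the sketch only handles the bare `{lang}` form (not `{{lang}}`, and not braces that already carry attributes). It rewrites an opener such as `{python}` to the class form `{.python}` and leaves everything inside an already-open code block untouched.

```python
import re

# A fence line: up to three spaces of indent, 3+ backticks or tildes, then the info string.
FENCE = re.compile(r"^( {0,3})(`{3,}|~{3,})(.*)$")
# An info string that is nothing but a braced bare word, e.g. "{python}".
BRACED = re.compile(r"^\s*\{([A-Za-z][\w+-]*)\}\s*$")


def patch_braced_fences(markdown: str) -> str:
    """Rewrite fence openers of the form {lang} to {.lang} (hypothetical helper)."""
    out = []
    opener = None  # fence string of the block we are currently inside, if any
    for line in markdown.split("\n"):
        m = FENCE.match(line)
        if m is None:
            out.append(line)
            continue
        indent, fence, info = m.groups()
        if opener is None:
            # This line opens a code block; patch it if it uses the braced form.
            opener = fence
            braced = BRACED.match(info)
            if braced:
                line = f"{indent}{fence} {{.{braced.group(1)}}}"
        elif fence[0] == opener[0] and len(fence) >= len(opener) and not info.strip():
            # Same fence character, at least as long, no info string: closes the block.
            opener = None
        out.append(line)
    return "\n".join(out)


print(patch_braced_fences("```{python}\nx = 2\n```"))
```

Whether a class is an acceptable stand-in for the braced form depends on the tool consuming pandoc's output, so a real workaround would likely need to remember which blocks it rewrote.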

Was this change deliberate? We didn't see anything in the changelog that suggested it was, hence our question here.

Thanks!

cscheid commented 1 year ago

In a conversation with @tarleb, he's narrowed the change to https://github.com/jgm/pandoc/commit/8670f6dc5b943a6e7eabaf69776724d54d80c9bb

jgm commented 1 year ago

I understand why this changed, but I think this is a case where the old behavior was a bug.

jgm commented 1 year ago

To elaborate: `{python}` is not a valid attribute specifier. `{.python}` would be fine; that specifies `python` as a class.

Previously, pandoc was quite lax in what it accepted as a language identifier in the

    ```ruby
    xxx
    ```

form. It even accepted `{python}`, as in your case. This isn't really a good feature, because `{python}` isn't what you want for the language name; what you want is `python`. In 3.0 we started allowing *both* a language identifier and a regular attribute specifier, e.g.

    xxx

This required us to be more picky about what we treated as a language identifier; we had to be sure we weren't gobbling the attribute specifier, so we excluded `{` and `}` from language identifiers. I think this is fine because there are no language names that include these characters.
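The rule described above can be modelled roughly as follows. This is a toy model written for illustration, not pandoc's parser (which is written in Haskell): the names `split_info_string` and `looks_like_attributes` are hypothetical, the sample string `ruby {.numberLines}` is only an example, and the attribute check is deliberately crude.

```python
import re

# Toy model of a fenced-code info string: an optional language identifier
# (no whitespace, no braces) followed by an optional {...} attribute block.
INFO = re.compile(r"^\s*(?P<lang>[^\s{}]+)?\s*(?P<attrs>\{.*\})?\s*$")


def split_info_string(info):
    """Return (language, attribute_block), or None if the string fits neither."""
    m = INFO.match(info)
    if m is None:
        return None
    return m.group("lang"), m.group("attrs")


def looks_like_attributes(block):
    """Rough check: every token is #id, .class, or key=value.

    Quoted values containing spaces are not handled, and an empty {} is
    rejected; both are simplifications of this sketch.
    """
    tokens = block[1:-1].split()
    return bool(tokens) and all(t[0] in "#." or "=" in t for t in tokens)


print(split_info_string("ruby"))                 # ('ruby', None)
print(split_info_string("ruby {.numberLines}"))  # ('ruby', '{.numberLines}')
print(split_info_string("{python}"))             # (None, '{python}')
print(looks_like_attributes("{python}"))         # False
print(looks_like_attributes("{.python}"))        # True
```

In this model `{python}` can never be read as a language, because braces are excluded from identifiers, and it also fails the attribute check, because `python` is neither an id, a class, nor a key=value pair.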

jgm commented 1 year ago

Bottom line: what you should write is

    ``` python

or

    ``` {.python}

tarleb commented 1 year ago

One reason why I'd like to support this is that it is close to the "directive" syntax in MyST. See also https://github.com/jgm/commonmark-hs/issues/100

jgm commented 1 year ago

Well, this would be a change to the syntax of attribute specifiers. We could consider it, but it should be proposed on a separate issue and discussed there. This issue is about a putative regression, which I'd say is not really a bug.