siefkenj / unified-latex

Utilities for parsing and manipulating LaTeX ASTs with the Unified.js framework
MIT License

Add `listings` and `minted` package support (incl. verbatim macro) #37

Closed James-Yu closed 1 year ago

James-Yu commented 1 year ago

This PR resolves #35

This PR adds the \lstinline macro from the listings package to the PEG.js grammar as a verbatim macro. Following the discussion in https://github.com/siefkenj/unified-latex/issues/35#issuecomment-1585037810, this is the only way to prevent unified-latex from parsing the macro's contents, which should be treated as verbatim.

siefkenj commented 1 year ago

Can you add some tests to this?

James-Yu commented 1 year ago

Here it goes!

siefkenj commented 1 year ago

Thanks! I am away for the next week, but I will look at this when I get back.

jlelong commented 1 year ago

Hi @siefkenj, I am helping @James-Yu with the maintenance of LaTeX-Workshop, and I am also the maintainer of https://github.com/jlelong/vscode-latex-basics, which provides the built-in LaTeX syntax highlighting for VS Code.

I think the following two commands provided by the minted package should also be treated as \lstinline: \mint and \mintinline. They accept the following syntax \mintinline[⟨options⟩]{⟨language⟩}⟨delim⟩⟨code⟩⟨delim⟩. The delimiters can be a single repeated character, just like for \verb. They can also be a pair of curly braces {}.
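
The syntax described above can be illustrated with a small string-level splitter. This is only a sketch: `splitMintinline` is a hypothetical helper, not part of unified-latex or minted, and it assumes the options contain no `]` and the code contains no nested braces.

```javascript
// Hypothetical helper (not a unified-latex or minted API): splits the text
// that follows `\mintinline` into options, language, delimiters, and code.
function splitMintinline(src) {
    let pos = 0;
    let options = null;
    // Optional [<options>]; this sketch assumes the options contain no "]".
    if (src[pos] === "[") {
        const end = src.indexOf("]", pos);
        if (end === -1) throw new Error("unclosed options");
        options = src.slice(pos + 1, end);
        pos = end + 1;
    }
    // Mandatory {<language>}.
    if (src[pos] !== "{") throw new Error("expected {language}");
    const langEnd = src.indexOf("}", pos);
    if (langEnd === -1) throw new Error("unclosed language");
    const language = src.slice(pos + 1, langEnd);
    pos = langEnd + 1;
    // The delimiter is either a single repeated character (as for \verb)
    // or a pair of curly braces.
    const open = src[pos];
    if (open === undefined) throw new Error("missing delimiter");
    const close = open === "{" ? "}" : open;
    const codeEnd = src.indexOf(close, pos + 1);
    if (codeEnd === -1) throw new Error("unclosed code");
    return { options, language, open, close, code: src.slice(pos + 1, codeEnd) };
}
```

For example, `splitMintinline("[linenos]{python}|print(x)|")` yields the delimiter `|` and the code `print(x)`, while `splitMintinline("{js}{a + b}")` takes the curly-brace form.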

We may also consider supporting the pythontex package in another PR; see the discussion at https://github.com/James-Yu/LaTeX-Workshop/issues/2542#issuecomment-786692920

James-Yu commented 1 year ago

@jlelong Here it goes, with support for \lstinline, \mintinline, and \mint 😄

siefkenj commented 1 year ago

These macros are pretty sophisticated. I think we need to take a different approach, since the current one doesn't do any parsing of the optional arguments. Looking at the code, it appears that things like comments inside the optional arguments will be parsed as strings, among other incorrect things, like \lstinline[foo={]}]!...! not being parsed correctly. Also, the PEG.js grammar currently doesn't return any nodes of type argument; that is all left up to the ctan/packages.
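
To illustrate why an input like \lstinline[foo={]}]!...! is tricky, here is a string-level sketch. It assumes a scanner that stops at the first `]`, which may or may not be what the PR's rule did, and compares it with one that tracks brace depth. `findClosingBracket` is a hypothetical name, and the sketch ignores comments entirely, which the proposal below handles by matching tokens in the grammar instead.

```javascript
// Hypothetical helper: given src[start] === "[", returns the index of the "]"
// that closes the optional argument, ignoring any "]" inside {...} groups.
function findClosingBracket(src, start) {
    let depth = 0;
    for (let i = start + 1; i < src.length; i++) {
        const ch = src[i];
        if (ch === "\\") {
            i++; // skip the character after a backslash, e.g. \{ or \]
            continue;
        }
        if (ch === "{") depth++;
        else if (ch === "}") depth--;
        else if (ch === "]" && depth === 0) return i;
    }
    return -1; // no closing bracket found
}

const optionalArg = "[foo={]}]!...!";
const naiveEnd = optionalArg.indexOf("]"); // stops at the "]" inside the braces
const braceAwareEnd = findClosingBracket(optionalArg, 0);
```

Here the naive scan ends inside the braces, while the brace-aware scan reaches the bracket just before the first `!`.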

Here's my proposal:

  1. Make a one_square_bracket_args rule that is similar to verbatim_option but matches token instead of . (any character). This rule wouldn't produce any group; it would just return an array like [{type: "string", content: "["}, ..., {type: "string", content: "]"}]
  2. Make a verbatim_group rule similar to the existing one, but make it return a group with a single string of content.
  3. Make a verbatim_delimited_by_char rule that returns a flat array of three {type: "string",... } objects.

With those parsing rules, we can add the special exceptions for \lstinline and friends. Then, add a ctan package that parses the arguments as usual. Since the grammar should have already parsed everything that needs to be verbatim as strings, everything should work out :-)

James-Yu commented 1 year ago

Quite challenging for me! I will follow the instructions shortly.

James-Yu commented 1 year ago

I'm having difficulty with the step "add the special exceptions for \lstinline and friends". It does not seem possible to return an expanded array, or to backtrack, in a PEG grammar. My current rule for \lstinline is

verbatim_listings "verbatim_listings"
    = escape
        macro:"lstinline"
        option:square_bracket_argument?
        verbatim:(verbatim_group / verbatim_delimited_by_char) {
            return [createNode("macro", { content: macro }), ...option, ...verbatim]
        }

For \lstinline[t]#code$#, this generates

{
  type: 'root',
  content: [
    [
      {
        type: 'macro',
        content: 'lstinline'
      },
      {
        type: 'string',
        content: '['
      },
      {
        type: 'string',
        content: 't'
      },
      {
        type: 'string',
        content: ']'
      },
      {
        type: 'string',
        content: '#'
      },
      {
        type: 'string',
        content: 'code$'
      },
      {
        type: 'string',
        content: '#'
      }
    ]
  ]
}

which is obviously wrong, since the content of root is an array of arrays of nodes. The inner array does not seem to be flattened.

Do you have a suggestion on how to handle this issue? Or was I taking a wrong route?

siefkenj commented 1 year ago

This is good progress! Try adding a flatMap to the result of token* from root. (Actually, any place that looks for token will need to handle the case of an array now, since token is no longer a single token.)
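
The flattening step can be sketched in plain JavaScript, outside the grammar. The `tokenResults` array below is a hand-written stand-in for what `token*` might return once a verbatim rule yields an array; it is not actual parser output.

```javascript
// Simulated results of `token*`: most tokens are single nodes, but a verbatim
// rule now returns an array of nodes (assumed shape, for illustration only).
const tokenResults = [
    { type: "string", content: "a" },
    [
        { type: "macro", content: "lstinline" },
        { type: "string", content: "#" },
        { type: "string", content: "code$" },
        { type: "string", content: "#" },
    ],
    { type: "string", content: "b" },
];

// flatMap flattens exactly one level: arrays returned by the callback are
// spliced into the result, and single nodes are kept as they are.
const content = tokenResults.flatMap((t) => t);
const root = { type: "root", content };
```

Because only one level is flattened, nodes that legitimately hold nested arrays in their own `content` are left untouched.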

siefkenj commented 1 year ago

This is starting to look pretty good :-). Can you fix the broken tests and also add some tests of weird stuff, like

\lstinline[foo %bar

]{my code}

and make sure that the comment parses as a comment and the newline is interpreted as a parskip.

This PR also needs the companion parts in ctan to attach arguments correctly.

James-Yu commented 1 year ago

Figured out how to use existing argument parsers! More tests pending.

In the meantime, I tested this and I don't think either d${d}${d} or m alone can cover both the #code# and {code} cases. Therefore, I still use a custom argument parser for these verbatim macros.
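
A custom argument parser of this kind might look roughly like the sketch below. It is not the parser from the PR: `attachVerbatimArgs` is a hypothetical name, and the `argument` node shape (`openMark`, `closeMark`, `content`) is an assumption made for illustration. The input is the flat node list produced by the grammar, as in the output shown earlier in this thread.

```javascript
// Hypothetical sketch (not the PR's parser): takes [macro, ...nodes] as emitted
// by the grammar and moves the optional and verbatim arguments onto the macro.
function attachVerbatimArgs(nodes) {
    const [macro, ...rest] = nodes;
    const args = [];
    let i = 0;
    const isStr = (n, s) => Boolean(n) && n.type === "string" && n.content === s;

    // Optional argument: everything between "[" and the next top-level "]".
    if (isStr(rest[i], "[")) {
        const close = rest.findIndex((n, j) => j > i && isStr(n, "]"));
        if (close === -1) throw new Error("unclosed optional argument");
        args.push({ type: "argument", openMark: "[", closeMark: "]", content: rest.slice(i + 1, close) });
        i = close + 1;
    }

    const next = rest[i];
    if (next && next.type === "group") {
        // {code} form: the grammar returned a group holding a single string.
        args.push({ type: "argument", openMark: "{", closeMark: "}", content: next.content });
        i += 1;
    } else if (next && next.type === "string" && isStr(rest[i + 2], next.content)) {
        // #code# form: three flat strings (delimiter, code, delimiter).
        args.push({ type: "argument", openMark: next.content, closeMark: next.content, content: [rest[i + 1]] });
        i += 3;
    } else {
        throw new Error("missing verbatim argument");
    }
    return [{ ...macro, args }, ...rest.slice(i)];
}
```

Handling both forms in one function sidesteps the problem of choosing a single argument specification for them.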

James-Yu commented 1 year ago

Done!

siefkenj commented 1 year ago

Thanks for this! I will release a new version :-)

James-Yu commented 1 year ago

Many thanks! I am so glad this contribution got accepted!