siefkenj / unified-latex

Utilities for parsing and manipulating LaTeX ASTs with the Unified.js framework
MIT License

\\ macro should be parsed differently in some math environments #40

Closed theseanl closed 1 year ago

theseanl commented 1 year ago

## Steps to reproduce

  1. Compile the document below with LaTeX, then inspect the output:

         \documentclass{article}
         \usepackage{amsmath}

         \begin{document}

         1 \\ [10pt] 2

         \begin{gather}
         1 \\ [10pt] 2
         \end{gather}

         \end{document}

  2. Parse it with unified-latex (e.g. by pasting it into https://siefkenj.github.io/latex-parser-playground/).

## Expected behavior

In the first `1 \\ [10pt] 2` line, `[10pt]` is treated as an argument for the `\\` macro.
However, in the second occurrence (inside `gather`), `[10pt]` is rendered directly to the output. It is treated as an argument for `\\` only after the whitespace between `\\` and `[10pt]` is removed.

Thus, the parsed AST should treat the two cases differently.

## Actual behavior

`unified-latex` currently parses `[10pt]` as an optional argument for `\\` in both cases.

It seems that this behavior is specific to the `gather` environment, because in `eqnarray*`, `[10pt]` is always treated as an argument for `\\`:
```tex
\begin{eqnarray*}
1 \\ [10pt] 2
\end{eqnarray*}
% Above and below produce the same output
\begin{eqnarray*}
1 \\[10pt] 2
\end{eqnarray*}
```

which seems sensible. I hit this with TeX code that has Lie brackets right after a line break. Is there a central source of truth for such subtle parsing behavior? Where can I find how exactly the `gather*` environment modifies it?

siefkenj commented 1 year ago

Hmmm, I don't completely understand your comment, but I thought I had made it so that whitespace prevented `\\` from consuming an optional argument... Its definition is here:

https://github.com/siefkenj/unified-latex/blob/622ab411bc1e163524dcdc299ab1673e35fece10/packages/unified-latex-ctan/package/latex2e/provides.ts#L10

It seems this behavior is tested here: https://github.com/siefkenj/unified-latex/blob/622ab411bc1e163524dcdc299ab1673e35fece10/packages/unified-latex-util-arguments/tests/gobble-single-argument.test.ts#L475

So I am not sure where it's going wrong...

theseanl commented 1 year ago

I found the following relevant excerpt on page 4 of the xparse documentation:


There is one subtlety here due to the difference in handling by TeX of “control symbols”, where the command name is made up of a single character, such as “\\”. Spaces are not ignored by TeX here, and thus it is possible to require an optional argument directly follow such a command. The most common example is the use of \\ in amsmath environments. In xparse terms it has signature

\DeclareDocumentCommand \\ { !s !o } { ... }

According to this, the signature of `\\` should change from `!s o` to `!s !o` when it is inside amsmath environments, which is consistent with my observation in the first post. There may be more macros with similar behavior.
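
To make the difference concrete, here is a minimal, self-contained sketch of how a gobbler could treat `o` versus `!o`. This is not unified-latex's implementation: `gobbleOptional`, `OptSpec`, and `GobbleResult` are hypothetical names made up for illustration, it works on a raw string rather than an AST, and it does not handle nested brackets.

```typescript
// Hypothetical sketch: NOT unified-latex's actual argument gobbler.
// "o"  : optional argument; whitespace before "[" is skipped.
// "!o" : optional argument that must follow immediately (no whitespace allowed).
type OptSpec = "o" | "!o";

interface GobbleResult {
    arg: string | null; // contents of [...] if one was consumed, else null
    rest: string; // unconsumed input
}

function gobbleOptional(input: string, spec: OptSpec): GobbleResult {
    let pos = 0;
    if (spec === "o") {
        // Plain "o" tolerates whitespace between the macro and "[".
        while (pos < input.length && /\s/.test(input.charAt(pos))) {
            pos++;
        }
    }
    if (input.charAt(pos) !== "[") {
        return { arg: null, rest: input };
    }
    const close = input.indexOf("]", pos);
    if (close === -1) {
        // Unbalanced bracket: consume nothing.
        return { arg: null, rest: input };
    }
    return { arg: input.slice(pos + 1, close), rest: input.slice(close + 1) };
}

// The text that follows `\\` in the examples from the first post.
const afterBreak = " [10pt] 2";
const plain = gobbleOptional(afterBreak, "o"); // bracket consumed as the argument
const strict = gobbleOptional(afterBreak, "!o"); // bracket left in the input
console.log(plain, strict);
```

With `o` the bracket is taken as the argument despite the space; with `!o` it stays in the input and would be typeset, which matches what `gather` does in the first post.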

siefkenj commented 1 year ago

Good find! #41 fixes this issue. I'll release a new version when the tests pass.

theseanl commented 1 year ago

I am not sure the fix is correct. The signature is not globally `!s !o`; it is only so inside certain amsmath environments. In all other cases, it has to be `!s o`. It seems that the linked PR changes the signature globally. I would say that the previous behavior is closer to the expected behavior.

The current infrastructure doesn't seem to allow signatures to change based on the enclosing environment, so I guess it won't be a simple fix.

siefkenj commented 1 year ago

Yes, currently macros are defined globally. It's possible to let an environment redefine the macros it uses (see the tikz package), but it's annoying. I think most people don't even know that you can do \\ [4pt] in normal LaTeX, so I don't think it's too much of a loss.

theseanl commented 1 year ago

It seems that the tikz code is pretty specialized to that case. In general, macros are only available within their enclosing group, and IMO this is an essential feature that allows basic encapsulation, but unified-latex currently treats every macro as global. If the tikz code could be adapted to support macro scopes, that would be great.
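
For what it's worth, here is a rough sketch of what scoped signatures could look like: a stack of signature tables where entering a group or environment pushes its overrides and lookups go innermost-first. Everything here (`SignatureScopes`, `enter`, `leave`, `lookup`, and the choice of `gather` as the overriding environment) is an assumption for illustration, not existing unified-latex API.

```typescript
// Hypothetical sketch of scoped macro signatures; none of these names exist in unified-latex.
type SignatureTable = Map<string, string>;

class SignatureScopes {
    private stack: SignatureTable[];

    constructor(globals: SignatureTable) {
        this.stack = [globals];
    }

    // Entering a group or environment pushes a (possibly empty) table of overrides.
    enter(overrides: SignatureTable = new Map()): void {
        this.stack.push(overrides);
    }

    // Leaving restores whatever was visible before; the global table is never popped.
    leave(): void {
        if (this.stack.length > 1) {
            this.stack.pop();
        }
    }

    // The innermost definition wins.
    lookup(macro: string): string | undefined {
        for (const table of [...this.stack].reverse()) {
            const signature = table.get(macro);
            if (signature !== undefined) {
                return signature;
            }
        }
        return undefined;
    }
}

// String.raw keeps both backslashes, so this is the macro name `\\`.
const LINE_BREAK = String.raw`\\`;
const scopes = new SignatureScopes(new Map([[LINE_BREAK, "!s o"]]));

const outside = scopes.lookup(LINE_BREAK);
scopes.enter(new Map([[LINE_BREAK, "!s !o"]])); // e.g. on \begin{gather}
const inside = scopes.lookup(LINE_BREAK);
scopes.leave(); // e.g. on \end{gather}
const afterLeave = scopes.lookup(LINE_BREAK);
console.log({ outside, inside, afterLeave });
```

The same stack would cover both per-group and per-environment overrides, since a plain `{...}` group can simply push an empty table.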

siefkenj commented 1 year ago

That's a good point. Would you mind opening a new issue for per-environment macro overrides?

theseanl commented 1 year ago

Well, I don't have the resources to work on it, so I'd rather not "own" the issue. Per-group macro overrides and per-environment macro overrides are actually two separate issues, but perhaps they could be dealt with in one go.