siefkenj / unified-latex

Utilities for parsing and manipulating LaTeX ASTs with the Unified.js framework
MIT License

erroneous `openMark`/`closeMark` on argument parse #76

Closed: retorquere closed this issue 6 months ago

retorquere commented 6 months ago

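With this:

```typescript
import { unified } from 'unified'
import { unifiedLatexFromString } from '@unified-latex/unified-latex-util-parse'

// Declare \test as a macro taking one mandatory argument (xparse signature "m").
const parser = unified().use(unifiedLatexFromString, {
  macros: {
    test: { signature: 'm' },
  },
})

// Parse a braceless call followed by a braced call and dump the resulting AST.
console.log(JSON.stringify(parser.parse('\\test m\\test{m}'), null, 2))
```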

Both arguments get `openMark: "{"` and `closeMark: "}"`.

siefkenj commented 6 months ago

This is expected behavior. Mandatory arguments are surrounded by {...}. unified-latex parses to an abstract syntax tree rather than a concrete syntax tree. Information about whether the user included curly braces around the input or not is lost.
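To make the AST point concrete, here is a minimal TypeScript sketch of how a mandatory (`m`) argument could be normalized. The node types and `takeMandatoryArg` are hypothetical simplifications, not unified-latex's actual code. The assumption is that an `m` argument consumes either the next brace group (keeping only its contents) or the next single node, and always records `{`/`}` marks.

```typescript
// Simplified stand-ins for AST node types (assumptions for illustration,
// not unified-latex's real type definitions).
type StrNode = { type: "string"; content: string };
type GroupNode = { type: "group"; content: LatexNode[] };
type MacroNode = { type: "macro"; content: string };
type LatexNode = StrNode | GroupNode | MacroNode;
type ArgNode = {
  type: "argument";
  openMark: string;
  closeMark: string;
  content: LatexNode[];
};

// Collect one mandatory ("m") argument starting at index `i` of `nodes`.
// A brace group contributes its contents; any other node is taken as a
// single-token argument. Either way the argument is stored with "{" / "}"
// marks, so the braced and braceless spellings become indistinguishable.
function takeMandatoryArg(nodes: LatexNode[], i: number): ArgNode {
  const next = nodes[i];
  const content = next.type === "group" ? next.content : [next];
  return { type: "argument", openMark: "{", closeMark: "}", content };
}

// `\test m`: the argument is a bare string node.
const braceless: LatexNode[] = [{ type: "string", content: "m" }];
// `\test{m}`: the argument is a group containing the same string node.
const braced: LatexNode[] = [
  { type: "group", content: [{ type: "string", content: "m" }] },
];

const fromBraceless = takeMandatoryArg(braceless, 0);
const fromBraced = takeMandatoryArg(braced, 0);
console.log(JSON.stringify(fromBraceless) === JSON.stringify(fromBraced)); // true
```

Because both calls yield the same argument object, nothing in the result records which spelling the source used.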

retorquere commented 6 months ago

> Mandatory arguments are surrounded by {...}

Is that true, though? The braceless form works and renders the same in LaTeX.

In BibTeX, under some conditions, sentence casing can differ depending on whether content is in braces. I'll investigate to make sure this also applies to macro arguments. I understand the point about CST vs. AST, but if the casing difference applies, those two forms have different meanings in BibTeX, not just different expressions of the same meaning.

siefkenj commented 6 months ago

> That's not true though? The braceless form works and renders to the same in LaTeX.

Yes. In LaTeX the commands `\foo x` and `\foo{x}` are treated the same, and unified-latex doesn't distinguish between them. Check BibTeX; if it does something different, that's very strange. You can look at the xparse documentation for other types of argument signatures to try.

retorquere commented 6 months ago

It isn't strange to anyone using BibTeX; it is a necessary part of this toolchain. I appreciate that you have a different perspective, but it's not the whole truth about the LaTeX ecosystem, of which BibTeX is a material part. I know these forms behave the same in the document body, but BibTeX has additional behavior you wouldn't encounter when writing the body.

I think I'm sensing some frustration on your side. I'm not telling you how to run your project; I'm just telling you that there are parts of the LaTeX ecosystem that behave in ways you're not familiar with.

If I'm crossing a line here, I'd like to know. I will need help to grok unified-latex, and if you find the way BibTeX uses LaTeX off-putting, that's going to be a strained interaction, and I'd rather we avoid that.

So, long story short: I will likely need this distinction (I haven't finished testing whether I do), and if that's not in the cards for unified-latex, it's better to know now rather than later so I can refocus.

retorquere commented 6 months ago

Specifically, this:

> Yes. In latex the commands \foo x and \foo{x}

may not hold for BibTeX. In BibTeX, `{\textbf{x}}` and `\textbf{x}` are not the same.

siefkenj commented 6 months ago

You'll notice that the first parses to `group(macro(x))` while the second parses to `macro(x)`.
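Why the outer braces survive while argument braces do not can be sketched with a toy parser. This is a hypothetical TypeScript illustration, far simpler than unified-latex's real grammar: it handles only control words, braces, and single characters, and keeps every explicit brace pair as a `group` node. Argument attachment, which would turn `{x}` into the argument of `\textbf`, is left out; `parseNodes`, `outline`, and `MiniNode` are illustrative names only.

```typescript
// Hypothetical node shape for this toy parser.
type MiniNode =
  | { type: "string"; content: string }
  | { type: "macro"; content: string }
  | { type: "group"; content: MiniNode[] };

// Parse `src` from `pos` until an unmatched "}" or the end of input.
// Returns the parsed nodes and the position where parsing stopped.
function parseNodes(src: string, pos: number): [MiniNode[], number] {
  const out: MiniNode[] = [];
  while (pos < src.length) {
    const ch = src[pos];
    if (ch === "}") break; // the caller consumes the closing brace
    if (ch === "{") {
      const [inner, end] = parseNodes(src, pos + 1);
      out.push({ type: "group", content: inner });
      pos = end + 1; // skip the matching "}"
    } else if (ch === "\\") {
      // Control word (letters) or a single-character control symbol.
      const m = /^[A-Za-z]+/.exec(src.slice(pos + 1));
      const macroName = m ? m[0] : src.charAt(pos + 1);
      out.push({ type: "macro", content: macroName });
      pos += 1 + macroName.length;
    } else {
      out.push({ type: "string", content: ch });
      pos += 1;
    }
  }
  return [out, pos];
}

// Render a compact outline such as "group(macro(textbf) group(x))".
function outline(nodes: MiniNode[]): string {
  return nodes
    .map((n) => {
      if (n.type === "string") return n.content;
      if (n.type === "macro") return `macro(${n.content})`;
      return `group(${outline(n.content)})`;
    })
    .join(" ");
}

console.log(outline(parseNodes("{\\textbf{x}}", 0)[0])); // group(macro(textbf) group(x))
console.log(outline(parseNodes("\\textbf{x}", 0)[0])); // macro(textbf) group(x)
```

The top-level node is a `group` only when the source has explicit outer braces, which is the distinction an AST can still see.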

I am getting a bit frustrated because it seems like many of the questions you ask could be answered by reading the comments in the source code or examining the options available to each function. There is now full source documentation up at https://siefkenj.github.io/unified-latex, where you can see that `MacroInfo` accepts an `argumentParser`, which can be used for custom parsing if you need it.
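As a rough idea of what a custom argument parser could do for this issue, here is a TypeScript sketch that consumes one mandatory argument and records whether it was braced. Everything here is an assumption: the node types are simplified stand-ins, the parser shape `(nodes, startPos) => { args, nodesRemoved }` is assumed rather than taken from the docs (check the generated `ArgumentParser` type before relying on it), and `braced` is a hypothetical extra field, not part of unified-latex's AST.

```typescript
// Simplified stand-ins for AST nodes (assumptions, loosely modelled on
// unified-latex's node shapes; not its real type definitions).
type SNode =
  | { type: "string"; content: string }
  | { type: "whitespace" }
  | { type: "group"; content: SNode[] }
  | { type: "macro"; content: string };

// `braced` is a hypothetical extra field. The marks stay "{" and "}" so that
// printing the argument back out still produces valid LaTeX.
type SArgument = {
  type: "argument";
  openMark: string;
  closeMark: string;
  content: SNode[];
  braced: boolean;
};

// Hypothetical parser for one mandatory argument; startPos is the index of
// the first node after the macro name.
function bracePreservingArg(
  nodes: SNode[],
  startPos: number,
): { args: SArgument[]; nodesRemoved: number } {
  let i = startPos;
  // Skip whitespace between the macro name and its argument.
  while (i < nodes.length && nodes[i].type === "whitespace") i++;
  // Nothing left to consume: report no argument.
  if (i >= nodes.length) return { args: [], nodesRemoved: 0 };

  const next = nodes[i];
  const arg: SArgument =
    next.type === "group"
      ? { type: "argument", openMark: "{", closeMark: "}", content: next.content, braced: true }
      : { type: "argument", openMark: "{", closeMark: "}", content: [next], braced: false };
  // Consume the skipped whitespace plus the argument node itself.
  return { args: [arg], nodesRemoved: i - startPos + 1 };
}

// `\test m` in this simplified model: whitespace, then the string "m".
const braceless: SNode[] = [{ type: "whitespace" }, { type: "string", content: "m" }];
// `\test{m}`: a single group containing "m".
const braced: SNode[] = [{ type: "group", content: [{ type: "string", content: "m" }] }];

console.log(bracePreservingArg(braceless, 0).args[0].braced); // false
console.log(bracePreservingArg(braced, 0).args[0].braced); // true
```

Keeping the normal `{`/`}` marks and carrying the concrete-syntax detail in a separate flag means code that ignores the flag behaves exactly as before.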

retorquere commented 6 months ago

I can sense that frustration, but I find unified-latex pretty overwhelming. Even knowing what parts to look for is a hurdle.

What I meant to say was that `\textbf{x}` itself means something different depending on whether it is in braces, and that under some circumstances it affects its neighbours in a group. But I've just tested a simpler sample: `\textbf{C}` and `\textbf C` do not mean the same thing in BibTeX. This is not "strange".

I'll try the rest by navigating the sources, but I find it equally frustrating to have to walk on eggshells trying to eke out information. A lot I have already discovered on my own, but that is going to be invisible to you. In the case you mention, though, I don't know how I would have stumbled upon the fact that `MacroInfo` can do these kinds of things, because the samples (which were all the documentation there was until I generated TypeDoc) only show one particular use.
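The casing difference can be illustrated with a deliberately simplified model of BibTeX's `change.case$` lowercasing: characters at brace depth 0 are lowercased, and anything inside braces is protected. This TypeScript sketch is an assumption-laden approximation (the function name is hypothetical, and it ignores BibTeX's first-character rule, the colon rule, and "special characters" that start with `{\`), but it reproduces the `\textbf{C}` vs. `\textbf C` difference described above.

```typescript
// Simplified model of BibTeX-style case changing: lowercase everything at
// brace depth 0 and leave braced content untouched.
function lowercaseUnprotected(title: string): string {
  let depth = 0;
  let out = "";
  for (const ch of title) {
    if (ch === "{") depth++;
    else if (ch === "}") depth = Math.max(0, depth - 1);
    // Braces themselves are unaffected by toLowerCase, so this is safe.
    out += depth === 0 ? ch.toLowerCase() : ch;
  }
  return out;
}

console.log(lowercaseUnprotected("\\textbf{C}")); // \textbf{C}
console.log(lowercaseUnprotected("\\textbf C")); // \textbf c
```

Under this model the braces around the argument are exactly what protects `C`, so a tree that drops them cannot reproduce BibTeX's output.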