siefkenj / unified-latex

Utilities for parsing and manipulating LaTeX ASTs with the Unified.js framework
MIT License

Space after argument-less macro should be absorbed #56

Open Evan-Zhao opened 9 months ago

Evan-Zhao commented 9 months ago

Hi @siefkenj, thank you for the library! I'm working with the parser and noticed that it produces extraneous whitespace in the following case:

{\em 5 stars}

which in LaTeX should render as 5 stars with no space before it. I had to look it up; this SO question confirms that in such positions (after a macro that takes no arguments and is not terminated by an empty {} group) the following space is absorbed and doesn't render. Currently the unified-latex parser behaves differently and keeps that whitespace.

Is this intentional? How should I approach and fix it if I'd like to contribute?

Context: I'm using this LaTeX parser to "transpile" a small subset of LaTeX into other typesetting languages such as Typst. It's very hard to get great LaTeX feature coverage, and I don't really intend to. Still, it may prove useful for people who want to migrate their old LaTeX projects and save some manual work :)

rowanc1 commented 9 months ago

Hi @Evan-Zhao -- saw your Context, and responding to that! We are doing something similar over in https://mystmd.org, and it may be helpful to take a look at the packages we are working on there (tex-to-myst and myst-to-typst; demos are in the online docs) -- let me know if you are interested in hearing more about how we are working through those translations with unified-latex!

siefkenj commented 9 months ago

At the moment, this behavior is intentional, since unified-latex was first written to be a pretty printer (and so was intended to preserve the formatting of the code, not to reproduce exact TeX behavior).

There are a few things you can do. If your macro takes an argument, give it a signature of "m", and the argument will be absorbed, correctly accounting for whitespace. \em, however, is a streaming command that doesn't take an argument. Note that \em 5 is the same as \em5, but \em A is not the same as \emA, because a macro name extends over every letter that follows the backslash, so \emA is a different macro.
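To make the \em 5 / \em5 / \emA distinction concrete, here is a minimal TypeScript sketch of how TeX reads a control word under the default category codes. It is a simplified model for illustration only; readControlWord and ControlWord are hypothetical names and are not part of unified-latex.

```typescript
// Simplified model of TeX's control-word rule (assumes default catcodes):
// a backslash followed by the longest run of ASCII letters is the macro name,
// and any spaces or tabs directly after the name are skipped.
interface ControlWord {
    name: string; // macro name without the backslash
    rest: string; // input remaining after the name and the skipped spaces
}

// Hypothetical helper, not a unified-latex function.
function readControlWord(input: string): ControlWord | null {
    const match = /^\\([a-zA-Z]+)[ \t]*/.exec(input);
    if (match === null) {
        return null; // not a control word (for example a control symbol such as \%)
    }
    return { name: match[1], rest: input.slice(match[0].length) };
}

console.log(readControlWord("\\em 5 stars")); // name "em", rest "5 stars"
console.log(readControlWord("\\em5 stars")); // name "em", rest "5 stars"
console.log(readControlWord("\\emA stars")); // name "emA", rest "stars"
```

The first two inputs give the same result because a digit cannot be part of a control word, while in the third the letter A is read as part of the name.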

In any case, if you want to remove whitespace immediately following specific macros, you can use the replaceNode function and check for whitespace nodes. For each one, check whether the item immediately preceding it in the containingArray is a macro. If so, return null from the replaceNode callback to delete the whitespace.
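The check described above can be sketched without the library on a plain array of nodes. This is only an illustration under stated assumptions: the Node type is a simplified stand-in modelled on the unified-latex AST, and dropWhitespaceAfterMacros and ARGUMENTLESS_MACROS are hypothetical names. With the real library, the same test would presumably go inside the replaceNode callback, which returns null for a whitespace node that should be removed.

```typescript
// Simplified, hypothetical node shapes modelled on the unified-latex AST.
type Node =
    | { type: "macro"; content: string }
    | { type: "whitespace" }
    | { type: "string"; content: string };

// Illustrative list of argument-less macros whose trailing space TeX absorbs.
const ARGUMENTLESS_MACROS = new Set(["em", "bf", "it"]);

// Hypothetical helper: drops every whitespace node that directly follows
// one of the listed macros and keeps all other nodes untouched.
function dropWhitespaceAfterMacros(nodes: Node[]): Node[] {
    return nodes.filter((node, index) => {
        if (node.type !== "whitespace") {
            return true;
        }
        const previous = nodes[index - 1]; // undefined when index is 0
        const followsMacro =
            previous !== undefined &&
            previous.type === "macro" &&
            ARGUMENTLESS_MACROS.has(previous.content);
        return !followsMacro;
    });
}

// Contents of the group {\em 5 stars} in this simplified model.
const group: Node[] = [
    { type: "macro", content: "em" },
    { type: "whitespace" },
    { type: "string", content: "5" },
    { type: "whitespace" },
    { type: "string", content: "stars" },
];

// The space after \em is removed; the space between "5" and "stars" stays.
console.log(dropWhitespaceAfterMacros(group).map((n) => n.type).join(" "));
```

Restricting the check to a named set of macros, rather than to every macro, avoids stripping whitespace after macros that the pretty printer should leave alone.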