siefkenj / unified-latex

Utilities for parsing and manipulating LaTeX ASTs with the Unified.js framework
MIT License
85 stars 20 forks source link

Default arguments support #24 #62

Open theseanl opened 7 months ago

theseanl commented 7 months ago

The goal of this PR is to add support for macros with default arguments in macro expansion, e.g. as used in CLI's -e option. This should fix #24.

This PR mostly modifies unified-latex-util-macros package. Also the argspec parser was modified to correctly provide default arguments. In doing so, I have simplified the AST by removing Group and always returning an array of strings. Some complexity were moved to the argspec serializer. Previously default arguments and embellishment tokens were encoded in the same Groups, but it seemed to me that they need to behave differently in several aspects, so now in pegjs they are represented as different terms group and collections. This irons out several places which only had supported specifying a single character, but actually should have supported arbitrary token. Also https://github.com/siefkenj/unified-latex/issues/46 was fixed.

theseanl commented 7 months ago

On a second thought, I am uneasy about attaching default args to AST, I'll think about moving that under createMacroExpander. What do you think?

If we choose that way, I may leave attachMacroArgs intact during this PR, and I don't see much point on working on expandMacros to support default arguments, because that will anyway require an overhaul while working on https://github.com/siefkenj/unified-latex/issues/43...

siefkenj commented 7 months ago

This is exciting :-). I will review it soon.

Just a comment, though. _renderInfo is the place to stick data that isn't "part of the AST" but that you still want hanging around. It would be a reasonable place to stick default arguments. (It's already where named arguments are placed)

siefkenj commented 7 months ago

Would you be able to make cross-spawn its own PR? That should be a fairly quick change :-)

theseanl commented 7 months ago

Would you be able to make cross-spawn its own PR? That should be a fairly quick change :-)

Sure, done at https://github.com/siefkenj/unified-latex/pull/63

theseanl commented 7 months ago

I need to get this going, and I propose to ditch Typescript projects in favor of lerna. Typescript project not allowing circular dependency is the problem. Lerna allows us to do everything that Typescript project does.

In the long term, I see benefits of disallowing circular dependencies, and in order to implement contextual macro expansion https://github.com/siefkenj/unified-latex/issues/43, I anyway see the needs for some architectural change. However, it'd be much better if we can do it incrementally, and introducing circular dependencies among projects in the meantime would be inevitable. There are many testimonies in https://github.com/microsoft/TypeScript/issues/33685 that some had to abandon TS projects because there is no workaround for circular dependency issues.

I can make a separate PR that does this, merge that here and then make this PR to a state that is reviewable.

siefkenj commented 7 months ago

@theseanl I am strongly against circular dependencies. I think refactoring things into their own package is the solution most of the time.

Why do you think a circular dependency is required?

theseanl commented 7 months ago

I didn't say that it is required. What I think is that attempting such a refactoring you are talking about is likely beyond the scope of this PR, and will slow down the development process.

The reason for this is that in order to parse default arguments, we need unified-latex-util-parse's parse function, and it is a gigantic function that invokes various other packages. Deferring invoking parse during macro expansion phase (which you have suggested) does not eliminate circular dependencies, because such an invocation will anyway be placed on unified-latex-util-macro package, which unified-latex-util-parse depends on.

IMO, we anyway will need some major architectural change to implement contextual macro expansion, and it will require passing down some config or host objects all the way down to a lot of places, including gobbleArguments function. There is already a place where we invoke the plain parse function somewhere (I forgot where exactly it is) and this is fundamentally flawed; Any configuration the user have provided to the parser need to be provided to that parse call as well. Such an issue will be resolved during such a refactoring. We will likely only import interfaces of such objects and thereby achieving loose coupling of packages and likely also an acyclic dependency graph.

I am not sure about your plans for tackling those problems, but I would first focus on default arguments support, and the then to contextual macro parsing which is likely to resolve the tight coupling issue in the meantime.

This package is a nodejs module. IMO circular dependency is not something that is tabooed in Javascript environment, ES modules specification is carefully defined so that it handles circular dependencies in a graceful and predictable way. I don't see a reason to avoid circular dependencies.

theseanl commented 7 months ago

Frankly, I think the root cause is that we are re-parsing argspecs after stringifying it. It'd be better if we are directly operating on AST to extract argspecs. Not only that it will make the need to re-parse default arguments unnecessary, but would also make supporting argument processors like > { \SplitArgument { 2 } { , } } m easier.

siefkenj commented 7 months ago

It seems appropriate to split expandMacros into its own package so that including unfied-latex-util-macros doesn't create a circular mess :-). That could be done in a separate PR or in this one (though would certainly be easiest to review in a separate PR).

theseanl commented 7 months ago

That indeed seems a quick and easy solution to our situation without getting my hands dirty with cleaning up all the mess :)

theseanl commented 7 months ago

Based on new observations above, I've simplified the pegjs part a bit and worked on gobbling until arguments. While doing so, I thought I could support multiple stop tokens, so I did it. It is currently 'dirty' in that it may introduce unnecessary string splits while doing stop tokens matching. Having extraneous string split should be harmless, but we may anyway 'clean' it if needed...

It currently lacks support for default arguments referencing other arguments, such as m o{#1}. This would be a goal of the next iteration.

It currently does have circular dependency. I'll try to see if it can be removed in next iterations, but I'm not sure if it will be done. The other PR should have removed any build-related issues that may arise from circular dependency.

theseanl commented 7 months ago

Implemented support for default arguments referencing other arguments! It turned out to be quite tricky, but in the end it boiled down to a simple algorithm. Now I think this PR is feature complete, ready to be reviewed.

siefkenj commented 7 months ago

Now that #67 is merged, this should be easier to review :-)