siefkenj / unified-latex

Utilities for parsing and manipulating LaTeX ASTs with the Unified.js framework
MIT License

So cool! #21

Open rowanc1 opened 1 year ago

rowanc1 commented 1 year ago

Hi @siefkenj -- just wanted to say thanks for this library. I just started using it to parse some LaTeX documents and it is working great so far. I am sure I will have some questions as I get into it more.

I also wanted to introduce myself 👋 and a project I am working on, where there might be a chance to collaborate / coordinate a bit with this project if you all are game!?

I am working for the @executablebooks project on MyST Markdown, a markup language that is gaining traction in the Python community through tools like JupyterBook and Sphinx (as it is based on RST). We are currently working on standardizing some of the AST underlying MyST; initial work is here: https://spec.myst.tools/

This allows some pretty cool web-based rendering of things like cross-references and all sorts of other citations (e.g. see typography for inline demos). We have also started creating various LaTeX templates so that you can take your markup and write it to LaTeX using the template for one of a few hundred journals. We have also just started pushing into JATS export (on every inline demo in the docs), which is used in scientific publishing and archiving.

All of this is one way at the moment (with the upcoming exception of JATS):

[image]

What would be amazing would be to have some interop with unified-latex to support reading (and probably, in the future, better/prettier writing) of LaTeX documents.

I am not quite sure what the next steps would be for that, but I would be happy to meet and share our project's vision/goals. I am mostly here to show enthusiasm for your project. 🚀 :)

siefkenj commented 1 year ago

@rowanc1 Thank you :-). @executablebooks looks like a neat project!

I am currently involved with the PreTeXt project. It sounds like they have some overlap with @executablebooks. One focus of the PreTeXt project is to ensure accessible output. So, for example, if you write a book in PreTeXt, you get conversion to braille for free, your math works with screenreaders, etc. If you're interested in a possible collaboration between the projects, please stop by one of our weekly meetups, which are announced here: https://groups.google.com/g/pretext-announce

rowanc1 commented 1 year ago

Hi @siefkenj, joined the group!

Quick update from MyST: we integrated the @unified-latex project into MyST, and I did a bit of a write-up on it here:

https://curvenote.com/blog/how-to-use-latex-with-myst-markdown

Hopefully that is enough info to spark some other ideas. We are now using this to parse articles for the journals that we are helping to support. There are some pieces that we implemented in the parsing that might be better included in this repo (like parsing and nesting arguments), but overall it was really easy to work with your libraries.

Let me know if you want any details on the implementation beyond the high-level overview in the blog; happy to improve docs. :)

Happy new year!

siefkenj commented 1 year ago

Happy new year! MyST is looking really great!

I am not quite sure what precise functionality you need from the parseArgument function you referenced, but does attachMacroArgs in unified-latex-util-arguments do what you need? Or are you trying to guess the signature based on the tokens that follow?

It also looks like you're having to fix some things up with, e.g., \author. Right now in unified-latex-ctan \author is defined to have a signature of m, but you seem to want to parse it as o m? I didn't know it could take an optional argument, but if you show me a reference, I can update the type signature.

If any particular existing macro definitions are causing you trouble, you can pass in overrides. For example, unified().use(unifiedLatexFromString, {macros: {author: {signature: "o m"}}}) should override the existing signature of the \author macro, so that a bracketed argument right after \author is parsed as optional instead of its first token being captured as the mandatory argument.
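
To make the difference between the m and o m signatures concrete, here is a minimal, self-contained sketch of signature-driven argument gobbling. It is a toy model that works on raw strings, not the unified-latex implementation (which operates on AST nodes); the names parseArgs, readGroup, ArgSpec and ParsedArg are made up for illustration, and only the m and o argument types are handled.

```typescript
// Toy model of xparse-style signatures. NOT the unified-latex implementation;
// every name in this block is hypothetical.
type ArgSpec = "m" | "o";

interface ParsedArg {
    spec: ArgSpec;
    content: string | null; // null means the argument was absent
}

interface GroupResult {
    text: string; // text between the delimiters
    end: number; // index just past the closing delimiter
}

// Read a balanced `open`...`close` group that starts exactly at `pos`.
// Returns null if there is no such group (or it is unbalanced).
function readGroup(src: string, pos: number, open: string, close: string): GroupResult | null {
    if (src.charAt(pos) !== open) return null;
    let depth = 0;
    for (let i = pos; i < src.length; i++) {
        const ch = src.charAt(i);
        if (ch === open) depth++;
        else if (ch === close) {
            depth--;
            if (depth === 0) return { text: src.slice(pos + 1, i), end: i + 1 };
        }
    }
    return null;
}

// Gobble arguments from `src` (the text right after the macro name)
// according to a space-separated signature such as "m" or "o m".
function parseArgs(src: string, signature: string): { args: ParsedArg[]; rest: string } {
    const specs = signature.trim().split(/\s+/) as ArgSpec[];
    const args: ParsedArg[] = [];
    let pos = 0;
    for (const spec of specs) {
        // Skip whitespace in front of each argument.
        while (pos < src.length && /\s/.test(src.charAt(pos))) pos++;
        if (spec === "o") {
            // Optional: only taken when a [...] group is present.
            const group = readGroup(src, pos, "[", "]");
            args.push({ spec, content: group ? group.text : null });
            if (group) pos = group.end;
        } else {
            // Mandatory: a {...} group, otherwise the next single character.
            const group = readGroup(src, pos, "{", "}");
            if (group) {
                args.push({ spec, content: group.text });
                pos = group.end;
            } else if (pos < src.length) {
                args.push({ spec, content: src.charAt(pos) });
                pos += 1;
            } else {
                args.push({ spec, content: null });
            }
        }
    }
    return { args, rest: src.slice(pos) };
}

// The text that would follow \author in "\author[Short]{Long Name} more".
const afterAuthor = "[Short]{Long Name} more";
const asM = parseArgs(afterAuthor, "m");
const asOM = parseArgs(afterAuthor, "o m");
console.log(asM.args, JSON.stringify(asM.rest));
console.log(asOM.args, JSON.stringify(asOM.rest));
```

In this toy, the m signature grabs the lone [ as the mandatory argument and leaves the rest of the text unattached, while o m attaches both the bracketed and the braced argument.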

Also, I added named argument parsing support (see https://github.com/siefkenj/unified-latex/blob/c0d8a9bcbf65222a3245c383c1e9e314f8c4bf7c/packages/unified-latex-ctan/package/latex2e/provides.ts#L19 ) so that a change in signature won't necessarily break future code (so long as the named arguments stay consistent) :-D
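
The idea behind named arguments can be sketched without the library: if each position in a signature carries a name, downstream code can look arguments up by name and keeps working when a new optional argument shifts the positions. The block below is a toy illustration under that assumption; the MacroSpec layout, the nameArgs helper, and the argument names shortName and name are hypothetical and are not taken from the linked file.

```typescript
// Toy illustration of name-based argument lookup. The MacroSpec shape and
// all names here are hypothetical, not the unified-latex data model.
interface MacroSpec {
    signature: string; // e.g. "o m"
    namedArguments: string[]; // one name per signature entry
}

// Pair parsed argument values with their names.
function nameArgs(spec: MacroSpec, values: (string | null)[]): Record<string, string | null> {
    const count = spec.signature.trim().split(/\s+/).length;
    if (count !== spec.namedArguments.length || count !== values.length) {
        throw new Error("signature, names and values must have the same length");
    }
    const named: Record<string, string | null> = {};
    spec.namedArguments.forEach((name, i) => {
        named[name] = values[i] ?? null;
    });
    return named;
}

// Two versions of a macro definition: the second adds an optional argument.
const authorV1: MacroSpec = { signature: "m", namedArguments: ["name"] };
const authorV2: MacroSpec = { signature: "o m", namedArguments: ["shortName", "name"] };

// Values as a parser might produce them for \author{Ada Lovelace}.
const valuesV1: (string | null)[] = ["Ada Lovelace"];
const valuesV2: (string | null)[] = [null, "Ada Lovelace"];

const namedV1 = nameArgs(authorV1, valuesV1);
const namedV2 = nameArgs(authorV2, valuesV2);

// Positional access changes meaning between versions; named access does not.
console.log(valuesV1[0], valuesV2[0]);
console.log(namedV1["name"], namedV2["name"]);
```

Code that reads index 0 sees a different argument after the signature change, whereas code that reads the name entry gets the same value in both versions.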

It also looks like maybe you want to run unified-latex-util-ligatures? It already handles turning common LaTeX ligatures and similar sequences into their Unicode equivalents. Of course, it's not applied by default since it modifies the AST in a non-reversible way.
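
Why ligature expansion is non-reversible can be seen in a small string-based sketch. This is a toy with a hand-picked four-entry table, not the behavior or API of unified-latex-util-ligatures (which works on AST nodes); expandLigatures and LIGATURES are hypothetical names.

```typescript
// Toy ligature expansion on plain strings. The table is a small hand-picked
// sample and the function name is hypothetical.
const LIGATURES: [string, string][] = [
    // Longer sequences first so that "---" wins over "--".
    ["---", "\u2014"], // em dash
    ["--", "\u2013"], // en dash
    ["``", "\u201C"], // left double quotation mark
    ["''", "\u201D"], // right double quotation mark
];

function expandLigatures(src: string): string {
    let out = "";
    let i = 0;
    while (i < src.length) {
        // Find the first table entry that matches at the current position.
        const match = LIGATURES.find(([from]) => src.startsWith(from, i));
        if (match) {
            out += match[1];
            i += match[0].length;
        } else {
            out += src.charAt(i);
            i++;
        }
    }
    return out;
}

const expanded = expandLigatures("``pages 1--3'' --- done");
console.log(expanded);

// Two different sources collapse to the same output, so the original
// spelling cannot be recovered afterwards.
const fromAscii = expandLigatures("a---b");
const fromUnicode = expandLigatures("a\u2014b");
console.log(fromAscii === fromUnicode);
```

Because a source that typed three hyphens and a source that typed the em dash character directly end up identical, a printer working from the expanded result has no way to reproduce the author's original input.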