wooorm / markdown-rs

CommonMark compliant markdown parser in Rust with ASTs and extensions
https://docs.rs/markdown/1.0.0-alpha.18/markdown/
MIT License
836 stars 41 forks source link

Stronger types #87

Open tv42 opened 8 months ago

tv42 commented 8 months ago

Hi. I started using markdown-rs and noticed I had to add a panic into my code just because everything is a Node, but List can only really contain ListItems. Since 1.x is still in alpha, I figured this would be a good time to speak up.

Have you considered making the AST types stronger? I think List.children should be Vec<ListItem>, and similarly e.g. Strong cannot contain a Paragraph or a Heading..

mdast supports this line of thinking:

and so on.

I should be able to contribute code too, if you're interested and breaking the API is ok.

wooorm commented 8 months ago

Yes, supporting content types of mdast more exactly would be very nice! This is actually my main blocker for v1. If you want to work on that: great!

tv42 commented 8 months ago

Having spent a day looking at the source, I'm less enthusiastic. Changing the resulting data structure is simple enough, but the parser is heavily built on the idea that all nodes are interchangeable, their correct layout happens mostly by the incoming events not being arbitrary, and there's a lot of mystic stuff there that's not obvious, like the context stack, "virtual steps", links to other events (all of previous, next, content; also seemingly into the future), etc.

I'm still looking into it, but I don't think this change will happen without either

I'm exploring what the parser would have to look like, but the undocumented details of the events are getting in the way.

How confident are you in the test coverage for the parser? It seems the tests are mostly in terms of to_html, with roughly one AST-structure asserting test per node type.

wooorm commented 7 months ago

yeah, markdown is incredibly complex. And the extension mechanism is also very complex.

The parser is very well tested. There are few node tests, just the ones for coverage, because the AST is slated to change

xelatihy commented 6 months ago

I was looking into this for some projects I am working on. I suspect that changing the parser is really hard, and in fact may not be a good idea overall.

An alternative might be to have a "higher-level" AST that is strictly typed, and let the library user decide which one to use. The high-level, strictly typed, AST would be generated from the low-level one that exists now, without changing the parser.

In fact, that is what I am doing in my project. I am trying to model my AST similarly to the pandoc AST types, but adapted with a bit of richer types.

If there is interest in this idea, and if my project works, I would by happy to contribute code.

wooorm commented 6 months ago

AST already exists, and is prevalent across the JavaScript ecosystem, but needs to be better modelled here in Rust: https://github.com/DefinitelyTyped/DefinitelyTyped/blob/master/types/mdast/index.d.ts.

Or, what do you mean with “higher-level AST”?