wooorm / markdown-rs

CommonMark compliant markdown parser in Rust with ASTs and extensions
https://docs.rs/markdown/1.0.0-alpha.18/markdown/
MIT License

Serializing mdast to markdown #64

Open Enoumy opened 1 year ago

Enoumy commented 1 year ago

(Whoops, I accidentally hit enter before drafting the content of this question, my apologies for the noise!)

Hi! I have a perhaps newbie question! I can use markdown::to_mdast to go from &str -> Node. Is there a function to go back to a string (Node -> &str) in a way that roundtrips?

I came across Node::to_string, and it does convert nodes into a string, but it also drops the links, titles, and most other AST nodes, so re-parsing its output results in a different AST. I'm unsure whether this question is reasonable or within the scope of this crate, but is there an alternate function elsewhere that roundtrips between &str and Node? I am also happy to take a stab at implementing this "roundtrippable" unparser function myself, but was wondering if a function like it already existed.

For further clarification: by "roundtripping" I mean that I would write a property-based test asserting that markdown::to_mdast(to_string(node)) == node holds for all nodes.

Thanks!

wooorm commented 1 year ago

No, this is not yet possible, as mdast-util-to-markdown has not been implemented in Rust yet.

You can work on this. Though, it is involved work that takes a while. The good part is that everything has already been implemented in JavaScript.

Finally, “complete” roundtripping (toString(fromString(x)) == x) is impossible with ASTs. ASTs are abstract. They lose information. That is intentional. So the results will never be exact, but the results will be equivalent.
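To make the "equivalent but not exact" point concrete, here is a minimal sketch using a toy inline AST. Everything in it (`Inline`, `parse`, `serialize`) is hypothetical and only for illustration; it is not the mdast `Node` type or any markdown-rs API. The toy parser accepts both `*x*` and `_x_` as emphasis, and the serializer always writes `*x*`, so the choice of marker is the information that gets lost.

```rust
// Toy inline AST for illustration only; NOT the mdast types from markdown-rs.
#[derive(Debug, Clone, PartialEq)]
enum Inline {
    Text(String),
    Emphasis(String),
}

/// Hypothetical mini-parser: `*x*` and `_x_` both become `Emphasis("x")`,
/// everything else is collected into `Text`.
fn parse(src: &str) -> Vec<Inline> {
    let mut out = Vec::new();
    let mut text = String::new();
    let mut rest = src;
    while let Some(c) = rest.chars().next() {
        if c == '*' || c == '_' {
            // Look for a matching closing marker with non-empty content.
            if let Some(end) = rest[1..].find(c) {
                if end > 0 {
                    if !text.is_empty() {
                        out.push(Inline::Text(std::mem::take(&mut text)));
                    }
                    out.push(Inline::Emphasis(rest[1..1 + end].to_string()));
                    rest = &rest[end + 2..];
                    continue;
                }
            }
        }
        text.push(c);
        rest = &rest[c.len_utf8()..];
    }
    if !text.is_empty() {
        out.push(Inline::Text(text));
    }
    out
}

/// Hypothetical serializer: always picks `*` for emphasis and does no escaping.
fn serialize(nodes: &[Inline]) -> String {
    nodes
        .iter()
        .map(|n| match n {
            Inline::Text(t) => t.clone(),
            Inline::Emphasis(t) => format!("*{t}*"),
        })
        .collect()
}
```

With this sketch, `parse("_a_")` and `parse("*a*")` give the same tree, `serialize(&parse("_a_"))` gives `*a*` rather than the original `_a_`, and re-parsing that output gives the same tree again. The toy serializer does not escape anything, so it does not roundtrip every possible tree; a real implementation has to handle that.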

h7kanna commented 1 year ago

Will this work: passing the serde_json-serialized format on to mdast-util-to-markdown?

wooorm commented 1 year ago

perhaps

a-viv-a commented 10 months ago

I wrote a likely crummy implementation of this for a personal project here, would something like this make sense as a PR or a new crate?

It passes a (much) weaker version of the proptest @Enoumy proposes, where string -> mdast -> string2 -> mdast -> string3 produces an equivalent string2 and string3 (assuming I understand how proptest works :grin: )

I don't think it covers all the possible nodes mdasts can include, and it applies some opinionated formatting. I also suspect this recursive approach is bad for performance. (I'm learning rust through this project, so I wouldn't be surprised to learn something about this code is very far from best practices)
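For anyone wanting to try the weaker check described above (string -> mdast -> string2 -> mdast -> string3, then compare string2 and string3), here is a rough sketch of it as a plain function. The names `is_stable`, `toy_parse`, and `toy_serialize` are made up for this example, and the toy pair just normalizes whitespace; in practice the two callbacks would be the real parser and whatever serializer is under test, with a property-testing crate supplying the inputs.

```rust
/// Runs string -> ast -> string2 -> ast -> string3 and reports whether
/// string2 == string3. `parse` and `serialize` are supplied by the caller.
fn is_stable<Ast>(
    input: &str,
    parse: impl Fn(&str) -> Ast,
    serialize: impl Fn(&Ast) -> String,
) -> bool {
    let string2 = serialize(&parse(input));
    let string3 = serialize(&parse(&string2));
    string2 == string3
}

/// Hypothetical stand-in parser: the "AST" is just the list of words.
fn toy_parse(src: &str) -> Vec<String> {
    src.split_whitespace().map(String::from).collect()
}

/// Hypothetical stand-in serializer: opinionated formatting, single spaces only.
fn toy_serialize(ast: &Vec<String>) -> String {
    ast.join(" ")
}
```

Note that this check only says the serializer's output is a fixed point of parse-then-serialize. It says nothing about whether the first pass preserved the meaning of the input, which is why it is much weaker than comparing the trees.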

wooorm commented 10 months ago

Nice start and welcome to rust :)

a-viv-a commented 10 months ago

I'll leave this code in my own project then. I found this issue when I was already mostly done with this implementation, so I couldn't until it was too late. I'll take a look now, but I don't plan to write something new when I have something that works for me.

Edit: if nothing else I need to copy the unsafe character support...

moy2010 commented 7 months ago

@wooorm, do you know why leveraging the ToString implementation for this wouldn't be a good idea? Or is the intention to have a separate method for this?

wooorm commented 7 months ago

“to string” is already a thing in the mdast world, getting just the text out. Formatting markdown is complex. And not always needed. Yes, separate methods. See the first comment. https://github.com/syntax-tree/mdast-util-to-markdown
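A small sketch of the difference between the two operations, on a toy tree. The type `ToyNode` and the functions `plain_text` and `to_markdown` are hypothetical names for this example only, not the markdown-rs API, and the serializer here skips escaping entirely, which a real one cannot do.

```rust
// Toy tree for illustration only; NOT the mdast `Node` type from markdown-rs.
#[derive(Debug, Clone, PartialEq)]
enum ToyNode {
    Text(String),
    Link { url: String, children: Vec<ToyNode> },
    Paragraph(Vec<ToyNode>),
}

/// "To string" in the mdast sense: concatenate the text and drop all structure.
fn plain_text(node: &ToyNode) -> String {
    match node {
        ToyNode::Text(t) => t.clone(),
        ToyNode::Link { children, .. } | ToyNode::Paragraph(children) => {
            children.iter().map(plain_text).collect()
        }
    }
}

/// Serializing to markdown: structure such as the link target is written back out.
/// Assumption: no escaping of unsafe characters, which a real serializer needs.
fn to_markdown(node: &ToyNode) -> String {
    match node {
        ToyNode::Text(t) => t.clone(),
        ToyNode::Link { url, children } => {
            let label: String = children.iter().map(to_markdown).collect();
            format!("[{label}]({url})")
        }
        ToyNode::Paragraph(children) => children.iter().map(to_markdown).collect(),
    }
}
```

For a paragraph holding the text `see ` followed by a link labelled `docs` pointing at `https://example.com`, `plain_text` returns `see docs` while `to_markdown` returns `see [docs](https://example.com)`. Both are useful, which is one reason to keep them as separate methods.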

moy2010 commented 7 months ago

I see. I will try to work on a PR then :slightly_smiling_face:.