unifiedjs / ideas

Share ideas for new utilities and tools built with @unifiedjs
https://unifiedjs.com

XML AST #2

Closed revolunet closed 3 years ago

revolunet commented 6 years ago

Hi,

I've been playing with some of the awesome mdast/hast/unist tooling recently. When I went back to XML, I wondered whether unist tools could be used to work with XML trees, but it looks like that's not possible yet?

To implement this, the steps would be:

a) create the syntax-tree

I guess this means defining the tree "model", and it would be very much like this one? https://github.com/syntax-tree/hast#ast

b) create the parser/stringifier

I guess this would be something like https://github.com/rehypejs/rehype/tree/master/packages/rehype-parse and https://github.com/syntax-tree/hast-util-to-html?

wooorm commented 6 years ago

> a) create the syntax-tree

Yup! Mostly like HAST. But with some added nodes for processing instructions, cdata, and whatnot!

> b) create the parser/stringifier

Probably built on some other XML parser. Depends on how far you’d like to go. There’s some weird stuff (like custom entities) in XML!
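
As a rough illustration of "mostly like HAST, plus extra nodes", here is a sketch of what such a tree could look like as plain objects. It assumes the node names that syntax-tree/xast (linked further down this thread) ended up using: `root`, `element`, `text`, `instruction`, and `cdata`. The sample document and the `countTypes` helper are made up for this example.

```javascript
// Hypothetical sample tree; node and field names follow syntax-tree/xast.
const tree = {
  type: 'root',
  children: [
    // Processing instruction: <?xml version="1.0"?>
    {type: 'instruction', name: 'xml', value: 'version="1.0"'},
    {
      type: 'element',
      name: 'book',
      attributes: {id: 'b1'},
      children: [
        {
          type: 'element',
          name: 'title',
          attributes: {},
          children: [{type: 'text', value: 'Hello'}]
        },
        // CDATA section: <![CDATA[<raw> & unescaped]]>
        {type: 'cdata', value: '<raw> & unescaped'}
      ]
    }
  ]
};

// Count how often each node type occurs (illustrative helper, not a unist utility).
function countTypes(node, counts = {}) {
  counts[node.type] = (counts[node.type] || 0) + 1;
  for (const child of node.children || []) countTypes(child, counts);
  return counts;
}

console.log(countTypes(tree));
```

Because every node has a `type` and parents have `children`, generic unist-style traversal works on it without knowing anything about XML.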

wooorm commented 6 years ago

@revolunet Are you into working on this?

revolunet commented 6 years ago

Nope, haven't had a chance yet :/

wooorm commented 6 years ago

Oh that's okay! I think it's pretty interesting tho!

tsabrandon commented 5 years ago

Is there any update on this one? @revolunet

revolunet commented 5 years ago

Nothing new on my side, sorry.

ChristianMurphy commented 5 years ago

https://github.com/nashwaan/xml-js looks promising as a starting point.

wooorm commented 5 years ago

Nice, but can’t see anything about positional info?

And how about the naming:

ChristianMurphy commented 5 years ago

processor: rexml?

that may cause confusion with: https://github.com/ruby/rexml

wooorm commented 5 years ago

Hmm, different ecosystem plus not many stars, I think it’s fine to reuse that name?

ChristianMurphy commented 4 years ago

https://github.com/syntax-tree/xast

wooorm commented 4 years ago

Parsing can now be done with syntax-tree/xast-util-from-xml and serialising with syntax-tree/xast-util-to-xml, so that means the building blocks for rexml (working title?) are there.

However, I’m not sure how well rexml fits in the list of remark, rehype, retext (, redot), and thus unified. I think that’s because XML is data, whereas the others are content. A rehype plugin knows the semantics of nodes, what they mean, and uses that to do a task (find all headings, slugify them, add the slug as an id), but XML doesn’t really have this.

So, I’m seeing use cases for xast and xast utilities:

  1. to parse and inspect data (I was recently parsing unicode-cldr)
  2. to construct and serialize data (EPUB files have lots of manifests in XML)

…and I do see a case of going from HTML -> XML with rehype-parse, rehype-rexml, rexml-stringify or so (EPUB books use XHTML)
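
To make use case 2 (construct and serialize data) concrete, here is a minimal, dependency-free sketch of serialising a xast-like tree. It is not how xast-util-to-xml is implemented; the `serialize` and `escapeXml` functions and the manifest sample are assumptions made up for illustration, and only a few node types are handled.

```javascript
// Escape characters that are unsafe in text and in double-quoted attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Illustrative serializer for a xast-like tree (root, instruction, element, text only).
function serialize(node) {
  switch (node.type) {
    case 'root':
      return node.children.map(serialize).join('');
    case 'instruction':
      return '<?' + node.name + (node.value ? ' ' + node.value : '') + '?>';
    case 'text':
      return escapeXml(node.value);
    case 'element': {
      const attributes = Object.entries(node.attributes || {})
        .map(([key, value]) => ' ' + key + '="' + escapeXml(value) + '"')
        .join('');
      const content = (node.children || []).map(serialize).join('');
      // Elements without serialized content are written as self-closing tags.
      return content
        ? '<' + node.name + attributes + '>' + content + '</' + node.name + '>'
        : '<' + node.name + attributes + '/>';
    }
    default:
      throw new Error('Unhandled node type: ' + node.type);
  }
}

// Hypothetical EPUB-style manifest with a single item.
const manifest = {
  type: 'root',
  children: [
    {type: 'instruction', name: 'xml', value: 'version="1.0"'},
    {
      type: 'element',
      name: 'manifest',
      attributes: {},
      children: [
        {type: 'element', name: 'item', attributes: {id: 'ch1', href: 'a&b.xhtml'}, children: []}
      ]
    }
  ]
};

console.log(serialize(manifest));
// <?xml version="1.0"?><manifest><item id="ch1" href="a&amp;b.xhtml"/></manifest>
```

A real serialiser also has to deal with comments, doctypes, CDATA, and the choice of quotes, which is where a dedicated utility earns its keep.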

But I don’t really see the case where a whole unified pipeline would be useful:

unified()
  .use(rexmlParse)
  // …what plugins are useful here?
  .use(rexmlStringify)

I’m wondering, what use cases do you folks have for rexml? Should it exist?

revolunet commented 4 years ago

Thanks for the addition!

My use case was simply to parse some XML and store it as an AST, so I can use select or other utils to work with the tree.
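
A small sketch of that use case, assuming the XML has already been parsed into a xast-like tree. "select" presumably refers to something like unist-util-select; the code below does not use it and instead hand-rolls a tiny lookup by element name. The `findElements` and `textContent` helpers and the feed sample are hypothetical.

```javascript
// Collect all elements with a given name, depth first (illustrative; not unist-util-select).
function findElements(node, name, found = []) {
  if (node.type === 'element' && node.name === name) found.push(node);
  for (const child of node.children || []) findElements(child, name, found);
  return found;
}

// Concatenate the text and CDATA content beneath a node.
function textContent(node) {
  if (node.type === 'text' || node.type === 'cdata') return node.value;
  return (node.children || []).map(textContent).join('');
}

// Hypothetical tree, as it might look after parsing a small feed.
const feed = {
  type: 'root',
  children: [
    {
      type: 'element',
      name: 'feed',
      attributes: {},
      children: [
        {
          type: 'element',
          name: 'entry',
          attributes: {},
          children: [
            {type: 'element', name: 'title', attributes: {}, children: [{type: 'text', value: 'First'}]}
          ]
        },
        {
          type: 'element',
          name: 'entry',
          attributes: {},
          children: [
            {type: 'element', name: 'title', attributes: {}, children: [{type: 'text', value: 'Second'}]}
          ]
        }
      ]
    }
  ]
};

const titles = findElements(feed, 'title').map(textContent);
console.log(titles);
```

Once the data is a plain unist-compatible tree, this kind of query does not need a full unified processor, only the parse step and a few utilities.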

loganpowell commented 4 years ago

Could this be useful in translating HAST/MDAST to/from MJML?

ChristianMurphy commented 4 years ago

It could be. It's worth noting, since MJML elements are also valid as HTML web components, rehype could also be used.

It could be helpful to start a new idea thread for MJML.

loganpowell commented 4 years ago

> It could be helpful to start a new idea thread for MJML.

If you're game, so am I!

ChristianMurphy commented 3 years ago

This was added with https://github.com/syntax-tree/xast, https://github.com/syntax-tree/xast-util-from-xml, and https://github.com/syntax-tree/xast-util-to-xml. This idea is now implemented. :tada: