mity / md4c

C Markdown parser. Fast. SAX-like interface. Compliant to CommonMark specification.
MIT License

Allow parsing tag with namespace? #112

Open ioquatix opened 4 years ago

ioquatix commented 4 years ago

It looks like md4c can't parse a tag with a namespace, e.g. <foo:bar ...>

https://github.com/mity/md4c/blob/25096c7c987dd3bc853673360b8ef6e709ac5404/src/md4c.c#L999-L1004

Can we allow it to parse tags with namespaces? i.e. allow : in the tag name.

mity commented 4 years ago

Indeed, it cannot.

  1. That's exactly what the CommonMark specification requires, I believe. As such, no compliant implementation will recognize it.

  2. Generally, raw HTML support in Markdown/CommonMark is simplified compared with a full HTML parser.

  3. Also, I'm no expert in this, but it's more XML rather than HTML thing, isn't it? That said, I admit that allowing XML tags might have similar merit to allowing HTML tags.

Given all of that, IMHO it would be best if you opened an issue in the CommonMark spec repo.
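For readers following along, the tag-name rule referred to in point 1 can be sketched roughly as below. As far as I can tell, the CommonMark spec defines a tag name as an ASCII letter followed by zero or more ASCII letters, digits, or hyphens. The function and helper names here are hypothetical and this is not md4c's actual code, only an illustration of why a colon ends the name.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration; not taken from md4c. */
static int is_ascii_alpha(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static int is_ascii_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Length of a CommonMark-style tag name at the start of s, or 0 if s does
 * not start with one. Rule assumed here: an ASCII letter followed by zero
 * or more ASCII letters, digits, or hyphens. */
static size_t cm_tag_name_len(const char *s)
{
    size_t n;

    if (!is_ascii_alpha(s[0]))
        return 0;
    n = 1;
    while (is_ascii_alpha(s[n]) || is_ascii_digit(s[n]) || s[n] == '-')
        n++;
    return n;
}
```

With this rule, scanning "foo:bar" stops after "foo". Since the character that follows a tag name must be whitespace, / or >, the colon makes the whole construct fall back to plain text.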

ioquatix commented 4 years ago

Yes, I've made a proposal here: https://github.com/commonmark/commonmark-spec/pull/648

Regarding point 3, CommonMark already allows namespaced attributes (an attribute name may contain :), so one might consider it an oversight that namespaced tags are not supported as well.
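To make the asymmetry concrete, here is a minimal sketch of the attribute-name rule as I read it in the CommonMark spec: an ASCII letter, _ or :, followed by zero or more ASCII letters, digits, _, ., : or -. The names below are hypothetical and not md4c's API.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration; not taken from md4c. */
static int is_name_start(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || c == '_' || c == ':';
}

static int is_name_char(char c)
{
    return is_name_start(c) || (c >= '0' && c <= '9')
        || c == '.' || c == '-';
}

/* Length of a CommonMark-style attribute name at the start of s,
 * or 0 if s does not start with one. */
static size_t cm_attr_name_len(const char *s)
{
    size_t n;

    if (!is_name_start(s[0]))
        return 0;
    n = 1;
    while (is_name_char(s[n]))
        n++;
    return n;
}
```

Under this rule the colon in an attribute such as xlink:href is accepted as part of the name, whereas the tag-name rule rejects it.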

craigbarnes commented 4 years ago

> ... it's more XML rather than HTML thing, isn't it?

Yes. There's no such thing as a "tag with a namespace" in HTML. The "tag name" tokenizer state consumes everything except [\0\t\n\f />] as part of the basic tag name.
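A rough sketch of that behaviour in C, for comparison. It is a simplification under stated assumptions: the name must start with an ASCII letter (as the HTML "tag open" state requires), ASCII uppercase is lowercased, and a NUL byte simply ends the C string, whereas the real tokenizer reports a parse error and substitutes U+FFFD. The function name is hypothetical.

```c
#include <stddef.h>

/* Copies the tag name that starts at s (the character right after '<')
 * into out, lowercased and NUL-terminated. Returns the name length, or 0
 * if s does not start with an ASCII letter or out is too small.
 * Hypothetical helper; a simplified view of the HTML "tag name" state. */
static size_t html_tag_name(const char *s, char *out, size_t cap)
{
    size_t n = 0;

    if (!((s[0] >= 'a' && s[0] <= 'z') || (s[0] >= 'A' && s[0] <= 'Z')))
        return 0;

    /* Everything up to whitespace, '/', '>' or end of input is the name. */
    while (s[n] != '\0' && s[n] != '\t' && s[n] != '\n' && s[n] != '\f'
           && s[n] != ' ' && s[n] != '/' && s[n] != '>') {
        if (n + 1 >= cap)
            return 0; /* no room left for this character plus the NUL */
        out[n] = (s[n] >= 'A' && s[n] <= 'Z') ? (char)(s[n] - 'A' + 'a')
                                              : s[n];
        n++;
    }
    out[n] = '\0';
    return n;
}
```

Note that the colon gets no special treatment here: it is copied into the name like any other character.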

ioquatix commented 4 years ago

@craigbarnes does that mean a tag name state can parse <foo:bar/> with a tag name of foo:bar?

craigbarnes commented 4 years ago

> @craigbarnes does that mean a tag name state can parse <foo:bar/> with a tag name of foo:bar?

Yes, but foo is not a "namespace" (in the XML sense) when parsed as HTML. Namespace prefixes have specific semantics in XML/XHTML, whereas in HTML it just becomes part of the element name.

All elements in an HTML DOM tree have a namespace, but it's determined by which nodes are higher up in the tree (e.g. <svg>) rather than being stated explicitly in the markup.