Open ppKrauss opened 6 years ago
Pending:
Need to drop or isolate non-XML text:
Drop CDATA sections and comments, so that they do not pollute the output and so that the regex parser does not confuse commented-out tags with real ones.
PS: hidden text is invalid and changes the SHA1, so it must be dropped. Since script and style tags are also hidden text, the ideal is to express this mode in the spec.
Detect and isolate pre, script and style tags: they must be preserved, without any "pretty" transformation.
A list of "reset-indentation" tags, such as <html>, <body>, <article>, etc., that should not generate additional indentation, only a line break to keep them visually distinct.
A convention for empty tags such as <meta/>, <br/>, etc., which C14N expands to open-and-close pairs such as <meta></meta>. The ideal is to adopt the usual HTML5 and XHTML5-polyglot recommendations.
Use JavaScript as a minimal alternative implementation. The core method is compatible, see this example.
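As a rough illustration of the drop-and-isolate items in the pending list, here is a minimal Python sketch. It is not the project's implementation: the names `isolate` and `restore`, the NUL-delimited placeholder format and the exact patterns are all assumptions made for the example. A single left-to-right pass is used so that comments inside a preserved block are kept, and tags inside a comment are never treated as real tags.

```python
import re

# Hypothetical pattern: either a comment/CDATA section (dropped) or a
# pre/script/style element (preserved verbatim). Group 1 is set only
# when a preserved element matched.
HIDDEN_OR_PRESERVED = re.compile(
    r"<!--.*?-->"
    r"|<!\[CDATA\[.*?\]\]>"
    r"|<(pre|script|style)\b[^>]*>.*?</\1\s*>",
    re.DOTALL | re.IGNORECASE,
)


def isolate(html):
    """Drop comments and CDATA; swap preserved blocks for placeholders."""
    saved = []

    def handle(match):
        if match.group(1) is None:
            return ""  # comment or CDATA section: drop it
        saved.append(match.group(0))
        # NUL cannot occur in XML text, so it is a safe placeholder delimiter
        return "\x00%d\x00" % (len(saved) - 1)

    return HIDDEN_OR_PRESERVED.sub(handle, html), saved


def restore(html, saved):
    """Put the preserved blocks back unchanged after the pretty transforms."""
    return re.sub(r"\x00(\d+)\x00", lambda m: saved[int(m.group(1))], html)
```

The pretty transforms would run between `isolate` and `restore`, so they never see the content of pre, script or style.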
There are a lot of "pretty HTML" libraries, but none is both simple and based on the C14N standard. The conversion must also be reliable and easy to reproduce in many languages (JavaScript, Java, PHP, Python, etc.). The ideal is to use regular expression transforms as the kernel for the specification of the "pretty transforms".
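To make the idea of regex transforms as a portable kernel concrete, here is a small Python sketch of a rule table with a single rule, the empty-tag convention from the pending list. The names `RULES` and `apply_rules` are hypothetical, and the pattern is an assumption rather than part of any spec. It relies on C14N output, where empty elements are always written as an open-and-close pair, and it does not handle attribute values that contain a literal `>`.

```python
import re

# HTML5 void elements (those that never have content).
VOID = "area|base|br|col|embed|hr|img|input|link|meta|param|source|track|wbr"

# Ordered (pattern, replacement) pairs. Plain regex syntax like this is
# easy to port to JavaScript, Java or PHP. Further rules would be appended.
RULES = [
    # collapse <meta ...></meta> (C14N form) into <meta .../>
    (r"<(%s)\b([^<>]*)></\1>" % VOID, r"<\1\2/>"),
]


def apply_rules(text, rules=RULES):
    """Apply each regex transform in order and return the result."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text
```

For example, `apply_rules('<p>a<br></br>b</p>')` gives `<p>a<br/>b</p>`, while non-void pairs such as `<p></p>` are left as they are.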