Pretty-HTML5 reliable algorithm as standard reference

Pending:

Need to drop or isolate non-XML text:
- Drop CDATA sections and comments, so they neither pollute the output nor get confused with commented-out tags in the regex parser.
  PS: hidden text is invalid and changes the SHA1, so it must be dropped. Since the contents of script and style tags are also hidden text, ideally the spec should state the mode explicitly.
- Detect and isolate pre, script and style tags: they must be preserved, without any "pretty" transformation.
- List of "reset-indentation" tags, such as <html>, <body>, <article>, etc., which should not add an extra indentation level, only a line break to keep them visually distinct.
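
The drop-and-isolate items above can be sketched as a single regex pass. This is only an illustration, not the project's implementation: the function names `isolate` and `restore` and the `<x-keep/>` placeholder are hypothetical, and the pattern assumes that pre, script and style elements are well formed and not nested inside one another.

```python
import re

# One left-to-right scan: whichever construct starts first wins, so a
# commented-out <pre> is dropped with its comment, while a comment inside
# a <script> is kept as part of the script.
SCAN_RE = re.compile(
    r"<!--.*?-->"                                   # comment: drop
    r"|<!\[CDATA\[.*?\]\]>"                         # CDATA section: drop
    r"|<(pre|script|style)\b[^>]*>.*?</\1\s*>",     # preserved element: isolate
    re.DOTALL | re.IGNORECASE,
)

PLACEHOLDER_RE = re.compile(r'<x-keep id="(\d+)"/>')  # hypothetical marker


def isolate(html):
    """Return (cleaned_html, saved) with preserved elements swapped out."""
    saved = []

    def handle(match):
        if match.group(1) is None:
            return ""  # comment or CDATA: removed entirely
        saved.append(match.group(0))
        return '<x-keep id="%d"/>' % (len(saved) - 1)

    return SCAN_RE.sub(handle, html), saved


def restore(html, saved):
    """Put the untouched pre/script/style elements back after prettifying."""
    return PLACEHOLDER_RE.sub(lambda m: saved[int(m.group(1))], html)
```

The "pretty" transformation would run on the cleaned text between `isolate` and `restore`, so the preserved elements come back byte for byte.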
Convention for empty tags such as <meta/>, <br/>, etc., which C14N expands to open-and-close pairs, e.g. <meta></meta>. Ideally, adopt the usual HTML5 and XHTML5-polyglot recommendations.
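
As a rough illustration of one possible convention, the sketch below collapses the open-and-close pairs produced by C14N back to the self-closing form, but only for void elements. The name `collapse_void_elements` and the element list are assumptions made for this example, not something defined by the project.

```python
import re

# Void elements assumed for this sketch (they never have content in HTML5).
VOID_ELEMENTS = (
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
)

# Matches <name attrs></name> with nothing but whitespace in between.
# Attribute values containing "<" or ">" do not match and are left as-is.
EXPANDED_VOID_RE = re.compile(
    r"<(%s)\b([^<>]*?)\s*>\s*</\1\s*>" % "|".join(VOID_ELEMENTS),
    re.IGNORECASE,
)


def collapse_void_elements(xml):
    """Rewrite <meta ...></meta> as <meta .../>; other empty tags are untouched."""
    return EXPANDED_VOID_RE.sub(r"<\1\2/>", xml)
```

Non-void elements that happen to be empty, such as `<p></p>`, keep both tags, since `<p/>` would not be read as a closed element by an HTML5 parser.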
Use JavaScript as a minimal alternative implementation. The core method is compatible, see this example.

okfn-brasil / HTML5-onlyContent