statiqdev / Statiq.Docs

A static documentation site generator.
https://statiq.dev/docs

Shortcodes/XML processing instructions escaped when code is on the page in docs recipe #19

Open daveaglick opened 5 years ago

daveaglick commented 5 years ago

This essentially makes shortcodes impossible to use on a page that contains either code fences or inline backticks.

daveaglick commented 5 years ago

Turns out this wasn't Markdig at all. The problem looks isolated to the docs recipe and is caused by AutoLink (so possibly AngleSharp-related).

daveaglick commented 5 years ago

Using AngleSharp, this code:

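HtmlParser parser = new HtmlParser();
// The body holds a single shortcode written as an XML processing instruction.
using (Stream stream = new MemoryStream(Encoding.UTF8.GetBytes(@"<html><head></head><body><?# foo /?></body></html>")))
{
    // Parse the markup with AngleSharp's HTML5 parser.
    IHtmlDocument htmlDocument = parser.Parse(stream);
    using (StringWriter writer = new StringWriter())
    {
        // Serialize the parsed document back to markup.
        htmlDocument.ToHtml(writer, HtmlMarkupFormatter.Instance);
        writer.Flush();
        writer.ToString().Dump(); // Dump() is LINQPad's output helper
    }
}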

produces this output:

<html><head></head><body><!--?# foo /?--></body></html>

daveaglick commented 5 years ago

Looks like this behavior is related to https://github.com/AngleSharp/AngleSharp/issues/609

Specifically, @FlorianRappl's comment, which is about IE conditional comments but probably also applies to XML processing instructions in the HTML:

Conditional comments are IE only constructs and not specified. As such fully HTML5 compliant parsers will parse them like that.

Which makes sense standards-wise but doesn't help get the shortcodes through template processing. Going to need to figure out a way to preserve them when doing HTML manipulation with AngleSharp.
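
One possible way to preserve them, sketched here only as an assumption and not as what the docs recipe or AngleSharp actually does: swap each shortcode for a plain-text token before the HTML is parsed, then swap the originals back after serialization. `ShortcodeGuard`, `Protect`, `Restore`, and the `@@SHORTCODE-n@@` token format are hypothetical names made up for this illustration. The pattern assumes shortcodes start with `<?#` and end at the first `?>`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of Statiq or AngleSharp.
public static class ShortcodeGuard
{
    // Assumption: a shortcode starts with "<?#" and ends at the first "?>".
    private static readonly Regex Shortcode =
        new Regex(@"<\?#.*?\?>", RegexOptions.Singleline);

    // Replaces each shortcode with a plain-text token and records the original.
    public static string Protect(string html, IDictionary<string, string> saved)
    {
        return Shortcode.Replace(html, match =>
        {
            string token = "@@SHORTCODE-" + saved.Count + "@@";
            saved[token] = match.Value;
            return token;
        });
    }

    // Puts the original shortcodes back once HTML manipulation is finished.
    public static string Restore(string html, IDictionary<string, string> saved)
    {
        foreach (KeyValuePair<string, string> pair in saved)
        {
            html = html.Replace(pair.Key, pair.Value);
        }
        return html;
    }
}

public static class Program
{
    public static void Main()
    {
        var saved = new Dictionary<string, string>();
        string input = "<html><head></head><body><?# foo /?></body></html>";

        // The protected markup contains no "<?", so the parser sees ordinary text.
        string safe = ShortcodeGuard.Protect(input, saved);

        // Any HTML parsing and serialization would happen here, on "safe".

        string output = ShortcodeGuard.Restore(safe, saved);
        Console.WriteLine(output == input);
    }
}
```

The token only survives if the HTML step leaves text nodes untouched, and content that already contains a matching `@@SHORTCODE-n@@` string would collide with it, so treat this as a starting point rather than a fix.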

FlorianRappl commented 5 years ago

This is a general problem. You have HTML5-invalid markup and want it to be HTML5 parsed (hence the HTML5 error correction steps in and takes over). I think there are at least 2 ways out:

Not sure if the latter is possible (seems like these are some fixed constructs).

Maybe we could also hack in (optionally available) processing instructions into AngleSharp. They would be disabled by default.

Happy to receive PRs on the topic!

daveaglick commented 5 years ago

Thanks for the quick response @FlorianRappl! The behavior makes sense now that I understand what's going on. Even though they're valid SGML and XML, processing instructions aren't indicated in the HTML5 spec. The syntax is actually arbitrary - I chose one that looks like processing instructions because it needs to "fall through" various template engines and it's enough of a gray area standards-wise that specs like CommonMark even have specific rules about them.

I also agree with your mitigation suggestions.

I'll create a new issue in AngleSharp as a feature request to document that I'm working on it.