statiqdev / Statiq.Framework

A flexible and extensible static content generation framework for .NET.
https://statiq.dev/framework
MIT License

Email Addresses in mailto links are incorrectly escaped #254

Closed girlpunk closed 1 year ago

girlpunk commented 1 year ago

When linking to an email address in markdown, link text is displayed correctly but the hyperlink has extra escaping on the @ character, which breaks the email address.

For example: [test@example.com](mailto:test@example.com) becomes <a href="mailto:test&amp;#64;example.com">test@example.com</a>

I believe this is being caused by double-escaping somewhere, as &#64; is the correct entity for the @ character.
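As a minimal sketch of how that kind of double-escaping could arise: the `EscapeAt` helper below is hypothetical and is not Statiq's code. It only shows that HTML-encoding a value whose @ has already been turned into an entity yields the `&amp;#64;` seen above.

```csharp
using System;
using System.Net;

public static class DoubleEscapeDemo
{
    // Hypothetical first pass: replace each @ with its numeric character reference.
    public static string EscapeAt(string text) => text.Replace("@", "&#64;");

    // Hypothetical second pass: a generic HTML attribute encoder that also encodes the & sign.
    public static string EncodeAttribute(string text) => WebUtility.HtmlEncode(text);

    public static void Main()
    {
        string href = "mailto:test@example.com";

        string once = EscapeAt(href);
        string twice = EncodeAttribute(once);

        Console.WriteLine(once);  // mailto:test&#64;example.com
        Console.WriteLine(twice); // mailto:test&amp;#64;example.com
    }
}
```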

daveaglick commented 1 year ago

There's a tricky dance that happens between escaping the @ symbol and Razor parsing. Even though escaping @ isn't necessary for plain old Markdown, it's so common to run Statiq documents through the Razor processor that escaping them after Markdown rendering is the default behavior.

If you're creating your own pipelines and want to turn off automatic @ escaping entirely (which is only advisable when not using Razor downstream of the Markdown module), you can call EscapeAt(false) on the RenderMarkdown module:

// ...
new RenderMarkdown().EscapeAt(false);
// ...

Otherwise there's really no way to tell when an @ symbol in post-Markdown content is intended to be a Razor instruction or was from the Markdown output, so they all get escaped. If you want to disable this behavior for a specific @ symbol as with a mailto: link, you can escape it with \@:

[test\@example.com](mailto:test\@example.com)

That produces:

<a href="mailto:test&#64;example.com">test@example.com</a>

That obviously isn't quite right, because the @ symbol in the href attribute of the a element is still escaped. That's because of the way the Markdown renderer sequences the generated link element output vs. the content rendering, and where the escaping happens.

To get exactly what you want, you can put the mail link HTML right in the Markdown document, with the \@ escaping:

<a href="mailto:test\@example.com">test\@example.com</a>

Which produces the desired unescaped output:

<a href="mailto:test@example.com">test@example.com</a>

Note that this all appears to be academic. The client should correctly interpret the HTML entities and convert them back to the correct characters:

image

Is that not what you're seeing? I.e. in the rendered output does the mailto: link with &#64; entities not work when clicked or display to the user incorrectly?
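For reference, here is a small sketch of the difference a single entity-decoding pass makes. It uses `WebUtility.HtmlDecode` as a stand-in for what an HTML parser does to an attribute value; real browsers follow the HTML parsing rules rather than this API, so treat it as an approximation. A singly escaped @ decodes back to the real address, while the doubly escaped form from the original report decodes to a literal `&#64;`.

```csharp
using System;
using System.Net;

public static class EntityDecodeDemo
{
    // One decoding pass, approximating how a client reads an href attribute value.
    public static string DecodeOnce(string attributeValue) => WebUtility.HtmlDecode(attributeValue);

    public static void Main()
    {
        // Singly escaped: the client sees a normal address.
        Console.WriteLine(DecodeOnce("mailto:test&#64;example.com"));     // mailto:test@example.com

        // Doubly escaped: the entity text itself survives into the address.
        Console.WriteLine(DecodeOnce("mailto:test&amp;#64;example.com")); // mailto:test&#64;example.com
    }
}
```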

girlpunk commented 1 year ago

> Is that not what you're seeing? I.e. in the rendered output does the mailto: link with &#64; entities not work when clicked or display to the user incorrectly?

Unfortunately not, the unescaped value (i.e. test@example.com) shows in the link preview, and in the To box of the new email opened upon clicking the link.

> To get exactly what you want, you can put the mail link HTML right in the Markdown document, with the \@ escaping

I don't believe this would be allowed by our security policy at the moment, however I'll speak to the relevant department and see if we can get an exemption for HTML inside markdown.

daveaglick commented 1 year ago

> I don't believe this would be allowed by our security policy at the moment, however I'll speak to the relevant department and see if we can get an exemption for HTML inside markdown.

If not, turning off @ escaping on the RenderMarkdown module is still a perfectly good option. It should then render just like any standard Markdown renderer would. Just be aware that if you do that, any raw @ symbols that fall through to the Razor renderer may behave strangely and cause Razor compilation problems (which isn't an issue if you're not using Razor at all).

girlpunk commented 1 year ago

That may be the best solution for the moment, especially given the different outcomes found in the other thread. Is that something that can be turned off per-page in the front matter, or will I have to create a custom pipeline? (At the moment we're just using the examples without anything custom in C#: Bootstrapper.Factory.CreateDocs(args).RunAsync();)

daveaglick commented 1 year ago

It's module-wide for all documents right now, though a per-document escape flag would be a nifty improvement. If you're using out of the box Statiq Docs, then turning it off looks like:

Bootstrapper.Factory
    .CreateDocs(args)
    .ModifyTemplate(
        MediaTypes.Markdown,
        module => ((RenderMarkdown)module).EscapeAt(false))
    // ...

This takes advantage of the templates feature of Statiq Web and Statiq Docs (which is built on Statiq Web) that makes modifying what happens for given content types much easier.

daveaglick commented 1 year ago

FYI, I'm adding a fix for this so that RenderMarkdown watches for mailto links and does not escape an @ symbol inside one, regardless of the EscapeAt() setting. While it's a little hacky, it seems like we're much, much more likely to encounter an undesired escape in mail links than to want one.
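To illustrate the general idea only, here is a rough sketch of escaping every @ except those inside a mailto href. This is not the actual RenderMarkdown change, which isn't shown in this thread. The `MailtoAwareEscaper` class and its regex are hypothetical, and a real implementation would more likely work on the parsed Markdown or HTML than on a regex.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MailtoAwareEscaper
{
    // Matches either a whole href="mailto:..." attribute or a single @.
    // The attribute alternative comes first, so an @ inside it is consumed with the attribute.
    private static readonly Regex MailtoHrefOrAt =
        new Regex("href=\"mailto:[^\"]*\"|@", RegexOptions.IgnoreCase);

    // Escapes each standalone @ as &#64; and leaves mailto href attributes untouched.
    public static string EscapeAt(string html) =>
        MailtoHrefOrAt.Replace(html, m => m.Value == "@" ? "&#64;" : m.Value);

    public static void Main()
    {
        string html = "<p>Ping @dave</p><a href=\"mailto:test@example.com\">test@example.com</a>";

        // Prints: <p>Ping &#64;dave</p><a href="mailto:test@example.com">test&#64;example.com</a>
        Console.WriteLine(EscapeAt(html));
    }
}
```

In this sketch the @ in the link text is still escaped, which is harmless because the client decodes it for display. Only the href value is left alone.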

I'm also planning on introducing per-file @ escape settings that will override the EscapeAt() setting on the module. That way you can turn off escaping entirely on a file-by-file basis.

daveaglick commented 1 year ago

Both changes are now committed and will go out with the next release.