serilog / serilog-expressions

An embeddable mini-language for filtering, enriching, and formatting Serilog events, ideal for use with JSON or XML configuration.
Apache License 2.0

Output encoding #97

Closed nblumhardt closed 1 year ago

nblumhardt commented 1 year ago

For ExpressionTemplate to be useful in scenarios like HTML email, webhook URL, or URL-encoded POST body construction, a safer mechanism is needed for output encoding.

For example, imagine we rewrite Serilog.Sinks.Email to use ExpressionTemplate. A message body might look like:

<p class="error">Exception: {@x}</p>

Since the email is being fed exceptions from a running application, a malicious user might cause an error to be generated with HTML in the message:

System.InvalidOperationException: <a href="my-bad-site">Click to see more info</a><br><br><br>is not a valid username.
    at ...

Today, to defend against this, a user-defined htmlencode function might be used:

<p class="error">Exception: {htmlencode(@x)}</p>

But, we all know how easily opt-in security measures can be overlooked.
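To make the risk concrete, here is a minimal sketch (not part of the PR) that substitutes a message like the one above into the template body by hand, with and without encoding. It relies only on System.Text.Encodings.Web.HtmlEncoder from the .NET base class library. The EncodingDemo class and its Render helper are hypothetical names for illustration, and plain string replacement stands in for real template evaluation.

```csharp
using System;
using System.Text.Encodings.Web;

// Hypothetical demo type; not part of Serilog.Expressions.
public static class EncodingDemo
{
    const string Template = "<p class=\"error\">Exception: {@x}</p>";

    // Stand-in for template evaluation: substitutes the value into the single {@x} hole.
    public static string Render(string exceptionText, bool encode)
    {
        var value = encode ? HtmlEncoder.Default.Encode(exceptionText) : exceptionText;
        return Template.Replace("{@x}", value);
    }

    public static void Main()
    {
        var message = "System.InvalidOperationException: " +
            "<a href=\"my-bad-site\">Click to see more info</a> is not a valid username.";

        // Without encoding, the attacker's link reaches the email as live markup.
        Console.WriteLine(Render(message, encode: false));

        // With encoding, the angle brackets and quotes are escaped and the link is inert text.
        Console.WriteLine(Render(message, encode: true));
    }
}
```

The second call is what every template author has to remember to write when encoding is opt-in.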

This PR proposes to introduce a new type, TemplateOutputEncoder, that users (such as the Serilog.Sinks.Email assembly) can implement in order to automatically escape all output that's substituted into template holes. For example:

class TemplateOutputHtmlEncoder: TemplateOutputEncoder
{
    /// <summary>
    /// Replaces <c>&amp;</c>, <c>&lt;</c>, <c>&gt;</c>, <c>&quot;</c>, and
    /// <c>&apos;</c> with their equivalent escape sequences. This renders the result safe for
    /// insertion into HTML attributes and element bodies apart from <c>script</c> and <c>style</c>.
    /// </summary>
    /// <param name="value">The string to encode.</param>
    /// <returns>The encoded string.</returns>
    public override string Encode(string value)
    {
        return System.Text.Encodings.Web.HtmlEncoder.Default.Encode(value);
    }
}

The encoder is provided when parsing/compiling the template:

var template = new ExpressionTemplate(
    "<p class=\"error\">Exception: {@x}</p>",
    encoder: new TemplateOutputHtmlEncoder());

Opting out of encoding

The proposal introduces a new function in templates called unsafe, which can be used to opt out of escaping:

<p{unsafe(if @l = 'Error' then ' class="error"' else '')}>Exception: {@x}</p>
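One way to picture encode-by-default with an explicit opt-out is a marker wrapper that the output step recognizes and passes through untouched. The sketch below is a conceptual model only, and the PR's actual implementation may work quite differently. HoleWriter, PreEncoded, Unsafe, and Write are all hypothetical names, and string concatenation stands in for template evaluation.

```csharp
using System;
using System.Text.Encodings.Web;

// Hypothetical model of encode-by-default output; not the library's implementation.
public static class HoleWriter
{
    // Marker wrapper standing in for the result of the template function unsafe().
    public sealed class PreEncoded
    {
        public PreEncoded(string value) { Value = value; }
        public string Value { get; }
    }

    public static PreEncoded Unsafe(string value) => new PreEncoded(value);

    // Encodes ordinary values; values wrapped by Unsafe() are written unchanged.
    public static string Write(object holeValue) => holeValue switch
    {
        PreEncoded p => p.Value,
        _ => HtmlEncoder.Default.Encode(holeValue?.ToString() ?? "")
    };

    public static void Main()
    {
        var level = "Error";

        // Trusted, template-authored markup: opted out of encoding.
        var attr = Unsafe(level == "Error" ? " class=\"error\"" : "");

        // Untrusted event data: encoded by default.
        var html = "<p" + Write(attr) + ">Exception: " + Write("<b>boom</b>") + "</p>";

        Console.WriteLine(html);
        // prints: <p class="error">Exception: &lt;b&gt;boom&lt;/b&gt;</p>
    }
}
```

The point of the design is that the unsafe path is the one that has to be spelled out, so a forgotten call fails safe rather than open.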

Caveats

Note that basic HTML escaping as used in the example cannot correctly or safely encode values that appear in style or script contexts. HTML is used in the example because it is a familiar case, but its encoding rules are not covered in full here.

Related work

The feature is based on the fork we use in Seq's webhook plug-in, which uses it for URI encoding within webhook URLs: https://github.com/datalust/seq-app-httprequest#configuration (see the URL row in the linked table).
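For the webhook URL case, an encoder along the following lines could percent-encode each substituted value. This is a sketch under stated assumptions: it assumes the TemplateOutputEncoder base type proposed above, with the Encode(string) override shown earlier, and declares a stand-in for it so the snippet compiles on its own. TemplateOutputUriEncoder and UriEncoderDemo are hypothetical names, and Uri.EscapeDataString is one reasonable choice, not necessarily what the Seq plug-in does.

```csharp
using System;

// Stand-in for the proposed Serilog.Expressions type, declared here only
// so that the sketch is self-contained.
public abstract class TemplateOutputEncoder
{
    public abstract string Encode(string value);
}

// Hypothetical encoder for values substituted into URL path segments or query strings.
public class TemplateOutputUriEncoder : TemplateOutputEncoder
{
    public override string Encode(string value)
    {
        // Percent-encodes reserved characters such as '&', '=', and spaces.
        return Uri.EscapeDataString(value);
    }
}

public static class UriEncoderDemo
{
    public static void Main()
    {
        var encoder = new TemplateOutputUriEncoder();

        // A value containing query-string metacharacters cannot add extra parameters.
        Console.WriteLine("https://example.com/hook?user=" + encoder.Encode("a&b=c d"));
        // prints: https://example.com/hook?user=a%26b%3Dc%20d
    }
}
```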