discourse plugin: sanitize markdown titles

teamdandelion commented 5 years ago

They currently are unsanitized.

Context: https://github.com/sourcecred/sourcecred/pull/1293#pullrequestreview-276220971

┆Issue is synchronized with this Asana task by Unito

BrianLitwin commented 5 years ago

What's the solution for this? From @wchargin 's comments, we want to render things appropriately and prevent injection vulnerabilities.. should we use a library? One that uses whitelisting?

wchargin commented 5 years ago

should we use a library?

It’s certainly an option. Note that our goal is slightly unusual: we’re not trying to turn untrusted Markdown into safe rendered HTML; we’re trying to turn untrusted Markdown into safe Markdown that can be embedded into other Markdown while preserving the structure of the ambient syntax tree. This isn’t obviously trivial, because the kinds of escaping that we need to perform might depend on the context into which the Markdown is to be embedded; by way of analogy, properly escaping HTML is different in an element context (<span>$FOO</span>) than in an attribute context (<span style=$FOO></span>). You’d have to consult the CommonMark spec to derive exactly what needs to be done.

Honestly, it’s really a questionable approach to start with. Safe escaping is not just hard, it’s brittle: new features in minor version downstream upgrades can mean security issues for your project. There’s a reason that all major frameworks for safe processing of untrusted data use explicit templates—client-side JSX, server-side Jinja2, SQL and prepared statements—rather than raw escaping.

We used to use templates, too, but that was changed in #1191 or thereabouts. I don’t have enough context to comment on the rationale there; you’d have to ask @decentralion for details.

One that uses whitelisting?

Yes—whether we use a library or write the code ourselves, we should almost certainly use a whitelist-based approach rather than a blacklist-based approach.

wchargin commented 5 years ago

One way to implement this safely “by construction”, rather than merely safely “because we think that we’ve covered all the edge cases”, would be to implement the templating ourselves. Instead of emitting a Markdown string like

  description() {
    return `[#${this.number()}](${this.url()}): ${this.title()}`;
  }

instead emit an object like

  description() {
    return {
      template: "[#{{number}}]({{url}}): {{title}}]",
      parameters: {
        number: String(this.number()),
        url: this.url(),
        title: this.title(),
      },
    };
  }

where the client is expected to parse the Markdown template as is, without substitutions, then traverse the Markdown syntax tree and replace any instances of substitution tokens with their corresponding parameter values. This would make the result safe against Markdown context escaping. We’d still need to escape potentially dangerous URLs and the like, so we wouldn’t quite be out of the woods.

teamdandelion commented 5 years ago

I don't think I appreciated the seriousness of this issue in the initial review; now I'm realizing that it allows general XSS through SourceCred. Right now there's not really anything worth attacking, but we shouldn't leave it in there.

All I really need from the markdown feature is the ability to url-link to the real content for a given node. The ability to have general markdown is convenient but maybe too much work to get right.

So, maybe we can take the following steps to close the XSS vuln without needing to add "explorer adapters" and such back in:

add canonical urls to graph nodes
change the frontend to hyperlink using the canoncial urls (ignoring the markdown of the nodes)
remove the markdown formatting on all the nodes, and descriptions become plain strings

@BrianLitwin is this something you'd like to take on?

BrianLitwin commented 5 years ago

@decentralion I'm going to start with #1317 to try to grok timeline cred

sourcecred / sourcecred

discourse plugin: sanitize markdown titles #1297