Open teamdandelion opened 5 years ago
What's the solution for this? From @wchargin 's comments, we want to render things appropriately and prevent injection vulnerabilities.. should we use a library? One that uses whitelisting?
should we use a library?
It’s certainly an option. Note that our goal is slightly unusual: we’re
not trying to turn untrusted Markdown into safe rendered HTML; we’re
trying to turn untrusted Markdown into safe Markdown that can be
embedded into other Markdown while preserving the structure of the
ambient syntax tree. This isn’t obviously trivial, because the kinds of
escaping that we need to perform might depend on the context into which
the Markdown is to be embedded; by way of analogy, properly escaping
HTML is different in an element context (<span>$FOO</span>
) than in an
attribute context (<span style=$FOO></span>
). You’d have to consult
the CommonMark spec to derive exactly what needs to be done.
Honestly, it’s really a questionable approach to start with. Safe escaping is not just hard, it’s brittle: new features in minor version downstream upgrades can mean security issues for your project. There’s a reason that all major frameworks for safe processing of untrusted data use explicit templates—client-side JSX, server-side Jinja2, SQL and prepared statements—rather than raw escaping.
We used to use templates, too, but that was changed in #1191 or thereabouts. I don’t have enough context to comment on the rationale there; you’d have to ask @decentralion for details.
One that uses whitelisting?
Yes—whether we use a library or write the code ourselves, we should almost certainly use a whitelist-based approach rather than a blacklist-based approach.
One way to implement this safely “by construction”, rather than merely safely “because we think that we’ve covered all the edge cases”, would be to implement the templating ourselves. Instead of emitting a Markdown string like
description() {
return `[#${this.number()}](${this.url()}): ${this.title()}`;
}
instead emit an object like
description() {
return {
template: "[#{{number}}]({{url}}): {{title}}]",
parameters: {
number: String(this.number()),
url: this.url(),
title: this.title(),
},
};
}
where the client is expected to parse the Markdown template
as is,
without substitutions, then traverse the Markdown syntax tree and
replace any instances of substitution tokens with their corresponding
parameter values. This would make the result safe against Markdown
context escaping. We’d still need to escape potentially dangerous URLs
and the like, so we wouldn’t quite be out of the woods.
I don't think I appreciated the seriousness of this issue in the initial review; now I'm realizing that it allows general XSS through SourceCred. Right now there's not really anything worth attacking, but we shouldn't leave it in there.
All I really need from the markdown feature is the ability to url-link to the real content for a given node. The ability to have general markdown is convenient but maybe too much work to get right.
So, maybe we can take the following steps to close the XSS vuln without needing to add "explorer adapters" and such back in:
@BrianLitwin is this something you'd like to take on?
@decentralion I'm going to start with #1317 to try to grok timeline cred
They currently are unsanitized.
Context: https://github.com/sourcecred/sourcecred/pull/1293#pullrequestreview-276220971
┆Issue is synchronized with this Asana task by Unito