monarch-initiative / monarch-app

Monarch Initiative website and API
https://monarchinitiative.org/
BSD 3-Clause "New" or "Revised" License
18 stars 6 forks source link

HTML injection in phenogrid tooltips (and <AppNodeBadge> more generally) #902

Open ptgolden opened 3 hours ago

ptgolden commented 3 hours ago

As described in https://github.com/monarch-initiative/monarch-app/issues/887#issuecomment-2479676335, there is an HTML injection bug in the phenogrid component. The offending code is here:

https://github.com/monarch-initiative/monarch-app/blob/ce0851b2193164f8c473c4515ed19e7bafbf1e5b/frontend/src/components/ThePhenogrid.vue#L183-L203

The name property in <AppNodeBadge> (which can be user generated for columns in the multicompare mode of phenogrid) is passed as raw HTML here:

https://github.com/monarch-initiative/monarch-app/blob/ce0851b2193164f8c473c4515ed19e7bafbf1e5b/frontend/src/components/AppNodeBadge.vue#L13-L27

This breaks on labels that contain strings that can be parsed as HTML tags, such as 66.45-Dsg2<tm1a(EUCOMM)Wtsi>/Dsg2<tm1a(EUCOMM)Wtsi>, or, for script injection fun, Group Label <script type="text/javascript">console.alert('hi')</script>. As a rule, it is not good to render user-generated text directly as HTML. (The Vue documentation agrees).

This issue would also show itself if there were ever an entity in the monarch KG that had a name with some \<angled brackets> in it.

ptgolden commented 3 hours ago

The injection bug was introduced in c106f349c.

@kevinschaper, do you remember why you decided you needed to pass HTML rather than text? It looks like it's to support entity labels that contain HTML, like Lama2<sup>tm1Stk</sup>/Lama2<sup>tm1Stk</sup> [background:] involves: 129/Sv * 129P2/OlaHsd * BALB/c * ICR. Is it the case that labels are allowed to contain any arbitrary HTML?

kevinschaper commented 1 hour ago

Yeah, it’s for display of html in genotype names, but it really is more that we want to allow sup tags than any html.

kevinschaper commented 1 hour ago

I’m used to there being both html and ascii representations of genotypes, and it makes sense that they kind of require the opposite setting. Right now we’re using provided genotype names that are the html version where alleles are in sup tags (example here), and IMPC is passing the ascii equivalent where the alleles are in angle brackets.