Closed VyrCossont closed 1 year ago
There should be no
<img>
tags in HTML that GtS displays on the public web interface or sends to clients through the API
Some other instances seem to keep the img
tags but change them to links to the own instance. I’m seeing at least two variants of this:
/fileserver/
equivalentThe latter seems to be the most sensible, but downloading all images when the posts are coming in seems to be problematic as well (DDoS potential, and images may possibly be never needed). If it’s possible to defer that loading and storing to when it’ll first be needed, that seems to be the best option to me.
Other HTML elements that cause the browser to access external resources without user interaction should also be inspected (I assume/hope script
is already out, and link
, but I’ve not got a full overview of all tags that can do these things… object
used to be used for SVG some years ago, unsure if this is still present, etc).
This is potentially a security issue (privacy exposure).
Sorry, I was going to report this tomorrow, I need a night now.
Ah thanks, we'll fix this before next release.
Re: your other concerns, script
(among other things) would be caught by the go html templating so it wouldn't reach your browser, and SQL injection and stuff like that is shielded against by our database interface. But the above change also updates our normalization to just remove anything dodgy as soon as we get it, rather than relying on templating and clients to filter it out. And also updates our sanitizer to remove inline img tags!
OK, thanks.
I was wrong about Friendica, while I do get two requests from each instance, they are a HEAD and a GET, so they seem to download it.
Pleroma/Akkoma seem to merely proxy the request but it didn’t get enough views for me to be sure on that. Someone reports that they can cache-proxy if enabled.
Husky (Tusky fork) seems to drop(?) the image, only a small turquoise square shows up according to someone in that thread.
And I got…
Honk auto downloads attachments (including
<img>
tags) and re-hosts them (basically forever).
… along with a comment that GtS is “particularity tricky to federate with”.
This (downloading and keeping them) is something I’d love to see long-term (stripping is ok short-term, of course). Perhaps when the status is first requested, initiate the download and return the status with the /fileserver/
link where the image will eventually be, and in the fileserver, if the download is still pending, wait / keep the connection open, maybe, for a while?
Yeah, that's likely what we'll end up doing, but only if the embedded image shares a domain with the account doing the embedding, or something :thinking: We'll have a think about it.
Describe the bug with a clear and concise description of what the bug is.
GtS does not sanitize received HTML aggressively enough to remove images. The GtS public web interface, and clients that don't do their own HTML sanitization (like Semaphore) will then display those images, and are thus vulnerable to user tracking.
Originally reported by @mirabilos at https://toot.mirbsd.org/@mirabilos/statuses/01H7GWS60V4J5X59AEKKRKXN5G
What's your GoToSocial Version?
GoToSocial 0.10.0 git-89ee9d5
GoToSocial Arch
all
What happened?
The HTML sanitizer GtS uses is BlueMonday's UGC policy: https://github.com/superseriousbusiness/gotosocial/blob/main/internal/text/sanitize.go#L33
This intentionally allows images through: https://github.com/microcosm-cc/bluemonday/blob/main/policies.go#L250
What you expected to happen?
There should be no
<img>
tags in HTML that GtS displays on the public web interface or sends to clients through the API (with the exception of GtS-inserted emoji on the public web interface).How to reproduce it?
https://toot.mirbsd.org/@mirabilos/statuses/01H7GWS60V4J5X59AEKKRKXN5G is a test post that contains an embedded image
https://benjojo.co.uk/u/benjojo/h/QQ8qvtsM2Xxt2cnwJy is another test post on a Honk instance
Anything else we need to know?
Starting from BlueMonday's default-deny-everything strict policy and adding known-safe elements, attributes, and protocols seems preferable. See Mastodon's sanitization docs and SwiftSoup's
basic
safelist.