superseriousbusiness / gotosocial

Fast, fun, small ActivityPub server.
https://docs.gotosocial.org
GNU Affero General Public License v3.0
3.85k stars 338 forks source link

[bug] og:description mangles escaped shell commands #1549

Closed trysdyn closed 1 year ago

trysdyn commented 1 year ago

Describe the bug with a clear and concise description of what the bug is. Please include screenshots of any visual issues.

When generating the og:description OpenGraph tag for a plaintext post, GtS strips escape (\) characters, which can result in the OpenGraph embed presenting a shell command or code that is not what was originally authored in the post.

What's your GoToSocial Version?

v0.7.1 tag, source build

Browser version

n/a

What happened?

I authored a post. In this post I include the following shell command as a recommendation for people to try.

echo '\e]8;;http://example.com\e\This is a link\e]8;;\e'

Someone then linked my post in Discord, which presented the unfurl as generated by the OpenGraph tags. There, the command was mangled to this:

echo 'e]8;;http://example.comeThis is a linke]8;;e

Note the missing backslashes. This significantly changes the impact of the command :)

I've verified the slashes are stripped on the og:description tag, so it's not Discord's doing.

What you expected to happen?

The escape slashes to be left in the post text in og:description.

How to reproduce it?

Post anything with the Public visibility level that contains any number of \ characters, then inspect the post in the frontend and view source to see the \s were stripped in og:description.

Anything else we need to know?

No response

daenney commented 1 year ago

This is done here: https://github.com/superseriousbusiness/gotosocial/blob/b6fbdc66c1ce1ec61ebfb6fcc0351ea627a1d288/internal/web/opengraph.go#L134-L144

My guess is the "safe to use as a template.HTMLAddr" is why this happens. I wonder if it might be better to &amp-encode these instead?

tsmethurst commented 1 year ago

Mmm could be better yeah... Thanks for looking at this :)

daenney commented 1 year ago

Haha, actually, what if instead of doing it by hand we call html.EscapeString, the counter-part of html.UnescapeString. It seems to do what we want:

var htmlEscaper = strings.NewReplacer(
    `&`, "&",
    `'`, "'", // "'" is shorter than "'" and apos was not in HTML until HTML5.
    `<`, "&lt;",
    `>`, "&gt;",
    `"`, "&#34;", // "&#34;" is shorter than "&quot;".
)

// EscapeString escapes special characters like "<" to become "&lt;". It
// escapes only five such characters: <, >, &, ' and ".
// UnescapeString(EscapeString(s)) == s always holds, but the converse isn't
// always true.
func EscapeString(s string) string {
    return htmlEscaper.Replace(s)
}

It doesn't look like \ or \n are considered a problem for HTML, although we could encode the \ as &bsol; which would also take care of the \n case.

tsmethurst commented 1 year ago

Aha okay, so we could unescape it, sanitize it, and escape it again?