iv-org / invidious

Invidious is an alternative front-end to YouTube
https://invidious.io
GNU Affero General Public License v3.0
16.02k stars 1.76k forks source link

[Bug] Download widget replaces spaces in filename with `+` #2670

Closed babababag closed 2 years ago

babababag commented 2 years ago

Steps to reproduce

  1. Go to a video with spaces in the title e.g. https://inv.riverside.rocks/watch?v=UvpMGxHaw8U
  2. Click Download
  3. Filename has pluses instead of spaces

This happens because the filename is urlencoded before being sent in the Content-Disposition header (here and here). e.g. content-disposition: attachment; filename="lorem+ipsum"; filename*=UTF-8''lorem+ipsum

Urlencoding is not compliant in the filename directive but apparently browsers ignore the RFC and decode it anyway. Urlencoding is ok in the filename* directive but only with percent symbols, so + means a literal plus not a space. https://httpwg.org/specs/rfc6266.html#disposition.parameter.filename

The fix is to replace + with a literal space. I would PR but idk crystal and it'd take 2 seconds for one of you guys to do it anyway.

SamantazFox commented 2 years ago

Wait, why it closed this in here?!?

unixfox commented 2 years ago

Wait, why it closed this in here?!?

Oops I didn't think adding "fix" in my commit from another repo would close the issue: https://github.com/unixfox/invidious-custom/commit/385d07ce3c96a388de32190469329e8a629f8899

SamantazFox commented 2 years ago

That is not supposed to happen D:

babababag commented 2 years ago

@SamantazFox @unixfox

This is still happening, at least in Firefox.

It's still happening because this content-disposition header is still urlencoding strings with + instead of %20: https://github.com/iv-org/invidious/blob/170e75499816af460d76d05ce4d440d37a01e0fb/src/invidious/routes/video_playback.cr#L161-L161

There are two directives allowed in the content-disposition header, filename and filename*. The filename* directive only supports percent-encoded characters and some common literal ascii characters, including the plus character, so + means a literal plus (this is the issue). The filename directive only officially supports a subset of these characters, but browsers accept percent-encoded characters here too. In both cases replacing all the + characters with %20 in this content-disposition header would fix the problem.


Merged pull request

The merged pull request #2684 introduces a bug where if a video title has an ampersand in it, the downloaded filename will be truncated to everything before the ampersand.

This is what the PR changed. The title is decoded here to the true string representation of the title: https://github.com/iv-org/invidious/blob/170e75499816af460d76d05ce4d440d37a01e0fb/src/invidious/routes/video_playback.cr#L243-L243

But it's still inserted unescaped into the location header. Previously this was "fine" because as the title was never decoded, inserting it as-is meant that it was already urlencoded as the location header would expect. https://github.com/iv-org/invidious/blob/170e75499816af460d76d05ce4d440d37a01e0fb/src/invidious/routes/video_playback.cr#L276-L276 It was "fine" (but not really) because if you gave it a string that wasn't already urlencoded, it would still have been inserted into the url unescaped so you would've been able to insert arbitrary parameters into the url. I didn't report this because I was pretty sure the impact was limited to your own browser, but still...

Example location header for this redirect: Location: /videoplayback?expire=1644344668&ei=_GACYobuBePshwbSyKDwBg&ip=135.148.149.204&id=o-ABCWOwFCpELJ9T9-9yBiLVxbwSYnmHNSOHwHPdEPMrQ9&itag=22&source=youtube&requiressl=yes&mh=ks&mm=31%2C29&mn=sn-tt1eln7e%2Csn-tt1e7n7l&ms=au%2Crdu&mv=m&mvi=6&pl=25&initcwndbps=335000&vprv=1&mime=video%2Fmp4&ratebypass=yes&dur=44.744&lmt=1614897234880460&mt=1644322873&fvip=6&fexp=24001373%2C24007246&c=ANDROID&txp=6216222&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cvprv%2Cmime%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRQIgCXgi1Q7Y0gX78xqp0NFoxjvBKY5yU-cav6kKiHaiXR0CIQDzFo95FI8OHtuzm3IFn3eIDgfspyIEYJ6PAanH-r0woQ%3D%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRgIhAPfnmTL7_aM6gfwZzFNGjD8PlXXnKt-sUW3hiwgcGQ90AiEA6WARIHUwPtj6CMARdgQKXzGa1fgsC4eAvRxKUR_Gr_k%3D&host=rr6---sn-tt1eln7e.googlevideo.com&title=Toby & Felix Show Off Their RAT Collectibles-lYPwQuGCcvQ.mp4 The title parameter is at the end. The ampersand in the video title is not escaped, so the title parameter is Toby (with a space at the end) and the rest is garbage. Someone who controls the video title can insert arbitrary parameters into this url.


Fix

The immediate fix is to replace + with %20 in content-disposition headers. There's still the issue of inserting arbitrary url parameters though...


My opinion

It's weird that the title in the download widget's json is urlencoded instead of being json-encoded. example: GET /latest_version?download_widget={"id":"lYPwQuGCcvQ","itag":"22","title":"Toby+%26+Felix+Show+Off+Their+RAT+Collectibles-lYPwQuGCcvQ.mp4"} Urlencoded strings happen to always be safe json strings, undeniable bad vibes tho. If you urlencode instead of json-encode the title, either you have to trust that the title is urlencoded (which allows inserting arbitrary url parameters, impact probably limited but why trust unnecessarily?), or you don't trust the title so you have to try to decode it and then encode it back again for the location header, which for legitimate input is a no-op.

The security issue is probably of no consequence but IMO you must still defend against it because you shouldn't be able to control url parameters like that. But to me it just feels wrong to even spend cpu cycles doing what is a no-op for legit input. If the title was json-encoded instead, there would be one decoding step json -> plaintext (to get the title) and one encoding step plaintext -> urlencoded (for the location header).

SamantazFox commented 2 years ago

@babababag thanks for so many details :D Will look into that!