ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense
132.32k stars 10.03k forks source link

Filenames should be filtered on --literal too #441

Closed ghost closed 11 years ago

ghost commented 12 years ago

Using --title converts too much (spaces to underscores) for a modern filesystem.

Using --literal don't filter slashes out that produces ugly sub-directories. Maybe a directory-traversal with overwriting of important files is possible too. IMHO, using --literal should filter slashes "/" anyway on a unix-system. On windows there would be some more chars.

In case the idea don't fits in the meaning of "literal" there could be a "almostliteral" option for unix-systems.

ssokolow commented 12 years ago

Actually, I'd suggest an option to force FAT32 filename restrictions everywhere since the same restrictions apply to FAT32 filesystems no matter where you mount them. (I know NTFS has a POSIX mode where only NUL and / are forbidden, but I'm not sure whether the Linux ntfs-3g driver uses it or whether it enforces the Win32 mode for compatibility)

Either way, here are what I'd recommend filtering at bare minimum to ensure people don't get an unpleasant surprise when they try to copy a file to a thumbdrive:

As is, since --title makes it impossible to tell what was and wasn't once an underscore, I'm using a wrapper for youtube-dl which trusts me to only use it for videos without slashes in their names and to run it on a POSIX-compliant filesystem. It downloads using --literal and then uses the rename command to escape all filenames in the current folder for safety on FAT filesystems.

ghost commented 12 years ago

I belive there are a lot of people using youtube-dl unter linux and don't bother about be able to copy the files to FAT/NTFS filesystem. On the other hand a lot of video titles uses some characters you can't use on FAT and any filtering of them makes the filename unnacessary ugly. Filtering only the forward-slash would be not that ugly.

There could be something like --filter-filenames=[FAT|UNIX] as a parameter.

ssokolow commented 12 years ago

@Blacker47 I wouldn't really mind that but what examples do you have in mind?

Also, it may not be as bad as you're thinking in FAT mode:

That just leaves \ and /. If you don't mind unicode, you could always use look-alike characters like U+2215 (Division Slash. ) and U+2216 (Set Minus. ).

Anyway, my main concern is trying to get the default behaviour to be as comfortable as possible for the largest variety of people possible.

phihag commented 12 years ago

+1 on using fancy Unicode characters instead of /\* - great idea, @ssokolow . IMHO, --literal should be deprecated and made an alias of --title.

FiloSottile commented 12 years ago

With the new sanitize_title that resembles the proposal by @ssokolow (and that we can tweak even more based on his suggestions, although I don't like Unicode in filenames) I don't get the point of --literal at all. And yes, it IS a security bug. Let's alias it!

ghost commented 11 years ago

Using non-ASCII with FATxx is not allways a good idea. Using similar characters from Unicode changes the title too much.

The ONLY security related problem is not to escape the only ONE char, '/' (and on windows '\'). It is easy change and can be seen by user after the change (e.g. the user needs the exact title to search for).

I don't see any clue to aliasing --literal to --title because the escaping of one char is too much work.

(I haven't checked the todays build for this issue - maybe it is fixed allready)

FiloSottile commented 11 years ago

We consolidated the behavior now, see https://github.com/rg3/youtube-dl#output-template

--title now replaces only security or filesystem problematic characters --literal opened to a number of issues for a little gained value, we aliased to --title --restrict-filename is the new option to leave only safe ASCII characters

Does this address the issue?

ghost commented 11 years ago

--title seems to make on linux the expected job now.

Thanks.