Closed ghost closed 11 years ago
Actually, I'd suggest an option to force FAT32 filename restrictions everywhere since the same restrictions apply to FAT32 filesystems no matter where you mount them. (I know NTFS has a POSIX mode where only NUL and / are forbidden, but I'm not sure whether the Linux ntfs-3g driver uses it or whether it enforces the Win32 mode for compatibility)
Either way, here's what I'd recommend filtering at bare minimum to ensure people don't get an unpleasant surprise when they try to copy a file to a thumbdrive:
- \x00 (NUL), ?, \, *, /, ", :, |, <, and > (Disallowed in FAT and Win32-mode NTFS)
- \x00-\x1f and \x7f (Even when allowing long filenames, FAT32 disallows ASCII control characters)

As is, since --title makes it impossible to tell what was and wasn't once an underscore, I'm using a wrapper for youtube-dl which trusts me to only use it for videos without slashes in their names and to run it on a POSIX-compliant filesystem. It downloads using --literal and then uses the rename command to escape all filenames in the current folder for safety on FAT filesystems.
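A rough Python sketch of that post-download escaping step, for illustration only. The commenter's actual wrapper uses the rename command and is not shown in the thread; the function names and the choice of "_" as the replacement character are assumptions made here.

```python
import os
import re

# Characters named in the comment above: ASCII control characters plus
# ? \ * / " : | < > (forbidden on FAT and Win32-mode NTFS).
FAT_FORBIDDEN = re.compile(r'[\x00-\x1f\x7f?\\*/":|<>]')


def escape_for_fat(name, replacement="_"):
    """Return `name` with every FAT-forbidden character replaced."""
    return FAT_FORBIDDEN.sub(replacement, name)


def escape_folder(folder="."):
    """Rename every entry in `folder` to its FAT-safe form (hypothetical helper)."""
    for name in os.listdir(folder):
        safe = escape_for_fat(name)
        if safe == name:
            continue
        target = os.path.join(folder, safe)
        if os.path.exists(target):
            # Never overwrite an existing file; leave the original name alone.
            continue
        os.rename(os.path.join(folder, name), target)
```

Running escape_folder() after a download would turn a file named `What? A "test".flv` into `What_ A _test_.flv`, which also shows the drawback raised above: once replaced, you can no longer tell which underscores were in the original title.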
I believe a lot of people use youtube-dl under Linux and don't care about being able to copy the files to a FAT/NTFS filesystem. On the other hand, a lot of video titles use characters you can't use on FAT, and filtering any of them makes the filename unnecessarily ugly. Filtering only the forward slash would not be that ugly.
There could be something like --filter-filenames=[FAT|UNIX] as a parameter.
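As a sketch of how such a switch might behave: the --filter-filenames flag below is the hypothetical option proposed in this comment, not an existing youtube-dl option, and the per-mode character sets and the "_" replacement are assumptions.

```python
import argparse
import re

# Hypothetical per-mode blacklists. UNIX forbids only NUL and "/";
# FAT additionally forbids control characters and ? \ * " : | < >.
FORBIDDEN = {
    "UNIX": re.compile(r'[\x00/]'),
    "FAT": re.compile(r'[\x00-\x1f\x7f?\\*/":|<>]'),
}


def filter_filename(name, mode):
    """Replace every character forbidden in `mode` with an underscore."""
    return FORBIDDEN[mode].sub("_", name)


def build_parser():
    """Build a parser carrying only the proposed (hypothetical) flag."""
    parser = argparse.ArgumentParser(prog="filter-sketch")
    parser.add_argument(
        "--filter-filenames",
        choices=sorted(FORBIDDEN),
        default="UNIX",
        help="which filesystem's filename restrictions to enforce",
    )
    return parser


args = build_parser().parse_args(["--filter-filenames", "FAT"])
print(filter_filename("AC/DC: Live?", args.filter_filenames))  # AC_DC_ Live_
```

With the UNIX mode the same title would only lose its slash and become `AC_DC: Live?`.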
@Blacker47 I wouldn't really mind that but what examples do you have in mind?
Also, it may not be as bad as you're thinking in FAT mode:
- \n in filenames and should never appear in a properly-formatted title anyway, so filtering them isn't an issue.
- : and | can be replaced with a space-padded dash to get things like "My Story - Chapter 2". (Possibly with a regular expression match like r"\s?[:|]\s?" so you always get that pretty balance of spaces before and after.)
- " can be replaced by '. Heck, that's basically how they do it normally in Britain anyway. (Read a UK printing of something like Harry Potter or Discworld if you don't believe me)
- < and > can be comfortably replaced by [ and ]. That's basically the kind of thinking that led to BBcode.
- * in your filenames, given how it's a risk for anything that accidentally processes metacharacters and the most common use I've seen is juvenile naming of things like "**OMG******", but if you really want it, there are various Unicode characters that look similar or even ASCII characters that wouldn't look bad as a replacement.

That just leaves \ and /. If you don't mind Unicode, you could always use look-alike characters like U+2215 (Division Slash, ∕) and U+2216 (Set Minus, ∖).
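The substitutions suggested above can be sketched in Python roughly as follows. This is an illustration, not youtube-dl's code: the function name is made up, and U+2217 (Asterisk Operator) for * is just one arbitrary pick among the look-alikes the comment alludes to. The comment does not propose a replacement for ?, so the sketch leaves it untouched.

```python
import re

CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")
# The regular expression from the comment: a colon or pipe with optional
# whitespace on either side.
SEPARATORS = re.compile(r"\s?[:|]\s?")

LOOKALIKES = str.maketrans({
    '"': "'",        # British-style single quotes
    "<": "[",
    ">": "]",
    "*": "\u2217",   # Asterisk Operator (assumed choice)
    "/": "\u2215",   # Division Slash
    "\\": "\u2216",  # Set Minus
})


def prettify_title(title):
    """Apply the 'pretty' FAT-mode substitutions described above."""
    title = CONTROL_CHARS.sub("", title)     # drop control characters
    title = SEPARATORS.sub(" - ", title)     # space-padded dash
    return title.translate(LOOKALIKES)       # one-to-one replacements


print(prettify_title("My Story: Chapter 2"))  # My Story - Chapter 2
```

Doing the separator pass before the one-to-one table keeps the dash padding balanced whether or not the original title had spaces around the colon or pipe.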
Anyway, my main concern is trying to get the default behaviour to be as comfortable as possible for the largest variety of people possible.
+1 on using fancy Unicode characters instead of /\* - great idea, @ssokolow. IMHO, --literal should be deprecated and made an alias of --title.
With the new sanitize_title that resembles the proposal by @ssokolow (and that we can tweak even more based on his suggestions, although I don't like Unicode in filenames) I don't get the point of --literal at all. And yes, it IS a security bug. Let's alias it!
Using non-ASCII characters with FATxx is not always a good idea. Using similar-looking Unicode characters changes the title too much.
The ONLY security-related problem is failing to escape ONE character, '/' (and on Windows '\'). That is an easy change, and the user can see it afterwards (e.g. when the user needs the exact title to search for).
I don't see any reason to alias --literal to --title just because escaping one character is supposedly too much work.
(I haven't checked today's build for this issue; maybe it is already fixed.)
We consolidated the behavior now, see https://github.com/rg3/youtube-dl#output-template
- --title now replaces only characters that are problematic for security or for the filesystem
- --literal opened up a number of issues for little gained value, so we aliased it to --title
- --restrict-filename is the new option to leave only safe ASCII characters
Does this address the issue?
--title seems to do the expected job on Linux now.
Thanks.
Using --title converts too much (spaces to underscores) for a modern filesystem.
Using --literal doesn't filter out slashes, which produces ugly subdirectories. Directory traversal that overwrites important files may be possible too. IMHO, --literal should filter slashes ("/") anyway on a Unix system. On Windows there would be a few more characters.
If that idea doesn't fit the meaning of "literal", there could be an "almostliteral" option for Unix systems.
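To make the traversal concern concrete, here is a small Python sketch. It is not youtube-dl's code; the helper names, the "downloads" directory, and the "_" replacement are assumptions, and posixpath is used so the example behaves the same on any platform.

```python
import posixpath


def stays_inside(base, filename):
    """Return True if base/filename resolves to a path below `base`."""
    full = posixpath.normpath(posixpath.join(base, filename))
    return full.startswith(posixpath.normpath(base) + "/")


def escape_slashes(title, replacement="_"):
    """The minimal 'almost literal' filter: replace only the forward slash."""
    return title.replace("/", replacement)


# A title used verbatim as a filename can climb out of the download folder.
hostile = "../../etc/passwd"
print(stays_inside("downloads", hostile))                  # False
print(stays_inside("downloads", escape_slashes(hostile)))  # True

# An ordinary title with a slash would otherwise create a subdirectory.
print(escape_slashes("AC/DC - Live"))                      # AC_DC - Live
```

On Windows the backslash would need the same treatment, as noted above.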