ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

Custom filename transformations (was: Filename: I want to restrict characters, but permit spaces) #5042

Open keybounce opened 9 years ago

keybounce commented 9 years ago

Right now, restricting filenames to reasonable characters also prevents spaces.

Yea, I know, most unix scripts/etc. can't handle spaces, but most modern GUI tools can.

phihag commented 9 years ago

If you are using a modern GUI tool, why are you restricting characters in the first place?

keybounce commented 9 years ago

Because I want readable file names.

The code of modern GUI programs may be 8-bit smart, but I want to be able to read filenames.

And, as long as the only issue is spaces, most of what I want to do from the command line works -- bash escapes spaces. Just as long as it's either a real program, or a well-written script (and I've gotten quite good with the whole "$var" syntax all over everything ... yuck, why wasn't this in the shell design from day one?)

phihag commented 9 years ago

Can you elaborate on why filenames would get more readable when you pass in --restrict-filenames? I'd very much prefer the first filename:

$ youtube-dl --get-filename F59zpvPg3i0
新年快樂 - 小虎隊、憂歡派對 (1989)-F59zpvPg3i0.mp4
$ youtube-dl --get-filename F59zpvPg3i0 --restrict-filenames
1989-F59zpvPg3i0.mp4

As it stands, this issue is lacking context, and without context, we usually close issues. So please do provide a more detailed example of why you'd wanna restrict filenames in the first place.

keybounce commented 9 years ago

Here is a directory listing

keybounceMBP:Etho michael$ ls
total 5993416
257872 Etho Plays Minecraft - Episode 390 Connected Houses.mp4
224464 Etho Plays Minecraft - Episode 391 River Terraforming.mp4
267000 Etho Plays Minecraft - Episode 392 Book Matrix.mp4
242904 Etho Plays Minecraft - Episode 394 Flying Sheep Farm.mp4
257096 Etho Plays Minecraft - Episode 395 Weird Style.mp4
239696 Etho Plays Minecraft - Episode 396 Hyper Speed Piggy.mp4
261672 Etho Plays Minecraft - Episode 397 Life Changer.mp4
502456 Etho's Modded Minecraft #13 - Bandit Camp.mp4
636848 Etho's Modded Minecraft #14 - Template vs. Blueprint.mp4
418792 Etho's Modded Minecraft #15 - Strange Voices In My Head.mp4
213968 Etho's Modded Minecraft 10 Death Mountain.mp4
252088 Etho's Modded Minecraft 11 Coke Oven Factory.mp4
183384 Etho's Modded Minecraft 12 Digital Miner.mp4
278968 Etho's Modded Minecraft 2 Tropical Fishing Huts.mp4
236544 Etho's Modded Minecraft 3 Favorite Tool.mp4
223208 Etho's Modded Minecraft 4 Smart NPCs.mp4
252792 Etho's Modded Minecraft 5 Mining Ship.mp4
276272 Etho's Modded Minecraft 6 Drilling Machine.mp4
235400 Etho's Modded Minecraft 7 Piston Power.mp4
287200 Etho's Modded Minecraft 8 Messy Closet.mp4
244792 Etho's Modded Minecraft 9 Steampunk City.mp4

Modded minecraft episodes 13-15 were downloaded with youtube-dl, as 480p (thank you for decoding the dash data and fetching it); the rest were from a firefox extension ("Download YouTube Videos as MP4") that fetches the 360p feed. The name change is significant, and breaks programs like mplayer that auto-play the next one, or even the sorting of files in Finder or the command line.

yan12125 commented 7 years ago

Other similar ideas:

  1. Keep ampersands and parentheses (@active8, #4549)
  2. Strip emojis (@sayem314, #14474)

In my opinion, idea 2 is still a valid request even in 2017. Quite a few emojis are not in the Basic Multilingual Plane (BMP) of Unicode. In other words, applications need to handle code points beyond the 16-bit range (for example with UCS-4/UTF-32) to process them correctly. Besides Android, Konsole and QTerminal don't work well, either. They use Qt's QString, which is UCS-2 internally.

Update: iTerm2 goes crazy with some emojis, too 😆
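
To illustrate the BMP point, here is a minimal Python sketch. The helper names `is_outside_bmp` and `utf16_code_units` are made up for this example. It checks whether a character lies above U+FFFF and counts how many 16-bit code units it needs in UTF-16; software that assumes one 16-bit unit per character will likely mishandle anything that needs two.

```python
def is_outside_bmp(ch: str) -> bool:
    """Return True if the single character lies above U+FFFF (outside the BMP)."""
    return ord(ch) > 0xFFFF


def utf16_code_units(text: str) -> int:
    """Return the number of 16-bit code units needed to encode text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    # An ASCII letter, a CJK ideograph (inside the BMP) and an emoji (outside it).
    for ch in ("A", "\u65b0", "\U0001F606"):
        print(f"U+{ord(ch):04X} outside BMP: {is_outside_bmp(ch)}, "
              f"UTF-16 code units: {utf16_code_units(ch)}")
```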

vlakoff commented 6 years ago

On some videos I'm downloading, I encounter emojis (example), or UTF-8 accents (example).

Because of these characters, I'm encountering the following issues:

So, I searched a bit and found the --restrict-filenames option. But it unnecessarily replaces spaces with underscores, which is much more visually bloated. It also removes valid ASCII characters such as &, !, etc.

I suggest adding a --ascii-filenames option, that would just produce fully ASCII filenames, without doing any other transformation.
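
A rough sketch of what such an ASCII-only transformation could look like in Python. This is not an existing youtube-dl option; `ascii_filename` is a hypothetical helper. It decomposes accented letters so the base letter survives, drops everything that has no ASCII encoding, and leaves spaces, `&` and `!` alone. Note that a title written entirely in a non-Latin script would end up empty with this approach.

```python
import unicodedata


def ascii_filename(name: str) -> str:
    """Return an ASCII-only version of name, keeping spaces and ASCII punctuation."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop every code point that cannot be encoded as ASCII
    # (combining marks, emojis, CJK characters, ...).
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse whitespace runs left behind by the removed characters.
    return " ".join(ascii_only.split())


if __name__ == "__main__":
    print(ascii_filename("Caf\u00e9 & Cr\u00e8me! \U0001F606 live"))
```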

dantheman213 commented 5 years ago

Any update on this issue?

Kochise commented 4 years ago

Files are not downloaded if there is a # in the filename. Would it be possible to have an option that only restricts illegal characters?

frgmntdmmrs commented 4 years ago

Chipping in to also say that I'd love to see this feature to strip down emojis or add filters to output file names. Sometimes clips being downloaded have emojis in their titles and it tends to cause weird issues with other software, such as not detecting the file or crashing.

shillshocked commented 4 years ago

How is this issue solvable?

nestukh commented 4 years ago

To strip most emojis from the final filename, add this --exec switch (it's one single line):

--exec "python -B -c \"\$(printf %b 'import os,sys,re,shutil; shutil.move(sys.argv[1],re.sub(re.compile(\"([\\U0001F1E0-\\U0001F1FF,\\U0001F300-\\U0001F5FF,\\U0001F600-\\U0001F64F,\\U0001F680-\\U0001F6FF,\\U0001F700-\\U0001F77F,\\U0001F780-\\U0001F7FF,\\U0001F800-\\U0001F8FF,\\U0001F900-\\U0001F9FF,\\U0001FA00-\\U0001FA6F,\\U0001FA70-\\U0001FAFF,\\U00002702-\\U000027B0,\\U000024C2-\\U0001F251])\"), r\"\", sys.argv[1]))')\" {}"

This was tested on the penguinean operating system. Other possible emoji blocks are listed at https://en.wikipedia.org/wiki/Emoji#Unicode_blocks

Credits: Code derived from and original work by: https://gist.github.com/Alex-Just/e86110836f3f93fe7932290526529cd1#gistcomment-3208085

Extra: --exec uses sh. To do the same from a bash command line, with the filename in $VARIABLE (again, it's one single line):

set +H; python -B -c "$(printf %b 'import os,sys,re,shutil; shutil.move(sys.argv[1],re.sub(re.compile("([\\U0001F1E0-\\U0001F1FF,\\U0001F300-\\U0001F5FF,\\U0001F600-\\U0001F64F,\\U0001F680-\\U0001F6FF,\\U0001F700-\\U0001F77F,\\U0001F780-\\U0001F7FF,\\U0001F800-\\U0001F8FF,\\U0001F900-\\U0001F9FF,\\U0001FA00-\\U0001FA6F,\\U0001FA70-\\U0001FAFF,\\U00002702-\\U000027B0,\\U000024C2-\\U0001F251])"), r"", sys.argv[1]))')" "$VARIABLE"; set -H
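
For readability, the same idea can be written as a small standalone script. This is only a sketch: the file name `strip_emoji.py` in the usage note is hypothetical, and the narrower range list is an assumption made for this example, not the choice made in the one-liners above. Two things to be aware of in those one-liners: inside a character class the commas are literal, so commas are removed from filenames too, and the last range (U+24C2 to U+1F251) covers far more than emojis, including the CJK ideographs at U+4E00 to U+9FFF.

```python
import os
import re
import shutil
import sys

# Emoji-related blocks taken from the one-liner, without the commas and
# without the broad U+24C2..U+1F251 range (assumption: that range was not
# meant to remove CJK text).
EMOJI = re.compile(
    "["
    "\U0001F1E0-\U0001F1FF"  # regional indicators (flags)
    "\U0001F300-\U0001F5FF"  # miscellaneous symbols and pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "\U0001F700-\U0001F8FF"  # alchemical symbols, geometric shapes ext., arrows-C
    "\U0001F900-\U0001FAFF"  # supplemental symbols, chess, pictographs ext.-A
    "\U00002702-\U000027B0"  # dingbats
    "]+"
)


def strip_emoji(name: str) -> str:
    """Remove the characters matched by EMOJI from name."""
    return EMOJI.sub("", name)


def rename_without_emoji(path: str) -> str:
    """Rename the file at path so its basename has no emojis; return the new path."""
    directory, base = os.path.split(path)
    new_path = os.path.join(directory, strip_emoji(base))
    if new_path != path:
        shutil.move(path, new_path)
    return new_path


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        print(rename_without_emoji(arg))
```

Saved as `strip_emoji.py`, it could be called with `--exec "python strip_emoji.py {}"`. Only the basename is rewritten, so directory names in the path are left untouched.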

nestukh commented 4 years ago

P.S. In case someone also uses %(upload_date)s in the output template, e.g. --output "%(upload_date)s %(uploader)s - %(title)s [%(id)s].%(ext)s", and would like to convert the date from YYYYMMDD (20200403) to a more readable format like YYYY-MM-DD (2020-04-03), the correct --exec is:

--exec "python -B -c \"\$(printf %b 'import os,sys,re,shutil; shutil.move(sys.argv[1],re.sub(r\"^(\d{4})(\d{2})(\d{2})\", r\"\\\1-\\\2-\\\3\",re.sub(re.compile(\"([\\U0001F1E0-\\U0001F1FF,\\U0001F300-\\U0001F5FF,\\U0001F600-\\U0001F64F,\\U0001F680-\\U0001F6FF,\\U0001F700-\\U0001F77F,\\U0001F780-\\U0001F7FF,\\U0001F800-\\U0001F8FF,\\U0001F900-\\U0001F9FF,\\U0001FA00-\\U0001FA6F,\\U0001FA70-\\U0001FAFF,\\U00002702-\\U000027B0,\\U000024C2-\\U0001F251])\"), r\"\", sys.argv[1])))')\" {}"

The ^ anchors the search pattern to the beginning of the filename. Remove it if your template differs from the example above.

It's a simple substitution that can also be done with tools like rename (in place of sed): --exec "rename 's/^(\d{4})(\d{2})(\d{2})/\$1-\$2-\$3/' {}". However, you cannot pass multiple --exec switches with the {} wildcard to change the same file several times. Also, rename is not installed by default, while the python code above can take advantage of the same local virtualenv where youtube_dl is installed.
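
The date substitution on its own, as a plain Python sketch (`dash_upload_date` is a hypothetical helper name). It assumes the filename starts with the eight-digit upload date, as in the output template from the example; because of the ^ anchor, it should be applied to the basename rather than to a path with a leading directory.

```python
import re

# Eight digits at the very start of the name: YYYY, MM, DD.
DATE_PREFIX = re.compile(r"^(\d{4})(\d{2})(\d{2})")


def dash_upload_date(basename: str) -> str:
    """Turn a leading YYYYMMDD into YYYY-MM-DD; leave other names unchanged."""
    return DATE_PREFIX.sub(r"\1-\2-\3", basename, count=1)


if __name__ == "__main__":
    print(dash_upload_date("20200403 uploader - title [id].mp4"))
```

Since both steps are ordinary string functions in Python, the emoji removal and the date conversion can be chained in one script and run from a single --exec.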

lepermagpie commented 4 years ago

I appreciate the comprehensive scripts, nestukh. Can this be easily replicated on Windows too, as in just adapting the syntax of the original script?

nestukh commented 4 years ago

Probably yes. Also, you will need no changes under WSL (Windows Subsystem for Linux on Windows 10), MSYS2, Cygwin or other minimal GNU/Linux layer implementations on Windows.