Open keybounce opened 9 years ago
If you are using a modern GUI tool, why are you restricting characters in the first place?
Because I want readable file names.
The code of modern GUI programs may be 8-bit smart, but I want to be able to read filenames.
And, as long as the only issue is spaces, most of what I want to do from the command line works -- bash escapes spaces. Just as long as it's either a real program, or a well-written script (and I've gotten quite good with the whole "$var" syntax all over everything ... yuck, why wasn't this in the shell design from day one?)
Can you elaborate on why filenames would get more readable when you pass in --restrict-filenames? I'd very much prefer the first filename:
$ youtube-dl --get-filename F59zpvPg3i0
新年快樂 - 小虎隊、憂歡派對 (1989)-F59zpvPg3i0.mp4
$ youtube-dl --get-filename F59zpvPg3i0 --restrict-filenames
1989-F59zpvPg3i0.mp4
As it stands, this issue is lacking context, and without context, we usually close issues. So please do provide a more detailed example of why you'd wanna restrict filenames in the first place.
Here is a directory listing
keybounceMBP:Etho michael$ ls
total 5993416
257872 Etho Plays Minecraft - Episode 390 Connected Houses.mp4
224464 Etho Plays Minecraft - Episode 391 River Terraforming.mp4
267000 Etho Plays Minecraft - Episode 392 Book Matrix.mp4
242904 Etho Plays Minecraft - Episode 394 Flying Sheep Farm.mp4
257096 Etho Plays Minecraft - Episode 395 Weird Style.mp4
239696 Etho Plays Minecraft - Episode 396 Hyper Speed Piggy.mp4
261672 Etho Plays Minecraft - Episode 397 Life Changer.mp4
502456 Etho's Modded Minecraft #13 - Bandit Camp.mp4
636848 Etho's Modded Minecraft #14 - Template vs. Blueprint.mp4
418792 Etho's Modded Minecraft #15 - Strange Voices In My Head.mp4
213968 Etho's Modded Minecraft 10 Death Mountain.mp4
252088 Etho's Modded Minecraft 11 Coke Oven Factory.mp4
183384 Etho's Modded Minecraft 12 Digital Miner.mp4
278968 Etho's Modded Minecraft 2 Tropical Fishing Huts.mp4
236544 Etho's Modded Minecraft 3 Favorite Tool.mp4
223208 Etho's Modded Minecraft 4 Smart NPCs.mp4
252792 Etho's Modded Minecraft 5 Mining Ship.mp4
276272 Etho's Modded Minecraft 6 Drilling Machine.mp4
235400 Etho's Modded Minecraft 7 Piston Power.mp4
287200 Etho's Modded Minecraft 8 Messy Closet.mp4
244792 Etho's Modded Minecraft 9 Steampunk City.mp4
Modded Minecraft episodes 13-15 were downloaded with youtube-dl, as 480p (thank you for decoding the DASH data and fetching it); the rest were from a Firefox extension ("Download YouTube Videos as MP4") that fetches the 360p feed. The name change is significant, and breaks programs like mplayer that auto-play the next one, or even the sorting of files in Finder or on the command line.
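To illustrate the sorting problem in the listing above, here is a minimal Python sketch. The helper `episode_key` is hypothetical and only assumes that the first run of digits in a name is the episode number; it is not part of youtube-dl.

```python
import re

def episode_key(filename):
    """Hypothetical sort key: (text before the number, episode number)."""
    # First run of digits, optionally preceded by '#'.
    match = re.search(r"#?(\d+)", filename)
    if match is None:
        return (filename, 0)
    return (filename[:match.start()].strip(), int(match.group(1)))

names = [
    "Etho's Modded Minecraft #13 - Bandit Camp.mp4",
    "Etho's Modded Minecraft 10 Death Mountain.mp4",
    "Etho's Modded Minecraft 2 Tropical Fishing Huts.mp4",
]

# Plain character sorting: '#' (0x23) sorts before every digit,
# and "10" sorts before "2", so episode 13 ends up first.
plain_order = sorted(names)

# Sorting by the extracted number gives 2, 10, 13.
natural_order = sorted(names, key=episode_key)
```

Tools that sort by raw characters will behave like `plain_order`, which is why a stray `#` or ` - ` in only some of the names matters.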
Other similar ideas:
In my opinion, 2. is still a valid request even in 2017. Quite a few emojis are not in the Basic Multilingual Plane (BMP) of Unicode. In other words, applications should support at least UCS-4 (UTF-32) to handle them correctly. Besides Android, Konsole and QTerminal don't work well, either. They use Qt's QString, which is UCS-2 internally.
Update: iTerm2 goes crazy with some emojis, too 😆
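As a small illustration of the BMP point, this Python sketch checks where the emoji used above sits in Unicode and how many 16-bit units it needs. It says nothing about how any particular terminal is implemented.

```python
emoji = "\U0001F606"  # the emoji used in the comment above

code_point = ord(emoji)

# The BMP ends at U+FFFF; anything above it needs more than 16 bits.
outside_bmp = code_point > 0xFFFF

# In UTF-16 such a character is stored as a surrogate pair (two 16-bit units),
# so code that assumes one unit per character will split it in half.
utf16_units = len(emoji.encode("utf-16-le")) // 2

# In UTF-8 the same character takes four bytes.
utf8_bytes = len(emoji.encode("utf-8"))
```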
On some videos I'm downloading, I encounter emojis (example), or UTF-8 accents (example).
Because of these characters, I'm encountering the following issues:
So, I searched a bit and found the --restrict-filenames option. But it unnecessarily replaces spaces with underscores, which is much more visually bloated. It also removes valid ASCII characters such as &, !, etc.
I suggest adding a --ascii-filenames option that would just produce fully ASCII filenames, without doing any other transformation.
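What such an option might do can be sketched in Python. The function `ascii_filename` is hypothetical and is not an existing youtube-dl feature; it assumes it is given the title part of the name, without the extension.

```python
import unicodedata

def ascii_filename(title):
    """Hypothetical helper: drop non-ASCII characters, keep everything else."""
    # NFKD splits an accented letter into base letter + combining mark,
    # so the base letter survives the ASCII filter below.
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse the extra spaces left behind where characters were dropped.
    return " ".join(ascii_only.split())
```

Spaces, `&` and `!` pass through untouched, for example `ascii_filename("Café & Bar! 😆")` gives `Cafe & Bar!`. Titles written entirely in a non-Latin script would lose almost everything, much like the `1989` example earlier in the thread.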
Any update on this issue?
Files are not downloaded if there is a # in the filename. Would it be possible to have an option that only restricts illegal characters?
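A rough Python sketch of "restrict only illegal characters" could look like this. The helper `replace_illegal` is hypothetical, and the character set is an assumption: the characters Windows rejects in file names plus ASCII control characters, which is stricter than POSIX (only `/` and NUL).

```python
import re

# Assumed set: characters Windows rejects in file names, plus control characters.
ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

def replace_illegal(name, replacement="_"):
    """Hypothetical helper: touch only characters a filesystem would reject."""
    return ILLEGAL.sub(replacement, name)
```

Note that `#` is left alone, since common filesystems accept it; this sketch does not address whatever caused the failed download reported above.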
Chipping in to also say that I'd love to see this feature to strip down emojis or add filters to output file names. Sometimes clips being downloaded have emojis in their titles and it tends to cause weird issues with other software, such as not detecting the file or crashing.
How is this issue solvable?
To strip most emojis from the final filename, add this --exec switch (it's one single line):
--exec "python -B -c \"\$(printf %b 'import os,sys,re,shutil; shutil.move(sys.argv[1],re.sub(re.compile(\"([\\U0001F1E0-\\U0001F1FF,\\U0001F300-\\U0001F5FF,\\U0001F600-\\U0001F64F,\\U0001F680-\\U0001F6FF,\\U0001F700-\\U0001F77F,\\U0001F780-\\U0001F7FF,\\U0001F800-\\U0001F8FF,\\U0001F900-\\U0001F9FF,\\U0001FA00-\\U0001FA6F,\\U0001FA70-\\U0001FAFF,\\U00002702-\\U000027B0,\\U000024C2-\\U0001F251])\"), r\"\", sys.argv[1]))')\" {}"
This was tested on GNU/Linux. Other emoji ranges are listed at https://en.wikipedia.org/wiki/Emoji#Unicode_blocks
Credits: Code derived from and original work by: https://gist.github.com/Alex-Just/e86110836f3f93fe7932290526529cd1#gistcomment-3208085
Extra:
--exec uses sh. Using bash on the command line, for a filename stored in $VARIABLE (again, it's one single line):
set +H; python -B -c "$(printf %b 'import os,sys,re,shutil; shutil.move(sys.argv[1],re.sub(re.compile("([\\U0001F1E0-\\U0001F1FF,\\U0001F300-\\U0001F5FF,\\U0001F600-\\U0001F64F,\\U0001F680-\\U0001F6FF,\\U0001F700-\\U0001F77F,\\U0001F780-\\U0001F7FF,\\U0001F800-\\U0001F8FF,\\U0001F900-\\U0001F9FF,\\U0001FA00-\\U0001FA6F,\\U0001FA70-\\U0001FAFF,\\U00002702-\\U000027B0,\\U000024C2-\\U0001F251])"), r"", sys.argv[1]))')" "$VARIABLE"; set -H
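Read as a regular expression, the character class in the one-liners above separates its ranges with commas, which makes the comma itself a character to remove, and its last range runs from U+24C2 to U+1F251, which covers far more than emoji (CJK text, for example). A more readable sketch as a standalone script follows; the file name `strip_emoji.py` and the exact choice of ranges are assumptions, and the list is not exhaustive.

```python
import os
import re
import sys

# Emoji-related ranges, written without separators: inside [...] a comma
# would simply be one more character to remove.
EMOJI = re.compile(
    "["
    "\U0001F1E6-\U0001F1FF"  # regional indicator symbols (flags)
    "\U0001F300-\U0001F5FF"  # miscellaneous symbols and pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "\U0001F900-\U0001F9FF"  # supplemental symbols and pictographs
    "\U0001FA70-\U0001FAFF"  # symbols and pictographs extended-A
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F\u200D"           # variation selector-16 and zero-width joiner
    "]+"
)

def strip_emoji(path):
    """Return path with emoji removed from the base name (extension kept)."""
    directory, name = os.path.split(path)
    stem, ext = os.path.splitext(name)
    # Remove emoji, then collapse the whitespace they leave behind.
    cleaned = " ".join(EMOJI.sub("", stem).split())
    return os.path.join(directory, cleaned + ext)

if __name__ == "__main__":
    for path in sys.argv[1:]:
        target = strip_emoji(path)
        if target != path:
            os.rename(path, target)
```

Saved under the assumed name, it could be called as `--exec "python strip_emoji.py {}"`, which avoids the nested quoting of the one-liner.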
P.S.
In case someone uses %(upload_date)s in the output template as well, e.g.
--output "%(upload_date)s %(uploader)s - %(title)s [%(id)s].%(ext)s"
and would like to convert from YYYYMMDD (20200403) to a more readable format like YYYY-MM-DD (2020-04-03), the correct --exec is:
--exec "python -B -c \"\$(printf %b 'import os,sys,re,shutil; shutil.move(sys.argv[1],re.sub(r\"^(\d{4})(\d{2})(\d{2})\", r\"\\\1-\\\2-\\\3\",re.sub(re.compile(\"([\\U0001F1E0-\\U0001F1FF,\\U0001F300-\\U0001F5FF,\\U0001F600-\\U0001F64F,\\U0001F680-\\U0001F6FF,\\U0001F700-\\U0001F77F,\\U0001F780-\\U0001F7FF,\\U0001F800-\\U0001F8FF,\\U0001F900-\\U0001F9FF,\\U0001FA00-\\U0001FA6F,\\U0001FA70-\\U0001FAFF,\\U00002702-\\U000027B0,\\U000024C2-\\U0001F251])\"), r\"\", sys.argv[1])))')\" {}"
The ^ makes sure that the pattern only matches at the beginning of the filename. Remove it if your template differs from the example above.
It's a simple substitution that can be done with tools like rename (in place of sed) too:
--exec "rename 's/^(\d{4})(\d{2})(\d{2})/\$1-\$2-\$3/' {}"
but you cannot run multiple --exec commands with the {} wildcard to change the same file several times. Also, rename is not installed by default, while the Python code above can take advantage of the same local virtualenv where youtube_dl is installed.
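Since the same file cannot be passed through several --exec commands, one way around it is a single script that applies every change before renaming once. The following is only a sketch; `dash_date` and `apply_all` are hypothetical names.

```python
import os
import re
import sys

# Leading YYYYMMDD, as produced by %(upload_date)s at the start of the template.
DATE_PREFIX = re.compile(r"^(\d{4})(\d{2})(\d{2})")

def dash_date(name):
    """Turn a leading YYYYMMDD into YYYY-MM-DD; other names are unchanged."""
    return DATE_PREFIX.sub(r"\1-\2-\3", name)

def apply_all(path, transforms):
    """Run every transform on the base name, so one rename covers them all."""
    directory, name = os.path.split(path)
    for transform in transforms:
        name = transform(name)
    return os.path.join(directory, name)

if __name__ == "__main__":
    for path in sys.argv[1:]:
        # Add further name -> name functions to the list as needed.
        target = apply_all(path, [dash_date])
        if target != path:
            os.rename(path, target)
```

Working on the base name means the `^` anchor still refers to the start of the filename when the path includes a directory.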
I appreciate the comprehensive scripts, nestukh. Can this be easily replicated on Windows too, just by replacing the appropriate syntax in the original script?
Probably yes. Also, you will need no changes under WSL (the GNU/Linux subsystem for Windows 10), MSYS2, Cygwin, or other minimal GNU/Linux layer implementations on Windows.
Right now, restricting filenames to reasonable characters also prevents spaces.
Yea, I know, most unix scripts/etc. can't handle spaces, but most modern GUI tools can.