ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

Windows Unicode (utf16) not handled in filenames ? #5094

Open Steveland opened 9 years ago

Steveland commented 9 years ago

Hello,

I wonder if you can add support for UTF-16 in filenames. I'm trying to pass this command line from a Windows C++ program: youtube-dl.exe -o output_美.mp4 -v https://www.youtube.com/watch?v=HEwDZ8KtRMg

and I get this output:

[debug] System config: []
[debug] User config: []
[debug] Command-line args: ['-v', '-o', 'C:\Users\Alan\Desktop\youtube-dl\output?.mp4', 'https://www.youtube.com/watch?v=HEwDZ8KtRMg']
[debug] Encodings: locale cp1252, fs mbcs, out None, pref cp1252
[debug] youtube-dl version 2015.02.28
[debug] Python version 2.7.8 - Windows-7-6.1.7601-SP1
[debug] exe versions: ffmpeg N-68881-ga79ac73, ffprobe N-68881-ga79ac73, rtmpdump 2.4
[debug] Proxy map: {}
[youtube] HEwDZ8KtRMg: Downloading webpage
[youtube] HEwDZ8KtRMg: Extracting video information
[youtube] HEwDZ8KtRMg: Downloading DASH manifest
[debug] Invoking downloader on u'https://r1---sn-25g7snee.googlevideo.com/videoplayback?source=youtube&mime=video%2Fmp4&expire=1425224794&itag=18&fexp=904844%2C905657%2C907263%2C927622%2C931392%2C934954%2C9406140%2C9406861%2C943917%2C947225%2C947240%2C948124%2C951703%2C952302%2C952605%2C952612%2C952620%2C952901%2C955301%2C957201%2C959701&sparams=dur%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Cmime%2Cmm%2Cms%2Cmv%2Cpl%2Cratebypass%2Crequiressl%2Csource%2Cupn%2Cexpire&dur=126.525&mm=31&ms=au&mv=m&mt=1425203133&ipbits=0&ip=2.14.183.224&key=yt5&upn=CUV2UYbWCXs&id=o-ALMUGUmQ1J1TmoV9FOrSLSrPdyF3XR07mSjPEJ51kId&ratebypass=yes&initcwndbps=1465000&requiressl=yes&sver=3&signature=32631A825116B8DFB0F8F040ACEECB8A897BD008.758F1F62B48F4ABDE4CFA6F51A37C5FFF1990E51&pl=16'
Traceback (most recent call last):
  File "__main__.py", line 19, in <module>
  File "youtube_dl\__init__.pyo", line 397, in main
  File "youtube_dl\__init__.pyo", line 387, in _real_main
  File "youtube_dl\YoutubeDL.pyo", line 1442, in download
  File "youtube_dl\YoutubeDL.pyo", line 654, in extract_info
  File "youtube_dl\YoutubeDL.pyo", line 700, in process_ie_result
  File "youtube_dl\YoutubeDL.pyo", line 1143, in process_video_result
  File "youtube_dl\YoutubeDL.pyo", line 1375, in process_info
  File "youtube_dl\YoutubeDL.pyo", line 1350, in dl
  File "youtube_dl\downloader\common.pyo", line 339, in download
  File "youtube_dl\downloader\http.pyo", line 158, in real_download
  File "youtube_dl\utils.pyo", line 258, in sanitize_open
  File "ntpath.pyo", line 64, in join
  File "ntpath.pyo", line 114, in splitdrive
TypeError: object of type 'generator' has no len()

Steveland commented 9 years ago

It seems the Windows executable youtube-dl.exe is using an old version of Python (2.7.8) that does not support Unicode. Could you please update to the latest Python version (3.4.3)? If that is not possible, please explain why.

This is an important feature because its absence can lead to serious bugs. For example, the default output folder for downloads in Windows is "C:\Users\Username\Downloads", so if the username contains utf16 characters (Chinese, Russian, etc.), the program will fail.
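The '?' in the logged argument ('output?.mp4') together with "locale cp1252" is consistent with the filename having been squeezed through a legacy code page before youtube-dl ever saw it. The sketch below only illustrates that kind of lossy conversion; the helper name is hypothetical, and Python's 'replace' error handler is used here as a stand-in for whatever conversion Windows actually applied to the command line.

```python
def through_code_page(text, code_page='cp1252'):
    """Round-trip text through a legacy code page.

    Characters the code page cannot represent are replaced with '?',
    which is how Python's 'replace' error handler behaves when encoding.
    """
    return text.encode(code_page, errors='replace').decode(code_page)


# '\u7f8e' is the CJK character used in the report; cp1252 has no slot for it.
lossy = through_code_page('output_\u7f8e.mp4')

# '\xe9' (e with acute accent) exists in cp1252, so it survives unchanged.
kept = through_code_page('output_\xe9.mp4')

print(lossy)
```

Once the character has become '?', no amount of Unicode support further down the pipeline can recover it, which is why the fix has to happen where the arguments are read.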

phihag commented 9 years ago

No offense, but this sounds a lot like an unfounded conspiracy theory - bear in mind that extraordinary claims do require extraordinary proof.

First of all, the error message is unrelated to UTF-16 in the first place!

Secondly, Python 2.x supports Unicode just fine, at least as far as I am aware.

The executable is generated using py2exe, which does not support Python 3. I'd gladly see a way to build 3.x. Since I only use Windows to fix reported bugs in our Windows port, I see other issues as much more pressing. You are very welcome though to suggest code to build an exe for Python 3.x on Windows.
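On the TypeError at the end of the traceback: the message says that a generator object reached ntpath.splitdrive by way of ntpath.join. One way this can happen is calling join with a generator expression as its only argument instead of unpacking it into separate arguments. The sketch below reproduces that class of mistake with hypothetical helper names. It is an assumption about what the traceback implies, not code taken from youtube-dl, and the exact error text differs between Python versions.

```python
import ntpath
import re


def join_with_generator(filename):
    # Mistake: the generator expression is passed as ONE argument, so
    # join() receives a generator where it expects a path string.
    return ntpath.join(
        part.strip() for part in ntpath.split(filename))


def join_sanitized(filename):
    # Sanitize only the final component, then unpack the parts with '*'
    # so that join() receives two separate strings.
    head, tail = ntpath.split(filename)
    parts = [head, re.sub(r'[<>:"|?*]', '#', tail)]
    return ntpath.join(*parts)


try:
    join_with_generator(r'C:\dir\out?.mp4')
except TypeError as exc:
    print('TypeError:', exc)

print(join_sanitized(r'C:\dir\out?.mp4'))
```

Note that this code path is only reached because the '?' is not a legal filename character on Windows, so the Unicode loss and the TypeError are two separate problems that happen to surface together.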

Steveland commented 9 years ago

Hi Philipp,

Thanks for taking the time to reply.

I said that the problem is related to unicode for several reasons:

I was not aware of this py2exe tool, I guess it adds another variable to the problem.

If anyone manages to get Unicode (UTF-16 or UTF-8) output filenames using youtube-dl.exe on Windows, please give me the answer. I've been trying for several days to make this work, but no luck so far...

Also, I downloaded the latest Python version and tried to run the script YoutubeDL.py, but still no luck with it. Is there a tutorial on how to run youtube-dl on Windows with Python installed?

I'm sorry, I'm just a Windows C++ developer and I'm not familiar with Linux and command-line stuff.

jaimeMF commented 9 years ago

We could look into cx_Freeze, I used it once with a python3 program and it worked fine, but I don't know if it can generate a single exe file instead of an installer.

About running youtube-dl with Python on Windows: if you have installed the latest Python version and you haven't unselected pip during the installation (I think it's selected by default), you can run pip install -U youtube_dl to get the released version, or pip install -e . from the source code directory to use the version from the repo.
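Once the package is installed with pip, it can also be driven from a Python script rather than the command line. The following is a minimal sketch, assuming the youtube_dl package is importable; the helper names build_options and download are made up for illustration, while 'outtmpl' and 'verbose' mirror the -o and -v options used earlier in this thread.

```python
def build_options(output_template):
    """Return an options dict equivalent to '-o <template> -v'."""
    return {
        'outtmpl': output_template,  # same role as the -o option
        'verbose': True,             # same role as the -v option
    }


def download(url, output_template):
    # Imported here so the helpers above can be used (and tested)
    # even when youtube_dl is not installed.
    import youtube_dl

    with youtube_dl.YoutubeDL(build_options(output_template)) as ydl:
        ydl.download([url])


if __name__ == '__main__':
    options = build_options('output_\u7f8e.mp4')
    print(sorted(options))
    # Uncomment to actually download (needs network access):
    # download('https://www.youtube.com/watch?v=HEwDZ8KtRMg',
    #          'output_\u7f8e.mp4')
```

Because the filename is passed as a Python string here, it never goes through the Windows command-line conversion at all.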

Steveland commented 9 years ago

Hello Jaime,

Thanks a lot for your help. Your method (pip install -U youtube_dl) is working fine. I could get youtube-dl to work on Windows with the latest version of Python, and Unicode Chinese characters are now displayed properly in output filenames.

Philipp mentioned that Python has support for Unicode, so I guess the problem comes from py2exe, which is screwing up the encoding.

Anyway, I lost too much time on this, I'll find a workaround and move on.

Thanks again.

jaimeMF commented 9 years ago

@phihag It seems that py2exe (0.9.2.2) supports python 3.4:

$ pip show py2exe
---
Name: py2exe
Version: 0.9.2.2
Location: c:\python34\lib\site-packages
Requires: 

C:\Users\jaime\Desktop>youtube-dl.exe "http://www.youtube.com/watch?v=OIYeCPUIL1E" -v -x --audio-format mp3 > log.txt 2>&1 produces:

[debug] System config: []
[debug] User config: []
[debug] Command-line args: ['http://www.youtube.com/watch?v=OIYeCPUIL1E', '-v', '-x', '--audio-format', 'mp3']
[debug] Encodings: locale cp1252, fs mbcs, out cp1252, pref cp1252
[debug] youtube-dl version 2015.03.09
[debug] Python version 3.4.2 - Windows-7-6.1.7600
[debug] exe versions: ffmpeg 1.2, ffprobe 1.2
[debug] Proxy map: {}
[youtube] OIYeCPUIL1E: Downloading webpage
[youtube] OIYeCPUIL1E: Extracting video information
[youtube] OIYeCPUIL1E: Downloading DASH manifest
[debug] Invoking downloader on 'https://r6---sn-h5q7dnez.googlevideo.com/videoplayback?<...>'
[download] Resuming download at byte 195767
[download] Destination:  ' -   ()-OIYeCPUIL1E.m4a

[download]   3.7% of 5.03MiB at Unknown speed ETA Unknown ETA
[download]   3.8% of 5.03MiB at 750.01KiB/s ETA 00:06        
[download]   3.9% of 5.03MiB at 700.00KiB/s ETA 00:07        
[download]   4.0% of 5.03MiB at 254.22KiB/s ETA 00:19        
[download]   4.3% of 5.03MiB at 364.68KiB/s ETA 00:13        
[download]   4.9% of 5.03MiB at 459.83KiB/s ETA 00:10        
[download]   6.2% of 5.03MiB at 477.42KiB/s ETA 00:10        
[download]   8.7% of 5.03MiB at 561.64KiB/s ETA 00:08        
[download]  13.6% of 5.03MiB at 587.32KiB/s ETA 00:07        
[download]  23.6% of 5.03MiB at 661.24KiB/s ETA 00:05        
[download]  38.3% of 5.03MiB at 721.46KiB/s ETA 00:04        
[download]  54.3% of 5.03MiB at 718.17KiB/s ETA 00:03        
[download]  68.1% of 5.03MiB at 673.52KiB/s ETA 00:02        
[download]  78.8% of 5.03MiB at 659.33KiB/s ETA 00:01        
[download]  90.1% of 5.03MiB at 644.71KiB/s ETA 00:00        
[download] 100.0% of 5.03MiB at 632.75KiB/s ETA 00:00        
[download] 100% of 5.03MiB in 00:07                          
[ffmpeg] Correcting container in " ' -   ()-OIYeCPUIL1E.m4a"
[debug] ffmpeg command line: ffmpeg -y -i ' '"'"' -   ()-OIYeCPUIL1E.m4a' -c copy -f mp4 ' '"'"' -   ()-OIYeCPUIL1E.temp.m4a'
[debug] ffmpeg command line: ffprobe -show_streams ' '"'"' -   ()-OIYeCPUIL1E.m4a'
[ffmpeg] Destination:  ' -   ()-OIYeCPUIL1E.mp3
[debug] ffmpeg command line: ffmpeg -y -i ' '"'"' -   ()-OIYeCPUIL1E.m4a' -vn -acodec libmp3lame -q:a 5 ' '"'"' -   ()-OIYeCPUIL1E.mp3'
Deleting original file  ' -   ()-OIYeCPUIL1E.m4a (pass -k to keep)

Steveland commented 9 years ago

It seems that the unicode characters are deleted in your output filename. Did you manage to get the Hebrew characters in the output filename?

jaimeMF commented 9 years ago

On the desktop it showed the hebrew characters, I don't know why they don't appear in the output.

Steveland commented 9 years ago

Great! Let's hope this py2exe version will be used for future releases of youtube-dl.exe.

mbnoimi commented 9 years ago

+1

I have the same bug with UTF-8 names (Arabic characters).

mbnoimi commented 9 years ago

BTW, I tried to build youtube-dl using the new py2exe but unfortunately I failed :(

D:\PortableApps\YouTube-dl\youtube-dl>python setup.py py2exe
C:\Python 3.5\lib\site-packages\setuptools\dist.py:283: UserWarning: The version specified requires normalization, consider using '2015.4.9' instead of '2015.04.09'.
  self.metadata.version,
running py2exe
running build_py

  9 missing Modules
  ------------------
? HTMLParser                          imported from youtube_dl.compat
? cookielib                           imported from youtube_dl.compat
? netbios                             imported from uuid
? readline                            imported from cmd, code, pdb
? urllib.urlretrieve                  imported from youtube_dl.compat
? win32api                            imported from platform
? win32con                            imported from platform
? win32wnet                           imported from uuid
? xattr                               imported from youtube_dl, youtube_dl.downloader.http, youtube_dl.postprocessor.xattrpp
Building '.\youtube-dl.exe'.
error: [Errno 2] No such file or directory: 'C:\\Python 3.5\\lib\\site-packages\\py2exe\\run-py3.5-win-amd64.exe'

D:\PortableApps\YouTube-dl\youtube-dl>pip install HTMLParser
Collecting HTMLParser
  Downloading HTMLParser-0.0.2.tar.gz
Installing collected packages: HTMLParser
  Running setup.py install for HTMLParser
Successfully installed HTMLParser-0.0.2

D:\PortableApps\YouTube-dl\youtube-dl>pip install cookielib netbios readline urllib.urlretrieve win32api win32con win32wnet xattr
Collecting cookielib
  Could not find a version that satisfies the requirement cookielib (from versions: )
  No matching distribution found for cookielib

D:\PortableApps\YouTube-dl\youtube-dl>

yan12125 commented 9 years ago

@mbnoimi The problem you encounter is not directly related to this issue. Feel free to open a new issue. By the way, your problem seems to be related to py2exe itself rather than youtube-dl. If I'm correct, there's no official Python 3.5 support in the latest py2exe release.
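A side note on the "missing Modules" list in that build log: HTMLParser, cookielib and urllib.urlretrieve are Python 2 standard-library names, so they cannot be found under Python 3, and pip cannot supply a working replacement for them. A compatibility module typically imports such names inside a try/except fallback, which a bundler that scans import statements may still report as missing. The sketch below shows that pattern in general terms; the alias names are illustrative and are not claimed to match youtube_dl.compat line for line.

```python
# Try the Python 3 location first and fall back to the Python 2 name.
# Only one branch ever runs, but a static scan sees both imports.
try:
    import http.cookiejar as compat_cookiejar
except ImportError:  # Python 2
    import cookielib as compat_cookiejar

try:
    from html.parser import HTMLParser as compat_HTMLParser
except ImportError:  # Python 2
    from HTMLParser import HTMLParser as compat_HTMLParser

jar = compat_cookiejar.CookieJar()
print(type(jar).__name__)
```

Under that reading, the line that actually stops the build is the final "No such file or directory" error for the py2exe run stub, not the missing-module warnings.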

mbnoimi commented 9 years ago

@yan12125 No, my problem is exactly what occurs here... please read more.

"your problem seems to be related to py2exe itself rather than youtube-dl"

No. I just made a test build of the youtube-dl binaries using py2exe, the same as @jaimeMF did.

mbnoimi commented 9 years ago

After many tests I successfully built a youtube-dl binary using py2exe 0.9.2.2 and Python 3.5.0a3. Now youtube-dl.exe can handle UTF-8 without any problem.

It's up to you guys to create a new release that supports Unicode.

Thank you all.

mbnoimi commented 9 years ago

BTW, the recent Windows binary (2015.04.09) doesn't support UTF-8.

karatchov commented 9 years ago

Just my 2 cents on working around the Unicode problem on Windows:

The Nuitka-built binary seems to be slightly slower to start, but it does work correctly. For the lazy, I uploaded my build ... https://drive.google.com/file/d/0B1T_XhgV8nOjRzN1RHU3ekpfVFE/view?usp=sharing

videonerd commented 8 years ago

Thanks karatchov, this is the first public build that fixes the unicode issue on Windows. Would really appreciate it if the devs can incorporate a working build for the windows binary release. Thank you!

karatchov commented 8 years ago

A new build based on today's master, using Nuitka 0.5.16 & Python 3.5 x86 & MSVC 2010 express: https://drive.google.com/file/d/0B1T_XhgV8nOjTlBNX3dRUk1Zam8/view?usp=sharing

videonerd commented 8 years ago

Thank you karatchov. Whilst your efforts are much appreciated, I would urge the devs please to fix the official win32 binary release for the benefit of the wider userbase of the win32 version.

Thank you!

GoTop commented 8 years ago

@karatchov

Do you have the newest version of youtube-dl built with Python 3?

yan12125 commented 8 years ago

@GoTop FYI: Currently the official .exe is built against Python 3.4

GoTop commented 8 years ago

@yan12125

That's really cool!

I use the official .exe, and it solves the Unicode filename problem.

Thanks!

yan12125 commented 8 years ago

@jaimeMF's result may be a Python bug/missing feature. See PEP 528.
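For context, the log posted earlier reports "out cp1252", and the Hebrew part of the title is missing from the logged filename while the ASCII punctuation and video ID remain. That pattern is what one would see if text were encoded for the output stream with unencodable characters silently skipped. The sketch below illustrates the effect only; the helper name and the sample title are hypothetical, and the use of the 'ignore' error handler is an assumption rather than a confirmed detail of youtube-dl's logging.

```python
def to_stream_bytes(text, stream_encoding):
    """Encode text for an output stream, skipping what cannot be encoded."""
    return text.encode(stream_encoding, errors='ignore')


# '\u05e9\u05d9\u05e8' is a short Hebrew word standing in for the real title.
title = '\u05e9\u05d9\u05e8 - (live)-OIYeCPUIL1E.m4a'

as_cp1252 = to_stream_bytes(title, 'cp1252')
as_utf8 = to_stream_bytes(title, 'utf-8')

print(as_cp1252)  # Hebrew letters are gone, the ASCII remainder is kept
print(len(as_utf8) > len(as_cp1252))
```

This would affect only what is written to the log; the file on disk can still carry the full Unicode name, which matches what was observed on the desktop.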