Open linglung opened 7 years ago
Well, this is the second time people have asked for unescaped strings (#10927). It might be worth an option.
Here's a quick hack:
diff --git a/youtube_dl/YoutubeDL.py b/youtube_dl/YoutubeDL.py
index 5d654f55f..d7374e820 100755
--- a/youtube_dl/YoutubeDL.py
+++ b/youtube_dl/YoutubeDL.py
@@ -1535,7 +1535,7 @@ class YoutubeDL(object):
         if self.params.get('forceformat', False):
             self.to_stdout(info_dict['format'])
         if self.params.get('forcejson', False):
-            self.to_stdout(json.dumps(info_dict))
+            self.to_stdout(json.dumps(info_dict, ensure_ascii=False))
         # Do nothing else if in simulate mode
         if self.params.get('simulate', False):
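For anyone wondering what ensure_ascii actually changes, here is a minimal sketch. The info_dict below is a hypothetical two-key stand-in (only the video id and one tag are taken from this thread), not the real dict youtube-dl builds.

```python
import json

# Hypothetical stand-in for youtube-dl's info_dict.
info_dict = {"id": "of0B-ZvxYI4", "title": "逮捕"}

# Default behaviour: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(info_dict)

# With ensure_ascii=False the characters are kept as they are.
unescaped = json.dumps(info_dict, ensure_ascii=False)

# Both forms are valid JSON and decode to the same object;
# only the textual representation differs.
assert json.loads(escaped) == json.loads(unescaped) == info_dict

print(escaped)
print(len(escaped), len(unescaped))
```

The escaped form is pure ASCII, so it is safe to print on any console; the unescaped form is shorter and readable but needs an output encoding that can represent the characters.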
Using Git Shell, I got this:
diff --git a/youtube_dl/YoutubeDL.py b/youtube_dl/YoutubeDL.py
diff: unknown option -- git
diff: Try 'diff --help' for more information.
So I tried to apply it manually. I edited the YoutubeDL.py file from the master zip and added your change, self.to_stdout(json.dumps(info_dict, ensure_ascii=False)), at line 1540. Then I ran it in developer mode to test it: python -m youtube_dl --write-info-json https://www.youtube.com/watch?v=of0B-ZvxYI4
Same result. 😢
Well, --write-info-json uses a different function.
diff --git a/youtube_dl/utils.py b/youtube_dl/utils.py
index 12863e74a..6ded34832 100644
--- a/youtube_dl/utils.py
+++ b/youtube_dl/utils.py
@@ -231,7 +231,7 @@ def write_json_file(obj, fn):
     try:
         with tf:
-            json.dump(obj, tf)
+            json.dump(obj, tf, ensure_ascii=False)
         if sys.platform == 'win32':
             # Need to remove existing file on Windows, else os.rename raises
             # WindowsError or FileExistsError.
On Linux/Mac/... you can use patch to apply the change. On Windows, I'm afraid you'll need to change those files by hand.
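To try the file-writing side in isolation, here is a rough sketch for Python 3. write_json_utf8 is a hypothetical helper written for this comment; it is not youtube-dl's write_json_file and skips the temp-file-and-rename logic that function has.

```python
import json
import os
import tempfile


def write_json_utf8(obj, path):
    """Write obj as JSON to path, keeping non-ASCII characters unescaped."""
    # The explicit encoding matters: without it, open() uses the locale
    # encoding (cp1252 in the logs in this thread), which cannot encode
    # Japanese text and would raise UnicodeEncodeError.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f, ensure_ascii=False)


# Usage: write a small object to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.info.json")
    write_json_utf8({"tags": ["逮捕"]}, path)
    with open(path, encoding="utf-8") as f:
        raw = f.read()

assert json.loads(raw) == {"tags": ["逮捕"]}
```

The file then contains the characters themselves instead of \uXXXX escapes, as long as whatever reads it also opens it as UTF-8.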
Great! It works as expected.
"title": "【激震】松本伊代(51)が逮捕の可能性…(画像あり)", "alt_title": null, "thumbnail": "https://i.ytimg.com/vi/of0B-ZvxYI4/hqdefault.jpg",
"description": "これはいかんやろ\n\n【おすすめサイト】\nびっくり映像まとめ\nhttp://lifestylemovie305.club/\n癒し系感動画像まとめ\nhttp://lifestyle305.link/\n\n引用元\nまとめもりー\n\n関連動画\n【警察がガラスを割って逃走車を逮捕の大暴れの瞬間\nhttps://youtu.be/FRc_PDxdaKk\n\n【親友】草なぎ剛の逮捕後あいつだけが連絡をくれたんだ【芸能ゴシップch】\nhttps://youtu.be/F7u-eeVqvNo\n\n【逮捕】ヤマト運輸チェーンソー襲撃事件\nhttps://youtu.be/Kr4k1RXmBXk", "categories": ["Entertainment"], "tags": ["松本伊代", "逮捕", "鉄ヲタ", "侵入", "芸能ゴシップチャンネル"], "subtitles": {}, "automatic_captions": {}, "duration": 44, "age_limit": 0, "annotations": null,
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--write-info-json', 'https://www.youtube.com/watch?v=of0B-ZvxYI4', '-v']
[debug] Encodings: locale cp1252, fs utf-8, out cp1252, pref cp1252
[debug] youtube-dl version 2017.01.10
[debug] Git HEAD: 250a6a6
[debug] Python version 3.6.0 - Windows-10-10.0.14393-SP0
[debug] exe versions: ffmpeg 2.8.4, ffprobe N-82966-g6993bb4
[debug] Proxy map: {}
[youtube] of0B-ZvxYI4: Downloading webpage
[youtube] of0B-ZvxYI4: Downloading video info webpage
[youtube] of0B-ZvxYI4: Extracting video information
[youtube] of0B-ZvxYI4: Downloading MPD manifest
[info] Writing video description metadata as JSON to: 51▒-of0B-ZvxYI4.info.json
WARNING: Requested formats are incompatible for merge and will be merged into mkv.
[debug] Invoking downloader on 'https://r1---sn-npoeene7.googlevideo.com/videoplayback/id/a1fd01f99bf1608e/itag/137/source/youtube/requiressl/yes/pl/20/ms/au/mv/m/mm/31/mn/sn-npoeene7/nh/IgpwcjAyLnNpbjExKgkxMjcuMC4wLjE/initcwndbps/5181250/ratebypass/yes/mime/video%2Fmp4/otfp/1/gir/yes/clen/15804514/lmt/1484537241771041/dur/44.010/mt/1484587873/signature/51F5F5775AFC186891468FEA3189DE2C4363AEC0.73349929948C64E626C984C19B4450A69ADFBC48/key/dg_yt0/upn/TvBQw5qcbLw/ip/128.199.120.49/ipbits/0/expire/1484609801/sparams/ip,ipbits,expire,id,itag,source,requiressl,pl,ms,mv,mm,mn,nh,initcwndbps,ratebypass,mime,otfp,gir,clen,lmt,dur/'
[dashsegments] Total fragments: 10
[download] Destination: 51▒-of0B-ZvxYI4.f137.mp4
[download] 100% of 15.07MiB in 00:10
[debug] Invoking downloader on 'https://r1---sn-npoeene7.googlevideo.com/videoplayback?keepalive=yes&ei=qAR9WLyQCqWWoAOU2LGQAg&lmt=1484537811953574&sparams=clen%2Cdur%2Cei%2Cgir%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Ckeepalive%2Clmt%2Cmime%2Cmm%2Cmn%2Cms%2Cmv%2Cnh%2Cpl%2Crequiressl%2Csource%2Cupn%2Cexpire&gir=yes&nh=IgpwcjAyLnNpbjExKgkxMjcuMC4wLjE&signature=E0FCAFA6A26E36BBAF079871A1245E52D44F38BA.39EE2958290D2587D1F9133C72A6DD80542DCE0F&dur=44.021&initcwndbps=5181250&itag=251&clen=721390&ipbits=0&key=yt6&upn=XK0SksNiZ_k&expire=1484609800&mv=m&mt=1484587873&ms=au&id=o-AAGPZjeL-9r4CcQxIfhSH50qx54cLzbhhisXP7f74bbJ&mn=sn-npoeene7&pl=20&source=youtube&mm=31&ip=128.199.120.49&mime=audio%2Fwebm&requiressl=yes&ratebypass=yes'
[download] Destination: 51▒-of0B-ZvxYI4.f251.webm
[download] 100% of 704.48KiB in 00:01
[ffmpeg] Merging formats into "51▒-of0B-ZvxYI4.mkv"
[debug] ffmpeg command line: ffmpeg -y -i 'file:51▒-of0B-ZvxYI4.f137.mp4' -i 'file:51▒-of0B-ZvxYI4.f251.webm' -c copy -map 0:v:0 -map 1:a:0 'file:51▒-of0B-ZvxYI4.temp.mkv'
Deleting original file 51▒-of0B-ZvxYI4.f137.mp4 (pass -k to keep)
Deleting original file 51▒-of0B-ZvxYI4.f251.webm (pass -k to keep)
Sadly, when I used your first approach with the dump-JSON options -j or -J (without writing a JSON file), it didn't work.
FYI, I first restored the original utils.py file, then changed the lines in YoutubeDL.py as in your first approach.
Here are the logs:
python -m youtube_dl -j https://www.youtube.com/watch?v=of0B-ZvxYI4 -v
Traceback (most recent call last):
File "C:\Users\Google\AppData\Local\Programs\Python\Python36-32\lib\runpy.py", line 183, in _run_module_as_main
mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
File "C:\Users\Google\AppData\Local\Programs\Python\Python36-32\lib\runpy.py", line 142, in _get_module_details
return _get_module_details(pkg_main_name, error)
File "C:\Users\Google\AppData\Local\Programs\Python\Python36-32\lib\runpy.py", line 109, in _get_module_details
__import__(pkg_name)
File "C:\Users\Google\Documents\GitHub\ytdl\youtube_dl\__init__.py", line 45, in <module>
from .YoutubeDL import YoutubeDL
File "C:\Users\Google\Documents\GitHub\ytdl\youtube_dl\YoutubeDL.py", line 1540
self.to_stdout(json.dumps(info_dict, ensure_ascii=False))
^
TabError: inconsistent use of tabs and spaces in indentation
Most likely there are tabs - replace them all with spaces.
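If it helps to locate the offending lines first, here is a small sketch that lists lines whose indentation contains a tab. The function name lines_with_tab_indent is made up for this example; to check the real file you would pass it the contents of YoutubeDL.py.

```python
import re


def lines_with_tab_indent(source):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Grab only the leading run of spaces and tabs.
        indent = re.match(r"[ \t]*", line).group(0)
        if "\t" in indent:
            hits.append(number)
    return hits


# Line 2 is indented with spaces, line 3 with a tab.
sample = "if True:\n    a = 1\n\tb = 2\n"
print(lines_with_tab_indent(sample))  # → [3]
```

Most editors can also convert tabs to spaces directly, which is usually quicker once you know where they are.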
@yan12125 Perfect. It's fixed now. Thank you so much 😄
python -m youtube_dl -j --encoding utf-8 https://www.youtube.com/watch?v=of0B-ZvxYI4 -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-j', '--encoding', 'utf-8', 'https://www.youtube.com/watch?v=of0B-ZvxYI4', '-v']
[debug] Encodings: locale cp1252, fs utf-8, out cp1252, pref utf-8
[debug] youtube-dl version 2017.01.10
[debug] Git HEAD: 250a6a6
[debug] Python version 3.6.0 - Windows-10-10.0.14393-SP0
[debug] exe versions: ffmpeg 2.8.4, ffprobe N-82966-g6993bb4
[debug] Proxy map: {}
{"id": "of0B-ZvxYI4", "uploader": "芸能 ゴシップ チャンネル", "uploader_id": "UC0OUfSvMHCpn2sukdhH-5kw", "uploader_url": "http://www.youtube.com/channel/UC0OUfSvMHCpn2sukdhH-5kw", "upload_date": "20170115", "license": "Standard YouTube License", "creator": null, "title": "【激震】松本伊代(51)が逮捕の可能性…(画像あり)", "alt_title": null, "thumbnail": "https://i.ytimg.com/vi/of0B-ZvxYI4/hqdefault.jpg", "description": "これはいかんやろ\n\n【おすすめサイト】\nびっくり映像まとめ\nhttp://lifestylemovie305.club/\n癒し系感動画像まとめ\nhttp://lifestyle305.link/\n\n引用元\nまとめもりー\n\n関連動画\n【警察がガラスを割って逃走車を逮捕の大暴れの瞬間\nhttps://youtu.be/FRc_PDxdaKk\n\n【親友】草なぎ剛の逮捕後あいつだけが連絡をくれたんだ【芸能ゴシップch】\nhttps://youtu.be/F7u-eeVqvNo\n\n【逮捕】ヤマト運輸チェーンソー襲撃事件\nhttps://youtu.be/Kr4k1RXmBXk", "categories": ["Entertainment"], "tags": ["松本伊代", "逮捕", "鉄ヲタ", "侵入", "芸能ゴシップチャンネル"], "subtitles": {}, "automatic_captions": {}, "duration": 44, "age_limit": 0, "annotations": null, "webpage_url": "https://www.youtube.com/watch?v=of0B-ZvxYI4", "view_count": 285206, "like_count": 62, "dislike_count": 523, "average_rating": 1.42393159866, "formats":
self.to_stdout(json.dumps(info_dict, ensure_ascii=False)) makes -j work, but json.dump(obj, tf, ensure_ascii=False) doesn't make a difference for --write-info-json:
youtube-dl --encoding utf-8 --write-info-json https://www.youtube.com/watch?v=VA0rAN0GRY4
Why isn't this applied as the default setting in every released version of youtube-dl?
Actually, @yan12125, you can apply the patch on Windows if you use Git for Windows (Git Bash). Well, at least I can. To do it on Windows, I'm afraid you have to write the diffs to a file named [filename].patch, and then you can run git apply [filename].patch
To @linglung: It may sound silly, but not all environments support raw (unescaped) UTF-8. youtube-dl aims to stay compatible with most systems, so it can't be the default.
Hmm, @yan12125, in this case you could use sys.platform and decide based on its value. That is how I choose to use the system opus / ffmpeg on Linux but not on Windows in one of my projects.
Running on Linux does not imply full UTF-8 support. If one uses LC_ALL=C or LC_ALL=POSIX, UTF-8 strings can break the console. Such a setting is common in containers like Docker (http://bugs.python.org/issue28180). On the other hand, since Python 3.6, UTF-8 support seems quite fine on Windows (PEP 528, PEP 529). The logic for determining UTF-8 support can be rather complicated.
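One way to sidestep platform checks is to ask the output stream itself whether it can encode the text. The sketch below is only an illustration under that assumption: can_encode and dumps_for_stream are hypothetical names, not youtube-dl functions, and a stream's reported encoding is not a guarantee that the terminal behind it renders the characters correctly.

```python
import io
import json
import sys


def can_encode(text, encoding):
    """Return True if text can be encoded with the named codec."""
    try:
        text.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return False
    return True


def dumps_for_stream(obj, stream=None):
    """Serialize obj unescaped if the stream's encoding allows it, else escaped."""
    stream = sys.stdout if stream is None else stream
    # Streams without a usable encoding attribute are treated as ASCII-only.
    encoding = getattr(stream, "encoding", None) or "ascii"
    unescaped = json.dumps(obj, ensure_ascii=False)
    if can_encode(unescaped, encoding):
        return unescaped
    return json.dumps(obj)  # fall back to \uXXXX escapes


# Usage with in-memory streams standing in for differently configured consoles.
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
utf8_stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
for_ascii = dumps_for_stream({"tag": "逮捕"}, ascii_stream)
for_utf8 = dumps_for_stream({"tag": "逮捕"}, utf8_stream)
```

With LC_ALL=C the stream usually reports an ASCII encoding, so this falls back to escapes there, while a UTF-8 console keeps the raw characters.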
Which is why you could do it like this in both diffs:
diff --git a/youtube_dl/YoutubeDL.py b/youtube_dl/YoutubeDL.py
index 5d654f55f..d7374e820 100755
--- a/youtube_dl/YoutubeDL.py
+++ b/youtube_dl/YoutubeDL.py
@@ -1535,7 +1535,10 @@ class YoutubeDL(object):
         if self.params.get('forceformat', False):
             self.to_stdout(info_dict['format'])
         if self.params.get('forcejson', False):
-            self.to_stdout(json.dumps(info_dict))
+            if sys.platform == 'win32':
+                self.to_stdout(json.dumps(info_dict, ensure_ascii=False))
+            else:
+                self.to_stdout(json.dumps(info_dict))
         # Do nothing else if in simulate mode
         if self.params.get('simulate', False):
diff --git a/youtube_dl/utils.py b/youtube_dl/utils.py
index 12863e74a..6ded34832 100644
--- a/youtube_dl/utils.py
+++ b/youtube_dl/utils.py
@@ -231,7 +231,10 @@ def write_json_file(obj, fn):
     try:
         with tf:
-            json.dump(obj, tf)
+            if sys.platform == 'win32':
+                json.dump(obj, tf, ensure_ascii=False)
+            else:
+                json.dump(obj, tf)
         if sys.platform == 'win32':
             # Need to remove existing file on Windows, else os.rename raises
             # WindowsError or FileExistsError.
I need JSON data containing Unicode (UTF-8) from youtube-dl, but it doesn't seem able to return the JSON data for a YouTube video in UTF-8.
I tried printing the JSON info with -j, --dump-json, -J, --dump-single-json, or --print-json, and also writing it directly to a JSON file with --write-info-json. In every case the strings came out escaped rather than as the original Unicode text of the video source. These are the parameters I used, both with and without --encoding utf-8:
youtube-dl --write-info-json --encoding utf-8 -f mp4 -o "%(title)s.%(ext)s" https://www.youtube.com/watch?v=0alnhFO1B7Y -v
youtube-dl -j --encoding utf-8 -f mp4 -o "%(title)s.%(ext)s" https://www.youtube.com/watch?v=0alnhFO1B7Y -v
youtube-dl -J --encoding utf-8 -f mp4 -o "%(title)s.%(ext)s" https://www.youtube.com/watch?v=0alnhFO1B7Y -v
youtube-dl --print-json --encoding utf-8 -f mp4 -o "%(title)s.%(ext)s" https://www.youtube.com/watch?v=0alnhFO1B7Y -v
The log output:
Below is the log of the JSON data. It is only part of the full log, since the JSON data contains very long strings, but it shows the essence of this issue. For example, the title, tags, and description: