ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

Instagram URL + proxy breaks the code #25464

Closed Kikobeats closed 4 years ago

Kikobeats commented 4 years ago

Hello,

When I run youtube-dl on an Instagram video post through a proxy, it fails with an unhandled RegexNotFoundError ("Unable to extract video url").

error trace

```bash
youtube-dl --dump-json -f best --proxy=https://lum-customer-hl_1234-zone-twittervideo2-ip-1.2.3.4:pwd@zproxy.lum-superproxy.io:22225 --verbose https://www.instagram.com/p/B5LeHK2h4p0/
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'--dump-json', u'-f', u'best', u'--proxy=https://lum-customer-hl_1234-zone-twittervideo2-ip-1.2.3.4:pwd@zproxy.lum-superproxy.io:22225', u'--verbose', u'https://www.instagram.com/p/B5LeHK2h4p0/']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2020.05.29
[debug] Python version 2.7.16 (CPython) - Darwin-19.4.0-x86_64-i386-64bit
[debug] exe versions: ffmpeg 4.2.2, ffprobe 4.2.2, rtmpdump 2.4
[debug] Proxy map: {u'http': u'https://lum-customer-hl_1234-zone-twittervideo2-ip-1.2.3.4:pwd@zproxy.lum-superproxy.io:22225', u'https': u'https://lum-customer-hl_1234-zone-twittervideo2-ip-1.2.3.4:pwd@zproxy.lum-superproxy.io:22225'}
ERROR: Unable to extract video url; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/Users/josefranciscoverdugambin/Projects/microlink/metascraper/packages/metascraper-media-provider/node_modules/youtube-dl/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 797, in extract_info
    ie_result = ie.extract(url)
  File "/Users/josefranciscoverdugambin/Projects/microlink/metascraper/packages/metascraper-media-provider/node_modules/youtube-dl/bin/youtube-dl/youtube_dl/extractor/common.py", line 530, in extract
    ie_result = self._real_extract(url)
  File "/Users/josefranciscoverdugambin/Projects/microlink/metascraper/packages/metascraper-media-provider/node_modules/youtube-dl/bin/youtube-dl/youtube_dl/extractor/instagram.py", line 195, in _real_extract
    video_url = self._og_search_video_url(webpage, secure=False)
  File "/Users/josefranciscoverdugambin/Projects/microlink/metascraper/packages/metascraper-media-provider/node_modules/youtube-dl/bin/youtube-dl/youtube_dl/extractor/common.py", line 1123, in _og_search_video_url
    return self._html_search_regex(regexes, html, name, **kargs)
  File "/Users/josefranciscoverdugambin/Projects/microlink/metascraper/packages/metascraper-media-provider/node_modules/youtube-dl/bin/youtube-dl/youtube_dl/extractor/common.py", line 1014, in _html_search_regex
    res = self._search_regex(pattern, string, name, default, fatal, flags, group)
  File "/Users/josefranciscoverdugambin/Projects/microlink/metascraper/packages/metascraper-media-provider/node_modules/youtube-dl/bin/youtube-dl/youtube_dl/extractor/common.py", line 1005, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)
RegexNotFoundError: Unable to extract video url; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
```

I verified that the proxy works as expected using a Twitter URL:

Twitter URL with proxy working fine

```bash
youtube-dl --dump-json -f best --no-warnings --no-call-home --no-check-certificate --prefer-free-formats --youtube-skip-dash-manifest --referer=https://twitter.com/verge/status/957383241714970624 --proxy=https://lum-customer-hl_1234-zone-twittervideo2-ip-1.2.3.4:pwd@zproxy.lum-superproxy.io:22225 --verbose -- https://twitter.com/verge/status/957383241714970624
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'--dump-json', u'-f', u'best', u'--no-warnings', u'--no-call-home', u'--no-check-certificate', u'--prefer-free-formats', u'--youtube-skip-dash-manifest', u'--referer=https://twitter.com/verge/status/957383241714970624', u'--proxy=https://lum-customer-hl_1234-zone-twittervideo2-ip-1.2.3.4:pwd@zproxy.lum-superproxy.io:22225', u'--verbose', u'--', u'https://twitter.com/verge/status/957383241714970624']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2020.05.29
[debug] Python version 2.7.16 (CPython) - Darwin-19.4.0-x86_64-i386-64bit
[debug] exe versions: ffmpeg 4.2.2, ffprobe 4.2.2, rtmpdump 2.4
[debug] Proxy map: {u'http': u'https://lum-customer-hl_1234-zone-twittervideo2-ip-1.2.3.4:pwd@zproxy.lum-superproxy.io:22225', u'https': u'https://lum-customer-hl_1234-zone-twittervideo2-ip-1.2.3.4:pwd@zproxy.lum-superproxy.io:22225'}
{"display_id": "957383241714970624", "extractor": "twitter", "tbr": 1280, "protocol": "https", "description": "Is it bad to blow into game cartridges? https://t.co/Y3yAimrUnP", "tags": [], "timestamp": 1517092926, "format": "http-1280 - 720x720", "formats": [{"http_headers": {"Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.7", "Accept-Language": "en-us,en;q=0.5", "Accept-Encoding": "gzip, deflate", "Referer": "https://twitter.com/verge/status/957383241714970624", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3678.1 Safari/537.36"}, "protocol": "m3u8_native", "format": "hls-320 - 240x240", "url": "https://video.twimg.com/amplify_video/943561675927519232/pl/240x240/hqThe2qwGxY4us_s.m3u8", "vcodec": "avc1.4d0015", "tbr": 320.0, "height": 240, "width": 240, "ext": "mp4", "preference": null, "fps": null, "manifest_url": "https://video.twimg.com/amplify_video/943561675927519232/pl/YNw1OIz1A5FFywhq.m3u8", "format_id": "hls-320", "acodec": "mp4a.40.2"}, {"http_headers": {"Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.7", "Accept-Language": "en-us,en;q=0.5", "Accept-Encoding": "gzip, deflate", "Referer": "https://twitter.com/verge/status/957383241714970624", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3678.1 Safari/537.36"}, "protocol": "https", "format": "http-320 - 240x240", "url": "https://video.twimg.com/amplify_video/943561675927519232/vid/240x240/mijiQdCq-p9FaO8H.mp4", "tbr": 320, "height": 240, "width": 240, "ext": "mp4", "format_id": "http-320"}, {"http_headers": {"Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.7", "Accept-Language": "en-us,en;q=0.5", "Accept-Encoding": "gzip, deflate", "Referer": "https://twitter.com/verge/status/957383241714970624", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3678.1 Safari/537.36"}, "protocol": "m3u8_native", "format": "hls-832 - 480x480", "url": "https://video.twimg.com/amplify_video/943561675927519232/pl/480x480/3qIAtN3BK0tvUuQX.m3u8", "vcodec": "avc1.4d001f", "tbr": 832.0, "height": 480, "width": 480, "ext": "mp4", "preference": null, "fps": null, "manifest_url": "https://video.twimg.com/amplify_video/943561675927519232/pl/YNw1OIz1A5FFywhq.m3u8", "format_id": "hls-832", "acodec": "mp4a.40.2"}, {"http_headers": {"Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.7", "Accept-Language": "en-us,en;q=0.5", "Accept-Encoding": "gzip, deflate", "Referer": "https://twitter.com/verge/status/957383241714970624", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3678.1 Safari/537.36"}, "protocol": "https", "format": "http-832 - 480x480", "url": "https://video.twimg.com/amplify_video/943561675927519232/vid/480x480/qURzB_XtWBE-dvRa.mp4", "tbr": 832, "height": 480, "width": 480, "ext": "mp4", "format_id": "http-832"}, {"http_headers": {"Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.7", "Accept-Language": "en-us,en;q=0.5", "Accept-Encoding": "gzip, deflate", "Referer": "https://twitter.com/verge/status/957383241714970624", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3678.1 Safari/537.36"}, "protocol": "m3u8_native", "format": "hls-1280 - 720x720", "url": "https://video.twimg.com/amplify_video/943561675927519232/pl/720x720/p0lEHBKAhtm_3T9E.m3u8", "vcodec": "avc1.640020", "tbr": 1280.0, "height": 720, "width": 720, "ext": "mp4", "preference": null, "fps": null, "manifest_url": "https://video.twimg.com/amplify_video/943561675927519232/pl/YNw1OIz1A5FFywhq.m3u8", "format_id": "hls-1280", "acodec": "mp4a.40.2"}, {"http_headers": {"Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.7", "Accept-Language": "en-us,en;q=0.5", "Accept-Encoding": "gzip, deflate", "Referer": "https://twitter.com/verge/status/957383241714970624", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3678.1 Safari/537.36"}, "protocol": "https", "format": "http-1280 - 720x720", "url": "https://video.twimg.com/amplify_video/943561675927519232/vid/720x720/h1uN7biCI-Fbzm9D.mp4", "tbr": 1280, "height": 720, "width": 720, "ext": "mp4", "format_id": "http-1280"}], "height": 720, "_filename": "The Verge - Is it bad to blow into game cartridges-957383241714970624.mp4", "like_count": 145, "uploader": "The Verge", "duration": 146.563, "format_id": "http-1280", "upload_date": "20180127", "id": "957383241714970624", "playlist": null, "thumbnails": [{"url": "https://pbs.twimg.com/media/DRg1OMRVwAEuwTK.jpg?name=thumb", "width": 150, "resolution": "150x150", "id": "thumb", "height": 150}, {"url": "https://pbs.twimg.com/media/DRg1OMRVwAEuwTK.jpg?name=small", "width": 680, "resolution": "680x680", "id": "small", "height": 680}, {"url": "https://pbs.twimg.com/media/DRg1OMRVwAEuwTK.jpg?name=large", "width": 1080, "resolution": "1080x1080", "id": "large", "height": 1080}, {"url": "https://pbs.twimg.com/media/DRg1OMRVwAEuwTK.jpg?name=medium", "width": 1080, "resolution": "1080x1080", "id": "medium", "height": 1080}, {"url": "https://pbs.twimg.com/media/DRg1OMRVwAEuwTK.jpg?name=orig", "width": 1080, "resolution": "1080x1080", "id": "orig", "height": 1080}], "title": "The Verge - Is it bad to blow into game cartridges?", "url": "https://video.twimg.com/amplify_video/943561675927519232/vid/720x720/h1uN7biCI-Fbzm9D.mp4", "extractor_key": "Twitter", "ext": "mp4", "http_headers": {"Accept-Language": "en-us,en;q=0.5", "Accept-Encoding": "gzip, deflate", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3678.1 Safari/537.36", "Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.7", "Referer": "https://twitter.com/verge/status/957383241714970624"}, "repost_count": 48, "uploader_id": "verge", "width": 720, "comment_count": 15, "uploader_url": "https://twitter.com/verge", "webpage_url": "https://twitter.com/verge/status/957383241714970624", "requested_subtitles": null, "fulltitle": "The Verge - Is it bad to blow into game cartridges?", "age_limit": 0, "thumbnail": "https://pbs.twimg.com/media/DRg1OMRVwAEuwTK.jpg?name=orig", "webpage_url_basename": "957383241714970624", "playlist_index": null}
```

In fact, if I just remove the proxy flag, an Instagram URL works as expected:

Instagram without proxy working fine

```bash
youtube-dl --dump-json -f best --verbose https://www.instagram.com/p/BmYooZbhCfJ/
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'--dump-json', u'-f', u'best', u'--verbose', u'https://www.instagram.com/p/BmYooZbhCfJ/']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2020.05.29
[debug] Python version 2.7.16 (CPython) - Darwin-19.4.0-x86_64-i386-64bit
[debug] exe versions: ffmpeg 4.2.2, ffprobe 4.2.2, rtmpdump 2.4
[debug] Proxy map: {}
{"display_id": "BmYooZbhCfJ", "extractor": "Instagram", "protocol": "https", "description": "\u202aModel 3 Performance testing in Alaska \u2744\ufe0f\u202c", "upload_date": "20180812", "timestamp": 1534090105, "format": "0 - 640x352", "formats": [{"protocol": "https", "format": "0 - 640x352", "url": "https://scontent-mad1-1.cdninstagram.com/v/t50.2886-16/38871629_1045788998909492_7127403467848548352_n.mp4?_nc_ht=scontent-mad1-1.cdninstagram.com&_nc_cat=108&_nc_ohc=vOzlzgkte68AX-p2fYb&oe=5ED52DB7&oh=97282b5d24dc6ea9bf5c867ce42d0bc2", "http_headers": {"Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.7", "Accept-Language": "en-us,en;q=0.5", "Accept-Encoding": "gzip, deflate", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3706.6 Safari/537.36"}, "height": 352, "width": 640, "ext": "mp4", "format_id": "0"}], "height": 352, "_filename": "Video by teslamotors-BmYooZbhCfJ.mp4", "like_count": 105929, "uploader": "Tesla", "format_id": "0", "uploader_id": "teslamotors", "playlist": null, "thumbnails": [{"url": "https://scontent-mad1-1.cdninstagram.com/v/t51.2885-15/e15/38517607_1061334650699625_2957597926945193984_n.jpg?_nc_ht=scontent-mad1-1.cdninstagram.com&_nc_cat=109&_nc_ohc=GeZdHGpRgOAAX8P5Dmp&oh=26d81fa41a54d8602c5cdd1fdc945bbe&oe=5ED4F294", "id": "0"}], "title": "Video by teslamotors", "url": "https://scontent-mad1-1.cdninstagram.com/v/t50.2886-16/38871629_1045788998909492_7127403467848548352_n.mp4?_nc_ht=scontent-mad1-1.cdninstagram.com&_nc_cat=108&_nc_ohc=vOzlzgkte68AX-p2fYb&oe=5ED52DB7&oh=97282b5d24dc6ea9bf5c867ce42d0bc2", "extractor_key": "Instagram", "http_headers": {"Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.7", "Accept-Language": "en-us,en;q=0.5", "Accept-Encoding": "gzip, deflate", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3706.6 Safari/537.36"}, "ext": "mp4", "comments": [], "id": "BmYooZbhCfJ", "width": 640, "comment_count": null, "playlist_index": null, "webpage_url": "https://www.instagram.com/p/BmYooZbhCfJ/", "requested_subtitles": null, "fulltitle": "Video by teslamotors", "thumbnail": "https://scontent-mad1-1.cdninstagram.com/v/t51.2885-15/e15/38517607_1061334650699625_2957597926945193984_n.jpg?_nc_ht=scontent-mad1-1.cdninstagram.com&_nc_cat=109&_nc_ohc=GeZdHGpRgOAAX8P5Dmp&oh=26d81fa41a54d8602c5cdd1fdc945bbe&oe=5ED4F294", "webpage_url_basename": "BmYooZbhCfJ"}
```

So it is the combination of an Instagram URL and the proxy flag that breaks extraction in some way.
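The traceback shows the error being raised from `_og_search_video_url`, which means no Open Graph video tag was matched in the page that came back. The following is a minimal sketch of that kind of lookup, not youtube-dl's actual implementation: the regex, the `OgVideoNotFound` exception, and the two sample pages are hypothetical. It only illustrates that any response lacking an `og:video` meta tag would produce this sort of failure, whatever caused the tag to be missing.

```python
import re

# Simplified pattern (an assumption for illustration): matches a <meta> tag whose
# property is og:video or og:video:secure_url, with property before content.
OG_VIDEO_RE = re.compile(
    r'<meta[^>]+property=["\']og:video(?::secure_url)?["\'][^>]+content=["\']([^"\']+)["\']',
    re.IGNORECASE,
)


class OgVideoNotFound(Exception):
    """Hypothetical error, playing the role RegexNotFoundError plays in the traceback."""


def find_og_video_url(html):
    """Return the og:video URL found in html, or raise OgVideoNotFound."""
    match = OG_VIDEO_RE.search(html)
    if match is None:
        raise OgVideoNotFound('Unable to extract video url')
    return match.group(1)


# Two made-up responses: one with the tag, one without (e.g. some other page).
page_with_video = (
    '<html><head><meta property="og:video" '
    'content="https://example.com/v.mp4"></head></html>'
)
page_without_video = '<html><head><title>Login</title></head></html>'

if __name__ == '__main__':
    print(find_og_video_url(page_with_video))
    try:
        find_og_video_url(page_without_video)
    except OgVideoNotFound as err:
        print('failed: %s' % err)
```

Whether the proxied Instagram response actually lacks the tag is not established in this thread; checking the returned page is the way to find out.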


dstftw commented 4 years ago

Run with --write-pages and see what's returned by Instagram.
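Once a run with `--write-pages` has written intermediary pages to the current directory, a small script can summarize them. This is a rough sketch: the `summarize_dumps` helper is hypothetical, and it assumes the dumped pages carry a `.dump` suffix, which may differ between versions. It reports each file's size and whether the string `og:video` appears in it.

```python
import glob
import os


def summarize_dumps(directory="."):
    """Return a list of (filename, size_in_bytes, has_og_video) tuples.

    Assumes dumped pages are named *.dump (an assumption, see above).
    """
    results = []
    for path in sorted(glob.glob(os.path.join(directory, "*.dump"))):
        with open(path, "rb") as handle:
            data = handle.read()
        # A plain substring check is enough for a first look at the page.
        results.append((os.path.basename(path), len(data), b"og:video" in data))
    return results


if __name__ == "__main__":
    summary = summarize_dumps()
    if not summary:
        print("no .dump files found in the current directory")
    for name, size, has_video in summary:
        print("%s: %d bytes, og:video present: %s" % (name, size, has_video))
```

If a file is present but contains no `og:video`, opening it in a browser or text editor shows what the server actually sent through the proxy.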

Kikobeats commented 4 years ago

@dstftw the error is the same, and it doesn't generate any debug files.

BTW, why is the issue considered invalid?