ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

NRKSuper.no broken #29463

Open yurgh opened 3 years ago

yurgh commented 3 years ago

Verbose log

C:\Users\jorg>youtube-dl --verbose https://nrksuper.no/serie/poppeloppane/MSUI16003115/sesong-1/episode-1
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--verbose', 'https://nrksuper.no/serie/poppeloppane/MSUI16003115/sesong-1/episode-1']
[debug] Encodings: locale cp1252, fs mbcs, out cp850, pref cp1252
[debug] youtube-dl version 2021.06.06
[debug] Python version 3.4.4 (CPython) - Windows-10-10.0.19041
[debug] exe versions: none
[debug] Proxy map: {}
[debug] Using fake IP 84.214.110.128 (NO) as X-Forwarded-For.
[NRKTVSeries] poppeloppane: Downloading serie JSON
ERROR: Unable to download JSON metadata: <urlopen error [Errno 11001] getaddrinfo failed> (caused by URLError(gaierror(11001, 'getaddrinfo failed'),))
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpkqxnwl31\build\youtube_dl\extractor\common.py", line 634, in _request_webpage
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpkqxnwl31\build\youtube_dl\YoutubeDL.py", line 2288, in urlopen
  File "C:\Python\Python34\lib\urllib\request.py", line 470, in open
  File "C:\Python\Python34\lib\urllib\request.py", line 580, in http_response
  File "C:\Python\Python34\lib\urllib\request.py", line 502, in error
  File "C:\Python\Python34\lib\urllib\request.py", line 442, in _call_chain
  File "C:\Python\Python34\lib\urllib\request.py", line 685, in http_error_302
  File "C:\Python\Python34\lib\urllib\request.py", line 464, in open
  File "C:\Python\Python34\lib\urllib\request.py", line 482, in _open
  File "C:\Python\Python34\lib\urllib\request.py", line 442, in _call_chain
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpkqxnwl31\build\youtube_dl\utils.py", line 2737, in https_open
  File "C:\Python\Python34\lib\urllib\request.py", line 1185, in do_open

Description

Media from nrksuper.no fails to download. It works in the browser. No issues with socket.getaddrinfo('nrksuper.no', 443) in python directly.
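
For anyone wanting to repeat the name-resolution check mentioned above, here is a minimal sketch. The helper name `can_resolve` and its `resolver` parameter are assumptions made for illustration; only `socket.getaddrinfo` and `socket.gaierror` come from the standard library.

```python
import socket


def can_resolve(host, port=443, resolver=socket.getaddrinfo):
    """Return True if `host` resolves, False on a DNS failure.

    `resolver` is a hypothetical injection point (not part of youtube-dl)
    so the check can be exercised without touching the network.
    """
    try:
        resolver(host, port)
        return True
    except socket.gaierror:
        # Same exception class as the gaierror(11001, ...) in the log above.
        return False


if __name__ == '__main__':
    for host in ('nrksuper.no', 'tv.nrk.no'):
        print(host, can_resolve(host))
```

Note that the traceback passes through http_error_302 before failing, which suggests the lookup that fails may be for the redirect target rather than for nrksuper.no itself, so it can be worth checking each host in the redirect chain separately.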

ghost commented 3 years ago

https://tv.nrk.no/serie/poppeloppane/sesong/1/episode/1

NoLooseEnds commented 3 years ago

I'm also seeing something similar, both on nrksuper and on tv.nrk.no

/usr/local/bin/youtube-dl --verbose -i https://tv.nrk.no/serie/fantus-og-maskinene/sesong/1
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'--verbose', u'-i', u'https://tv.nrk.no/serie/fantus-og-maskinene/sesong/1']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.06.06
[debug] Python version 2.7.18 (CPython) - Linux-5.4.0-77-generic-x86_64-with-Ubuntu-20.04-focal
[debug] exe versions: ffmpeg 4.2.4, ffprobe 4.2.4
[debug] Proxy map: {}
[debug] Using fake IP 84.213.253.112 (NO) as X-Forwarded-For.
[NRKTVSeason] fantus-og-maskinene/1: Downloading season JSON
ERROR: Unable to download JSON metadata: <urlopen error [Errno 113] No route to host> (caused by URLError(error(113, 'No route to host'),))
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 634, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 2288, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 467, in error
    result = self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 654, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python2.7/urllib2.py", line 429, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 447, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/local/bin/youtube-dl/youtube_dl/utils.py", line 2737, in https_open
    req, **kwargs)
  File "/usr/lib/python2.7/urllib2.py", line 1205, in do_open
    raise URLError(err)

Network is fine.

ghost commented 3 years ago

Yeah, I get the same output:

ERROR: Unable to download JSON metadata: <urlopen error [Errno 60] Operation timed out> (caused by URLError(TimeoutError(60, 'Operation timed out')))

dirkf commented 1 year ago

This almost works in current master (b8a86dc), but it downloads the whole series instead of E1 (or would, if run from NO):

The problem is that the NRKTV URL pattern doesn't recognise the URL without tv. in the domain:

--- old/youtube_dl/extractor/nrk.py
+++ new/youtube_dl/extractor/nrk.py
@@ -286,7 +286,7 @@
 class NRKTVIE(InfoExtractor):
     IE_DESC = 'NRK TV and NRK Radio'
     _EPISODE_RE = r'(?P<id>[a-zA-Z]{4}\d{8})'
-    _VALID_URL = r'https?://(?:tv|radio)\.nrk(?:super)?\.no/(?:[^/]+/)*%s' % _EPISODE_RE
+    _VALID_URL = r'https?://(?:(?:tv|radio)\.)?nrk(?:super)?\.no/(?:[^/]+/)*%s' % _EPISODE_RE
     _TESTS = [{
         'url': 'https://tv.nrk.no/program/MDDP12000117',
         'md5': 'c4a5960f1b00b40d47db65c1064e0ab1',
@@ -402,6 +402,9 @@
         'only_matching': True,
     }, {
         'url': 'https://radio.nrk.no/serie/dagsnytt/sesong/201507/NPUB21019315',
+        'only_matching': True,
+    }, {
+        'url': 'https://nrksuper.no/serie/poppeloppane/MSUI16003115/sesong-1/episode-1',
         'only_matching': True,
     }]

@@ -799,7 +802,7 @@

 class NRKPlaylistIE(NRKPlaylistBaseIE):
-    _VALID_URL = r'https?://(?:www\.)?nrk\.no/(?!video|skole)(?:[^/]+/)+(?P<id>[^/]+)'
+    _VALID_URL = r'https?://(?:www\.)?nrk\.no/(?!video|skole|program)(?:[^/]+/)+(?P<id>[^/]+)'
     _ITEM_RE = r'class="[^"]*\brich\b[^"]*"[^>]+data-video-id="([^"]+)"'
     _TESTS = [{
         'url': 'http://www.nrk.no/troms/gjenopplev-den-historiske-solformorkelsen-1.12270763',
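
The effect of the NRKTVIE pattern change can be checked outside youtube-dl with a short sketch like the one below. The two patterns and the URLs are copied from the diff; the names `OLD_VALID_URL`, `NEW_VALID_URL`, `old_match` and `new_match` are made up for this illustration, and the snippet only exercises the regexes, not the extractor itself.

```python
import re

_EPISODE_RE = r'(?P<id>[a-zA-Z]{4}\d{8})'

# Pattern before the patch: a "tv." or "radio." subdomain is mandatory.
OLD_VALID_URL = r'https?://(?:tv|radio)\.nrk(?:super)?\.no/(?:[^/]+/)*%s' % _EPISODE_RE
# Pattern after the patch: the subdomain is optional.
NEW_VALID_URL = r'https?://(?:(?:tv|radio)\.)?nrk(?:super)?\.no/(?:[^/]+/)*%s' % _EPISODE_RE

url = 'https://nrksuper.no/serie/poppeloppane/MSUI16003115/sesong-1/episode-1'

old_match = re.match(OLD_VALID_URL, url)
new_match = re.match(NEW_VALID_URL, url)

print(old_match)               # None: the bare nrksuper.no host is rejected
print(new_match.group('id'))   # MSUI16003115

# URLs that already matched should keep matching with the new pattern.
print(re.match(NEW_VALID_URL, 'https://tv.nrk.no/program/MDDP12000117').group('id'))
```

With the old pattern the episode URL is not claimed by NRKTVIE, so it falls through to another extractor (the log in the original report shows NRKTVSeries handling it).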

This would work if in NO:

This is working from UK: