ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

qub.ca (TVA) stopped working #31240

Open airbete opened 2 years ago

airbete commented 2 years ago

Verbose log

[debug] System config: ['--prefer-free-formats']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--verbose', 'https://www.qub.ca/tvaplus/tva/indefendable/saison-1/episode-2-apte-a-subir-son-proces-1080300766']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.10.6 (CPython) - Linux-5.19.8-200.fc36.x86_64-x86_64-with-glibc2.35
[debug] exe versions: ffmpeg 5.0.1, ffprobe 5.0.1, rtmpdump 2.4
[debug] Proxy map: {}
[Qub] 1080300766: Downloading JSON metadata
ERROR: Unable to download JSON metadata: HTTP Error 404: Not Found (caused by <HTTPError 404: 'Not Found'>); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 634, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 2288, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib64/python3.10/urllib/request.py", line 525, in open
    response = meth(req, response)
  File "/usr/lib64/python3.10/urllib/request.py", line 634, in http_response
    response = self.parent.error(
  File "/usr/lib64/python3.10/urllib/request.py", line 563, in error
    return self._call_chain(*args)
  File "/usr/lib64/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/usr/lib64/python3.10/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

Description

Downloading from qub.ca stopped working within the last 2 days. It had been working for at least the previous year.

AB

dirkf commented 2 years ago

yt-dl's TVA extractor, through which extraction is routed, adds an X-Forwarded-For (XFF) header with a Canadian IP address to bypass the block, which works for me.
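
For readers unfamiliar with the XFF approach, here is a minimal sketch of the general idea: pick a random address inside a country's IP block and send it in an `X-Forwarded-For` header. It is not yt-dl's actual code. The helper names and the `203.0.113.0/24` block (a reserved documentation range, standing in for a real Canadian block) are assumptions for illustration.

```python
import ipaddress
import random
import urllib.request

# Placeholder block from the reserved documentation range (TEST-NET-3).
# A real geo-bypass would use a block allocated to the target country.
PLACEHOLDER_BLOCK = '203.0.113.0/24'


def random_ip_in_block(cidr):
    """Return a random IPv4 address (as a string) inside the CIDR block."""
    net = ipaddress.ip_network(cidr)
    offset = random.randrange(net.num_addresses)
    return str(net[offset])


def build_request_with_xff(url, cidr):
    """Build (but do not send) a request carrying a faked X-Forwarded-For."""
    fake_ip = random_ip_in_block(cidr)
    return urllib.request.Request(url, headers={'X-Forwarded-For': fake_ip})


# Hypothetical URL; nothing is sent over the network here.
req = build_request_with_xff('https://example.com/', PLACEHOLDER_BLOCK)
# urllib stores header names capitalised as 'X-forwarded-for'.
print(req.get_header('X-forwarded-for'))
```

Whether a server honours the header depends entirely on how it is configured, so this only helps on sites that trust XFF for geolocation.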

As the needed information is in the __NEXT_DATA__ hydration JSON in the actual show page, we can bypass the missing API URL, and also get some more metadata:

--- old/youtube_dl/extractor/tva.py
+++ new/youtube_dl/extractor/tva.py
@@ -2,11 +2,15 @@
 from __future__ import unicode_literals

 from .common import InfoExtractor
+from ..compat import compat_str
 from ..utils import (
     float_or_none,
+    get_element_by_id,
     int_or_none,
     smuggle_url,
+    str_to_int,
     strip_or_none,
+    try_get,
 )

@@ -52,37 +56,67 @@ class QubIE(InfoExtractor):
         'info_dict': {
             'id': '6084352463001',
             'ext': 'mp4',
-            'title': 'Épisode 01',
+            'title': 'Ép 01. Mon dernier jour',
             'uploader_id': '5481942443001',
             'upload_date': '20190907',
             'timestamp': 1567899756,
             'description': 'md5:9c0d7fbb90939420c651fd977df90145',
+            'age_limit': 13,
+        },
+    }, {
+        'url': 'https://www.qub.ca/tvaplus/tva/indefendable/saison-1/episode-2-apte-a-subir-son-proces-1080300766',
+        'md5': 'ba7e0da53f472d39230418a9d980dc9f',
+        'info_dict': {
+            'id': '6312064712112',
+            'ext': 'mp4',
+            'description': 'md5:9fd8701b50199e52fe9a5a43d20862e9',
+            'title': 'Ép 02. Apte à subir son procès?',
+            'timestamp': 1662681334,
+            'upload_date': '20220908',
+            'uploader_id': '5481942443001',
+            'age_limit': 8,
         },
     }, {
         'url': 'https://www.qub.ca/tele/video/lcn-ca-vous-regarde-rev-30s-ap369664-1009357943',
         'only_matching': True,
     }]
-    # reference_id also works with old account_id(5481942443001)
-    # BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/5813221784001/default_default/index.html?videoId=ref:%s'
+
+    @staticmethod
+    def _parse_rating(rating):
+        age = str_to_int(rating)
+        if age is not None:
+            return age
+        return {
+            # CBSC
+            'Exempt': None,
+            'C': 0,
+            'C8': 8,
+            'G': 0,
+            'PG': 10,
+            # Régie du cinéma
+            'G-Dec': 8,  # "déconseillé"
+            }.get(rating)

     def _real_extract(self, url):
         entity_id = self._match_id(url)
-        entity = self._download_json(
-            'https://www.qub.ca/proxy/pfu/content-delivery-service/v1/entities',
-            entity_id, query={'id': entity_id})
+        webpage = self._download_webpage(url, entity_id)
+        next_data = get_element_by_id('__NEXT_DATA__', webpage) or '{}'
+        entity = self._parse_json(next_data, entity_id)['props']['initialProps']['pageProps']['fallbackData']
         video_id = entity['videoId']
-        episode = strip_or_none(entity.get('name'))
+        episode = strip_or_none(entity.get('name')) or None

         return {
             '_type': 'url_transparent',
             'id': video_id,
-            'title': episode,
-            # 'url': self.BRIGHTCOVE_URL_TEMPLATE % entity['referenceId'],
+            'title': episode or self._generic_title(url),
             'url': 'https://videos.tva.ca/details/_' + video_id,
             'description': entity.get('longDescription'),
             'duration': float_or_none(entity.get('durationMillis'), 1000),
             'episode': episode,
             'episode_number': int_or_none(entity.get('episodeNumber')),
-            # 'ie_key': 'BrightcoveNew',
+            'channel': try_get(entity, lambda x: x['knownEntities']['channel']['name'], compat_str),
+            'series': try_get(entity, lambda x: x['knownEntities']['videoShow']['name'], compat_str),
+            'season_number': int_or_none(self._search_regex(r'/s(?:ai|ea)son-(\d+)/', entity.get('slug', ''), 'season', default=None)),
+            'age_limit': self._parse_rating(entity.get('parentalRating')),
             'ie_key': TVAIE.ie_key(),
         }

As the existing test works in the revised code, I assume that the site has completely switched to the Next.js page format, and so I have removed the old tactic rather than making it a fallback.
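
Outside of yt-dl, the same lookup can be sketched with only the standard library. This is an illustration, not the patch itself: the regex, the `extract_entity` helper and the sample page are made up for the example. Only the `props`, `initialProps`, `pageProps`, `fallbackData` path and the `videoId` and `name` keys are taken from the patch.

```python
import json
import re

# Assumed shape of the hydration tag: <script id="__NEXT_DATA__" ...>JSON</script>
NEXT_DATA_RE = re.compile(
    r'<script[^>]+id=["\']__NEXT_DATA__["\'][^>]*>(?P<json>.*?)</script>',
    re.DOTALL)


def extract_entity(webpage):
    """Return the fallbackData dict from a page, or None if it is absent."""
    m = NEXT_DATA_RE.search(webpage)
    data = json.loads(m.group('json')) if m else {}
    node = data
    # Walk the key path used in the patch, stopping early if a level is missing.
    for key in ('props', 'initialProps', 'pageProps', 'fallbackData'):
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    return node if isinstance(node, dict) else None


# Made-up sample page; a real page carries many more fields.
SAMPLE_PAGE = '''<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"initialProps": {"pageProps": {"fallbackData":
  {"videoId": "6312064712112", "name": " Ép 02. Apte à subir son procès? "}}}}}
</script></body></html>'''

entity = extract_entity(SAMPLE_PAGE)
print(entity['videoId'], entity['name'].strip())
```

Unlike direct indexing, this sketch returns `None` when any level of the path is missing, which makes it easier to tell "page layout changed" apart from other failures.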

airbete commented 2 years ago

The patch works for me. Thank you very much.

audas commented 2 years ago

I don't want to start a new issue, but this is also happening for me on Twitter.

Windows, latest version. Other videos are working. The only variation I noticed was that the video had been given ALT tags.

`C:\Users\redacted>yt-dlp -o temp_633f505e35d2e.mp4 https://twitter.com/i/status/1578090984604303362 -vU
[debug] Command-line config: ['-o', 'temp_633f505e35d2e.mp4', 'https://twitter.com/i/status/1578090984604303362', '-vU']
[debug] Encodings: locale cp1252, fs utf-8, pref cp1252, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2022.10.04 [4e0511f] (win32_exe)
[debug] Python 3.8.10 (CPython 64bit) - Windows-10-10.0.19044-SP0
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: ffprobe -bsfs
[debug] exe versions: ffmpeg 2022-04-25-git-f2724d2b69-full_build-www.gyan.dev (setts), ffprobe 2022-04-25-git-f2724d2b69-full_build-www.gyan.dev
[debug] Optional libraries: Cryptodome-3.15.0, brotli-1.0.9, certifi-2022.09.24, mutagen-1.45.1, sqlite3-2.6.0, websockets-10.3
[debug] Proxy map: {}
[debug] Loaded 1690 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Latest version: 2022.10.04, Current version: 2022.10.04
yt-dlp is up to date (2022.10.04)
[debug] [twitter] Extracting URL: https://twitter.com/i/status/1578090984604303362
[twitter] 1578090984604303362: Downloading guest token
[twitter] 1578090984604303362: Downloading JSON metadata
ERROR: [twitter] 1578090984604303362: Unable to download JSON metadata: HTTP Error 404: Not Found (caused by <HTTPError 404: 'Not Found'>); please report this issue on https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using yt-dlp -U
  File "yt_dlp\extractor\common.py", line 672, in extract
  File "yt_dlp\extractor\twitter.py", line 441, in _real_extract
  File "yt_dlp\extractor\twitter.py", line 100, in _call_api
  File "yt_dlp\extractor\common.py", line 1032, in download_content
  File "yt_dlp\extractor\common.py", line 996, in download_handle
  File "yt_dlp\extractor\common.py", line 866, in _download_webpage_handle
  File "yt_dlp\extractor\common.py", line 823, in _request_webpage

  File "yt_dlp\extractor\common.py", line 805, in _request_webpage
  File "yt_dlp\YoutubeDL.py", line 3682, in urlopen
  File "urllib\request.py", line 531, in open
  File "urllib\request.py", line 640, in http_response
  File "urllib\request.py", line 569, in error
  File "urllib\request.py", line 502, in _call_chain
  File "urllib\request.py", line 649, in http_error_default
urllib.error.HTTPError: HTTP Error 404: Not Found`

dirkf commented 2 years ago

  1. Unrelated (even more than when you posted similarly in a Twitter thread): different extractor, different program.
  2. Not repeatable with yt-dl: probably a glitch in Twitter's CDN or some proxy.
  3. Feel free to open an issue at https://github.com/yt-dlp/yt-dlp/issues/new/choose if the problem continues.

audas commented 2 years ago

Sorry, I was looking at the yt-dlp thread https://github.com/yt-dlp/yt-dlp/issues/4989, where it was said to be the same as this issue, and I followed the link from there. I was confused that the link led here; Taupemoi closed that thread in favour of this one.

It is happening in both youtube-dl and yt-dlp.

I'll open a new one.

cduvallll commented 2 years ago

Hey guys, I am using youtube-dl on Windows. How can I implement the workaround dirkf proposed at the top of this thread?

thx

dirkf commented 2 years ago

If you have installed your yt-dl with pip (or in some other way that exposes the program's module files), you can replace the youtube_dl/extractor/tva.py file with this download: https://github.com/dirkf/youtube-dl/raw/df-tva-extractor-ovrhaul/youtube_dl/extractor/tva.py

Otherwise you have to wait for a release.
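
To find the file to replace in a pip-style install, a small sketch like the following may help. The `locate_module_file` helper is a hypothetical name, not part of yt-dl; it uses only the standard library and returns `None` when the module is not importable as a plain file (for example, when yt-dl is not installed in the Python you run it with).

```python
import importlib.util


def locate_module_file(dotted_name):
    """Return the on-disk path of an importable module, or None."""
    try:
        # For dotted names, find_spec imports the parent package first,
        # which raises ImportError if that package is not installed.
        spec = importlib.util.find_spec(dotted_name)
    except ImportError:
        return None
    if spec is None or not spec.has_location:
        return None
    return spec.origin


# Prints the path of the installed extractor file, or None if not found.
print(locate_module_file('youtube_dl.extractor.tva'))
```

Run it with the same Python interpreter that pip installed yt-dl into, and keep a backup copy of the original file before overwriting it.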

cduvallll commented 2 years ago

It works. Thanks man for your help and support. Appreciated

Conrad
