Open airbete opened 2 years ago
yt-dl's TVA extractor, through which extraction is routed, adds an X-Forwarded-For (XFF) header with a Canadian IP to bypass the block, which works for me.
As the needed information is in the __NEXT_DATA__ hydration JSON in the actual show page, we can bypass the missing API URL, and also get some more metadata:
--- old/youtube_dl/extractor/tva.py
+++ new/youtube_dl/extractor/tva.py
@@ -2,11 +2,15 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
+from ..compat import compat_str
 from ..utils import (
     float_or_none,
+    get_element_by_id,
     int_or_none,
     smuggle_url,
+    str_to_int,
     strip_or_none,
+    try_get,
 )
 
 
@@ -52,37 +56,67 @@ class QubIE(InfoExtractor):
         'info_dict': {
             'id': '6084352463001',
             'ext': 'mp4',
-            'title': 'Épisode 01',
+            'title': 'Ép 01. Mon dernier jour',
             'uploader_id': '5481942443001',
             'upload_date': '20190907',
             'timestamp': 1567899756,
             'description': 'md5:9c0d7fbb90939420c651fd977df90145',
+            'age_limit': 13,
+        },
+    }, {
+        'url': 'https://www.qub.ca/tvaplus/tva/indefendable/saison-1/episode-2-apte-a-subir-son-proces-1080300766',
+        'md5': 'ba7e0da53f472d39230418a9d980dc9f',
+        'info_dict': {
+            'id': '6312064712112',
+            'ext': 'mp4',
+            'description': 'md5:9fd8701b50199e52fe9a5a43d20862e9',
+            'title': 'Ép 02. Apte à subir son procès?',
+            'timestamp': 1662681334,
+            'upload_date': '20220908',
+            'uploader_id': '5481942443001',
+            'age_limit': 8,
         },
     }, {
         'url': 'https://www.qub.ca/tele/video/lcn-ca-vous-regarde-rev-30s-ap369664-1009357943',
         'only_matching': True,
     }]
-    # reference_id also works with old account_id(5481942443001)
-    # BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/5813221784001/default_default/index.html?videoId=ref:%s'
+
+    @staticmethod
+    def _parse_rating(rating):
+        age = str_to_int(rating)
+        if age is not None:
+            return age
+        return {
+            # CBSC
+            'Exempt': None,
+            'C': 0,
+            'C8': 8,
+            'G': 0,
+            'PG': 10,
+            # Régie du cinéma
+            'G-Dec': 8,  # "déconseillé"
+        }.get(rating)
 
     def _real_extract(self, url):
         entity_id = self._match_id(url)
-        entity = self._download_json(
-            'https://www.qub.ca/proxy/pfu/content-delivery-service/v1/entities',
-            entity_id, query={'id': entity_id})
+        webpage = self._download_webpage(url, entity_id)
+        next_data = get_element_by_id('__NEXT_DATA__', webpage) or '{}'
+        entity = self._parse_json(next_data, entity_id)['props']['initialProps']['pageProps']['fallbackData']
         video_id = entity['videoId']
-        episode = strip_or_none(entity.get('name'))
+        episode = strip_or_none(entity.get('name')) or None
 
         return {
             '_type': 'url_transparent',
             'id': video_id,
-            'title': episode,
-            # 'url': self.BRIGHTCOVE_URL_TEMPLATE % entity['referenceId'],
+            'title': episode or self._generic_title(url),
             'url': 'https://videos.tva.ca/details/_' + video_id,
             'description': entity.get('longDescription'),
             'duration': float_or_none(entity.get('durationMillis'), 1000),
             'episode': episode,
             'episode_number': int_or_none(entity.get('episodeNumber')),
-            # 'ie_key': 'BrightcoveNew',
+            'channel': try_get(entity, lambda x: x['knownEntities']['channel']['name'], compat_str),
+            'series': try_get(entity, lambda x: x['knownEntities']['videoShow']['name'], compat_str),
+            'season_number': int_or_none(self._search_regex(r'/s(?:ai|ea)son-(\d+)/', entity.get('slug', ''), 'season', default=None)),
+            'age_limit': self._parse_rating(entity.get('parentalRating')),
             'ie_key': TVAIE.ie_key(),
         }
As the existing test works in the revised code, I assume that the site has completely switched to the Next.js page format, and so I have removed the old tactic rather than making it a fallback.
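For anyone who wants to poke at the page outside of yt-dl, here is a minimal standalone sketch of the same idea using only the Python 3 standard library. It is not the extractor code: the helper names (`next_data_entity`, `season_number`) and the sample page are invented for illustration, and only the `__NEXT_DATA__` id, the `props` / `initialProps` / `pageProps` / `fallbackData` path and the season regex are taken from the patch above.

```python
import json
import re
from html.parser import HTMLParser


class _ScriptById(HTMLParser):
    """Collect the text of the first <script> element with the given id."""

    def __init__(self, element_id):
        super().__init__()
        self._wanted = element_id
        self._inside = False
        self.text = None

    def handle_starttag(self, tag, attrs):
        if tag == 'script' and self.text is None and dict(attrs).get('id') == self._wanted:
            self._inside = True
            self.text = ''

    def handle_data(self, data):
        if self._inside:
            self.text += data

    def handle_endtag(self, tag):
        if tag == 'script':
            self._inside = False


def next_data_entity(webpage):
    """Return the fallbackData dict from the page's hydration JSON, or {}."""
    parser = _ScriptById('__NEXT_DATA__')
    parser.feed(webpage)
    node = json.loads(parser.text or '{}')
    # Same key path as in the patch, but tolerant of missing keys.
    for key in ('props', 'initialProps', 'pageProps', 'fallbackData'):
        if not isinstance(node, dict):
            return {}
        node = node.get(key)
    return node if isinstance(node, dict) else {}


def season_number(slug):
    """Pull the season number out of a slug such as '.../saison-1/...'."""
    match = re.search(r'/s(?:ai|ea)son-(\d+)/', slug or '')
    return int(match.group(1)) if match else None


# Invented sample page; the real page carries many more fields.
SAMPLE_PAGE = '''<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"initialProps": {"pageProps": {"fallbackData": {
  "videoId": "0000000000001",
  "name": " Ep 01. Example ",
  "slug": "/tvaplus/tva/example-show/saison-1/episode-1-example"
}}}}}
</script>
</body></html>'''

if __name__ == '__main__':
    entity = next_data_entity(SAMPLE_PAGE)
    print(entity.get('videoId'), season_number(entity.get('slug')))
```

Unlike the patch, which indexes the JSON directly and so fails loudly if the layout changes, this sketch returns an empty dict when any key on the path is missing. That is only a convenience for experimenting.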
The patch works for me. Thank you very much.
I don't want to start a new issue, but this is happening for me on Twitter.
Windows, latest version. Other videos are working. The only difference I noticed was that the video had been given ALT tags.
`C:\Users\redacted>yt-dlp -o temp_633f505e35d2e.mp4 https://twitter.com/i/status/1578090984604303362 -vU
[debug] Command-line config: ['-o', 'temp_633f505e35d2e.mp4', 'https://twitter.com/i/status/1578090984604303362', '-vU']
[debug] Encodings: locale cp1252, fs utf-8, pref cp1252, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2022.10.04 [4e0511f] (win32_exe)
[debug] Python 3.8.10 (CPython 64bit) - Windows-10-10.0.19044-SP0
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: ffprobe -bsfs
[debug] exe versions: ffmpeg 2022-04-25-git-f2724d2b69-full_build-www.gyan.dev (setts), ffprobe 2022-04-25-git-f2724d2b69-full_build-www.gyan.dev
[debug] Optional libraries: Cryptodome-3.15.0, brotli-1.0.9, certifi-2022.09.24, mutagen-1.45.1, sqlite3-2.6.0, websockets-10.3
[debug] Proxy map: {}
[debug] Loaded 1690 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Latest version: 2022.10.04, Current version: 2022.10.04
yt-dlp is up to date (2022.10.04)
[debug] [twitter] Extracting URL: https://twitter.com/i/status/1578090984604303362
[twitter] 1578090984604303362: Downloading guest token
[twitter] 1578090984604303362: Downloading JSON metadata
ERROR: [twitter] 1578090984604303362: Unable to download JSON metadata: HTTP Error 404: Not Found (caused by <HTTPError 404: 'Not Found'>); please report this issue on https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using yt-dlp -U
  File "yt_dlp\extractor\common.py", line 672, in extract
  File "yt_dlp\extractor\twitter.py", line 441, in _real_extract
  File "yt_dlp\extractor\twitter.py", line 100, in _call_api
  File "yt_dlp\extractor\common.py", line 1032, in download_content
  File "yt_dlp\extractor\common.py", line 996, in download_handle
  File "yt_dlp\extractor\common.py", line 866, in _download_webpage_handle
  File "yt_dlp\extractor\common.py", line 823, in _request_webpage
  File "yt_dlp\extractor\common.py", line 805, in _request_webpage
  File "yt_dlp\YoutubeDL.py", line 3682, in urlopen
  File "urllib\request.py", line 531, in open
  File "urllib\request.py", line 640, in http_response
  File "urllib\request.py", line 569, in error
  File "urllib\request.py", line 502, in _call_chain
  File "urllib\request.py", line 649, in http_error_default
urllib.error.HTTPError: HTTP Error 404: Not Found`
Sorry, I was looking at this yt-dlp thread, https://github.com/yt-dlp/yt-dlp/issues/4989, where it was said to be the same as this issue, and I followed the link. I was confused by that link coming here: Taupemoi closed that thread in favour of this one.
It is happening in both youtube-dl and yt-dlp.
I'll open a new one.
Hey guys, I am using youtube-dl on Windows. How can I implement the workaround DirkF proposed at the top of this thread?
thx
If you have installed your yt-dl with pip (or in some other way that exposes the program's module files), you can replace the youtube_dl/extractor/tva.py file with this download: https://github.com/dirkf/youtube-dl/raw/df-tva-extractor-ovrhaul/youtube_dl/extractor/tva.py
Otherwise you have to wait for a release.
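If it is not obvious where pip put the module files, a small sketch like the following may help locate the file to replace. It assumes Python 3 and uses only the standard library. The helper name `installed_module_path` is made up for this example, and it returns `None` when the package cannot be found as ordinary files (for instance with a standalone executable).

```python
import importlib.util
import os


def installed_module_path(package, relative_path):
    """Return the on-disk path of a file inside an installed package, or None."""
    spec = importlib.util.find_spec(package)
    # No spec, or no package directory, means there is nothing to replace.
    if spec is None or not spec.submodule_search_locations:
        return None
    base = list(spec.submodule_search_locations)[0]
    return os.path.join(base, *relative_path.split('/'))


if __name__ == '__main__':
    # Prints the location of the TVA extractor if youtube_dl is importable.
    print(installed_module_path('youtube_dl', 'extractor/tva.py'))
```

Keeping a backup copy of the original tva.py before overwriting it makes it easy to revert once a release includes the fix.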
It works. Thanks man for your help and support. Appreciated
Conrad
Description
Downloading from qub.ca stopped working in the last 2 days. It had been working for at least the previous year.