Open orangerkater opened 1 year ago
Maybe @goggle could help out?
The UUID from the old format is found in the data-assetid attribute of the player <div>, which has class js-media:
data-assetid="urn:srf:audio:2458a159-dcc5-44c5-9730-b5043f2d3f95"
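As a rough standalone illustration of that lookup, here is a minimal sketch using only the Python standard library. The names `AssetIdParser` and `uuid_from_page` and the sample HTML fragment are hypothetical; only the data-assetid value comes from this thread. It accepts js-media anywhere in the class list, which is slightly looser than matching the class attribute exactly.

```python
from html.parser import HTMLParser


class AssetIdParser(HTMLParser):
    """Collects data-assetid from the first js-media <div> that carries one."""

    def __init__(self):
        super().__init__()
        self.asset_urn = None

    def handle_starttag(self, tag, attrs):
        # Stop looking once a URN has been found.
        if tag != 'div' or self.asset_urn is not None:
            return
        attr_map = dict(attrs)
        classes = (attr_map.get('class') or '').split()
        if 'js-media' in classes:
            self.asset_urn = attr_map.get('data-assetid')


def uuid_from_page(html):
    """Return the last colon-separated URN segment, or None if nothing is found."""
    parser = AssetIdParser()
    parser.feed(html)
    if not parser.asset_urn:
        return None
    return parser.asset_urn.rsplit(':', 1)[-1]


# Hypothetical page fragment (assumption); the URN is the one quoted above.
sample = ('<div class="js-media" '
          'data-assetid="urn:srf:audio:2458a159-dcc5-44c5-9730-b5043f2d3f95"></div>')
print(uuid_from_page(sample))
```

Real pages may structure the player markup differently, so treat this as a way to experiment with the idea rather than as extractor code.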
So, _VALID_URL has to be extended to match the new format:
--- old/youtube_dl/extractor/srgssr.py
+++ new/youtube_dl/extractor/srgssr.py
@@ -5,6 +5,7 @@
 from .common import InfoExtractor
 from ..utils import (
+    extract_attributes,
     ExtractorError,
     float_or_none,
     int_or_none,
@@ -161,12 +162,18 @@
     _VALID_URL = r'''(?x)
                     https?://
                         (?:(?:www|play)\.)?
-                        (?P<bu>srf|rts|rsi|rtr|swissinfo)\.ch/play/(?:tv|radio)/
-                        (?:
-                            [^/]+/(?P<type>video|audio)/[^?]+|
-                            popup(?P<type_2>video|audio)player
-                        )
-                        \?.*?\b(?:id=|urn=urn:[^:]+:video:)(?P<id>[0-9a-f\-]{36}|\d+)
+                        (?P<bu>srf|rts|rsi|rtr|swissinfo)\.ch/
+                        (?:
+                            play/(?:tv|radio)/
+                            (?:
+                                [^/]+/(?P<type>video|audio)/[^?]+|
+                                popup(?P<type_2>video|audio)player
+                            )|
+                            (?:
+                                (?P<type_3>video|audio)(?:/[^/]+)+/?
+                            )
+                        )
+                        \?.*?\b(?:(?:partId|id)=|urn=urn:[^:]+:video:)(?P<id>[0-9a-f\-]{36}|\d+)
                     '''
     _TESTS = [{
@@ -247,6 +254,12 @@
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         bu = mobj.group('bu')
-        media_type = mobj.group('type') or mobj.group('type_2')
+        media_type = mobj.group('type') or mobj.group('type_2') or mobj.group('type_3')
         media_id = mobj.group('id')
+        if mobj.group('type_3') and len(media_id) < 36:
+            webpage = self._download_webpage(url, media_id)
+            player = self._search_regex(r'''(<div\b[^>]+\bclass\s*=\s*('|")js-media\2[^>]*>)''', webpage, 'Media URN') or ''
+            player = extract_attributes(player)
+            urn = player.get('data-assetid') or ''
+            media_id = urn.rsplit(':', 1)[-1]
         return self.url_result('srgssr:%s:%s:%s' % (bu[:3], media_type, media_id), 'SRGSSR')
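For anyone who wants to try the patched pattern outside youtube-dl, the following is a small sketch that copies the new _VALID_URL from the diff and runs it against the URLs mentioned in this issue. The helper name `classify` and the `needs_lookup` flag are made up for illustration; the flag only mirrors the patch's condition for downloading the page to look up the UUID.

```python
import re

# Pattern copied from the patched _VALID_URL in the diff.
VALID_URL = r'''(?x)
    https?://
        (?:(?:www|play)\.)?
        (?P<bu>srf|rts|rsi|rtr|swissinfo)\.ch/
        (?:
            play/(?:tv|radio)/
            (?:
                [^/]+/(?P<type>video|audio)/[^?]+|
                popup(?P<type_2>video|audio)player
            )|
            (?:
                (?P<type_3>video|audio)(?:/[^/]+)+/?
            )
        )
        \?.*?\b(?:(?:partId|id)=|urn=urn:[^:]+:video:)(?P<id>[0-9a-f\-]{36}|\d+)
'''


def classify(url):
    """Return (bu, media_type, media_id, needs_lookup), or None if the URL does not match."""
    mobj = re.match(VALID_URL, url)
    if mobj is None:
        return None
    media_type = mobj.group('type') or mobj.group('type_2') or mobj.group('type_3')
    media_id = mobj.group('id')
    # Same condition the patch uses before fetching the page for the UUID.
    needs_lookup = bool(mobj.group('type_3')) and len(media_id) < 36
    return mobj.group('bu'), media_type, media_id, needs_lookup


for url in (
    'https://www.srf.ch/audio/maloney/frohe-weihnachten?id=12304744',
    'http://www.rtr.ch/play/radio/actualitad/audio/saira-tujetsch-tuttina-cuntinuar-cun-sedrun-muster-turissem?id=63cb0778-27f8-49af-9284-8c7a8c6d15fc',
    'https://www.rtr.ch/audio/actualitad/saira-tujetsch-tuttina-cuntinuar-cun-sedrun-muster-turissem?partId=10728785',
):
    print(classify(url))
```

Only the two new-format URLs should report that a page lookup is needed; the old-format URL already carries the 36-character UUID.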
This works like a charm. What an amazing response time! Thank you so much!
How will this patch find its way into upstream? Should I create a pull request?
Please do.
Description
SRG has changed its URL scheme. The URL now uses /audio/ instead of /play/radio/, and the IDs are much shorter. New example URL: https://www.srf.ch/audio/maloney/frohe-weihnachten?id=12304744
Old URLs still seem to work, but opening one in a browser redirects to the new format. Example: http://www.rtr.ch/play/radio/actualitad/audio/saira-tujetsch-tuttina-cuntinuar-cun-sedrun-muster-turissem?id=63cb0778-27f8-49af-9284-8c7a8c6d15fc becomes https://www.rtr.ch/audio/actualitad/saira-tujetsch-tuttina-cuntinuar-cun-sedrun-muster-turissem?partId=10728785
Funnily enough, in this example SRG uses partId instead of id.
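To make the difference between the two schemes concrete, here is a small sketch built on the standard library's urllib.parse. The function name `describe_srg_url` and the "old"/"new" labels are my own; the rule that a path starting with /play/ means the old scheme is an assumption based only on the example URLs in this report.

```python
from urllib.parse import urlparse, parse_qs


def describe_srg_url(url):
    """Return (scheme_label, media_id) for an SRG media URL."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    # New-style URLs were seen with either "id" or "partId".
    media_id = (query.get('id') or query.get('partId') or [None])[0]
    segments = [s for s in parts.path.split('/') if s]
    # Assumption: old-style paths start with /play/, new-style ones do not.
    label = 'old' if segments[:1] == ['play'] else 'new'
    return label, media_id


print(describe_srg_url('https://www.srf.ch/audio/maloney/frohe-weihnachten?id=12304744'))
print(describe_srg_url('https://www.rtr.ch/audio/actualitad/saira-tujetsch-tuttina-cuntinuar-cun-sedrun-muster-turissem?partId=10728785'))
```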