ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

Address changed for smotrim.ru #31647

Open betterthanever2 opened 1 year ago

betterthanever2 commented 1 year ago

Description

Right now, trying to download a video from an address like https://smotrim.ru/video/2568723 results in a timeout error due to a failure to resolve the player address. This is likely because the resolved video address used to have this format: http://player.rutv.ru/iframe/datavideo/id/2568723, but seems to have been changed recently to https://player.smotrim.ru/iframe/video/id/2568723.

Correction: the issue is actually with the JSON data file URL, which is now https://player.smotrim.ru/iframe/datavideo/id/2568924/sid/smotrim, i.e. features smotrim.ru instead of rutv.ru, and has the /sid/smotrim part at the end.

dirkf commented 1 year ago

There is no specific support for smotrim.ru in either the released yt-dl or the latest git master. Some time ago we discovered that the rutube.ru extractor would also handle the site.

Maybe you are using yt-dlp, which does have an extractor that now times out? See #30839.

betterthanever2 commented 1 year ago

smotrim.ru is handled by the RUTV extractor. A few months back I submitted an issue here about it not being supported explicitly, and eventually a patch was suggested, which still works for me. I changed the extractor script to handle the above issue as well, so downloads work fine for me right now. I just thought I'd let you know about this change.

Vangelis66 commented 1 year ago

I changed the extractor script to handle the above issue as well

... Perhaps, then, you'd be kind and willing 😉 to share this newest patch of yours here or, even kinder, create a PR with the necessary changes to rutv.py that restore smotrim.ru support in youtube-dl?

Thanks 😃

dirkf commented 1 year ago

Sure, if you're using a patched version, that's understandable, but you could have noted that against "verified that I'm running youtube-dl version 2021.12.17".

Perhaps submit your additional patch so that we can roll it in due course, or into yt-dlp?

Vangelis66 commented 1 year ago

OP is located inside Ukraine... Living in the EU, I find I'm unable to access:
https://smotrim.ru/video/2568723 [Unable to connect error in my browser] so I assume this applies...

However, http://player.rutv.ru/iframe/datavideo/id/2568723 DOES load OK here (the opposite to OP), while https://player.smotrim.ru/iframe/video/id/2568723 and https://player.smotrim.ru/iframe/datavideo/id/2568924/sid/smotrim DO NOT (to be expected, due to EU-wide block of smotrim.ru)

OTOH, When I VPN to Russia, https://smotrim.ru/video/2568723 DOES load now, as well as https://player.smotrim.ru/iframe/video/id/2568723 and https://player.smotrim.ru/iframe/datavideo/id/2568924/sid/smotrim but http://player.rutv.ru/iframe/datavideo/id/2568723 DOES NOT (it times out, consistent with OP's report) ...

At the current state of world affairs, it seems any eventual support for smotrim.ru (via rutv.py) would be IP-dependent... 😞
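One way an extractor could cope with this IP dependence is to try each known data URL in turn and keep the first one that answers. The following is only a sketch under assumptions: first_reachable and the fetch callable are hypothetical names, not part of rutv.py, and the smotrim.ru URL for this video ID is inferred from the format described in this thread.

```python
def first_reachable(candidates, fetch):
    """Return (url, data) for the first candidate URL that fetch() can load.

    `fetch` is a hypothetical callable (e.g. a thin wrapper around an
    HTTP client) that returns parsed data or raises OSError on failure.
    urllib.error.URLError and socket timeouts are OSError subclasses
    on Python 3.
    """
    errors = []
    for url in candidates:
        try:
            return url, fetch(url)
        except OSError as err:
            errors.append('%s: %s' % (url, err))
    raise OSError('no candidate reachable: ' + '; '.join(errors))


# The two endpoint formats reported in this thread (assumed to be
# interchangeable for the same video ID, which is unverified).
CANDIDATES = [
    'https://player.smotrim.ru/iframe/datavideo/id/2568723/sid/smotrim',
    'http://player.rutv.ru/iframe/datavideo/id/2568723',
]
```

Which host answers would still depend on where the request comes from, so this only hides the difference; it does not remove the need for a proxy when both hosts are blocked.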

betterthanever2 commented 1 year ago

I changed the extractor script to handle the above issue as well

... Perhaps, then, you'd be kind and willing 😉 to share this newest patch of yours here or, even kinder, create a PR with the necessary changes to rutv.py that restore smotrim.ru support in youtube-dl?

Thanks 😃

The patch is replacing line 155 'http://player.rutv.ru/iframe/data%s/id/%s' % ('live' if is_live else 'video', video_id) with f'http://player.smotrim.ru/iframe/data{"live" if is_live else "video"}/id/{video_id}/sid/smotrim'. I am wary about submitting this as a PR, because I'm pretty sure this won't work universally.
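A note on the f-string: f-strings need Python 3.6 or later, while the debug logs elsewhere in this thread show youtube-dl running on Python 3.4.4. A %-formatted equivalent avoids that problem. This is a minimal sketch, not the actual extractor code; the function name data_url and its player and sid parameters are introduced here purely for illustration.

```python
def data_url(video_id, is_live=False,
             player='http://player.smotrim.ru', sid='smotrim'):
    """Build the JSON data URL in the format described in this thread.

    Uses %-formatting so that it also runs on interpreters without
    f-string support. Pass sid=None to get the older URL shape that
    has no /sid/... suffix.
    """
    kind = 'live' if is_live else 'video'
    url = '%s/iframe/data%s/id/%s' % (player, kind, video_id)
    if sid:
        url += '/sid/%s' % sid
    return url
```

For example, data_url('2568723') gives the new smotrim.ru form, and data_url('2568723', player='http://player.rutv.ru', sid=None) reproduces the old rutv.ru one.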

betterthanever2 commented 1 year ago

OP is located inside the Ukraine...

not that big of a deal, but Ukraine goes without an article.

Vangelis66 commented 1 year ago

The patch is replacing line 155

Which "line 155" ?

If one patches rutv.py (current state in git master) according to this patch by dirkf, one can't find L155 with cited content 😞 ... There's now L160 that appears to be relevant:

159        json_data = self._download_json(
160            '%s/iframe/data%s/id/%s' % (player, 'live' if is_live else 'video', video_id),
161            video_id, 'Downloading JSON')

so, perhaps, this one needs to be patched? IOW, what's the file you applied your posted patch on?

not that big of a deal, but Ukraine goes without an article.

Noted and corrected 😉 ; I must have been (subconsciously) carried away by ... inside the UK, that I often write myself :smile: ...

Regards.

betterthanever2 commented 1 year ago

The patch is replacing line 155

so, perhaps, this one needs to be patched? IOW, what's the file you applied your posted patch on?

Yes, the line you reference is the one. The patch by dirkf is the one I applied; I may have done that manually, which may have resulted in a different number of lines in the file. Quite honestly, I don't remember; I just made it work and moved on.

Vangelis66 commented 1 year ago

... Right... It was a case of finding a working and whitelisted RU HTTPS proxy 😉 ... As I posted above, I simply patched the git master edition of rutv.py with the linked patch by dirkf - I did NOT patch anything else in RUTVIE ...

yt-dl --proxy "localhost:8080" -vF "https://smotrim.ru/video/2568723" => 

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--ffmpeg-location', '.\\FFmpeg', '--external-downloader-args', '-v 8 -stats', '--proxy', 'localhost:8080', '-vF', 'https://smotrim.ru/video/2568723']
[debug] Encodings: locale cp1253, fs mbcs, out cp737, pref cp1253
[debug] youtube-dl version 2023.02.23.114514
[debug] Python version 3.4.4 (CPython) - Windows-Vista-6.0.6003-SP2
[debug] exe versions: ffmpeg n5.2-dev-2245-N-109649-gab8cde6, ffprobe n5.2-dev-2245-N-109649-gab8cde6, phantomjs 2.1.1, rtmpdump 2.4
[debug] Proxy map: {'https': 'localhost:8080', 'http': 'localhost:8080'}
[RUTV] 2568723: Downloading JSON
[RUTV] 2568723: Downloading m3u8 information
[info] Available formats for 2568723:
format code  extension  resolution note
hls-400      mp4        unknown     400k
hls-800      mp4        unknown     800k
hls-1200     mp4        unknown    1200k
hls-1800     mp4        unknown    1800k
hls-4050     mp4        unknown    4050k
http-1080    mp4        1920x1080
http-234     mp4        1920x1080
http-360     mp4        1920x1080
http-540     mp4        1920x1080
http-720     mp4        1920x1080  (best)

The extractor has to be amended so that resolutions are: a) calculated for the HLS formats (currently unknown); b) corrected for the HTTP formats (all showing as 1920x1080; hint: the resolution can be derived from the format code, e.g. http-360 => 640x360).

Additionally, the (best) label should be assigned to http-1080 rather than to http-720; and, though advertised as available, format http-234 always returns a 404...
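To illustrate the hint, here is a rough sketch of deriving the resolution from the format code. It assumes the number in an http-NNN code is the frame height and that all variants are 16:9, which matches 640x360 and 1920x1080 but is otherwise unverified; resolution_from_format_id is a hypothetical helper, not something in rutv.py.

```python
import re


def resolution_from_format_id(format_id, aspect=(16, 9)):
    """Return (width, height) for codes like 'http-360', else None.

    Assumption: the numeric suffix of an http-* code is the height in
    pixels. hls-* codes carry a bitrate instead, so they are skipped.
    """
    m = re.match(r'http-(\d+)$', format_id)
    if not m:
        return None
    height = int(m.group(1))
    width = height * aspect[0] // aspect[1]
    return width, height


format_ids = ['http-1080', 'http-234', 'http-360', 'http-540', 'http-720']
# Ranking by derived height puts http-1080 last, i.e. as the best format.
ranked = sorted(format_ids, key=lambda f: resolution_from_format_id(f)[1])
```

With heights filled in like this, a height-based sort would rank http-1080 above http-720, which is the ordering asked for above.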

The media files AREN'T BLOCKED per se, at least not here...

yt-dl --proxy "localhost:8080" -f http-360 "https://smotrim.ru/video/2568723" -g => 

https://cdn-v.rtr-vesti.ru/_cdn_auth/secure/v/vh/mp4/medium-wide/002/740/306.mp4?auth=mh&vid=2740306

and then one can perform a DIRECT download:

yt-dl "https://cdn-v.rtr-vesti.ru/_cdn_auth/secure/v/vh/mp4/medium-wide/002/740/306.mp4?auth=mh&vid=2740306" -o "Кто против Одно из самых масштабных посланий президента Федеральному Собранию. Эфир от 21.02.2023-2568723.mp4" => 

[generic] 306: Requesting header
[download] Destination: Кто против Одно из самых масштабных посланий президента Федеральному Собранию. Эфир от 21.02.2023-2568723.mp4
[download] 100% of 271.86MiB in 07:39

dirkf commented 1 year ago

If the HLS media URLs are like the mp4 ones, a descriptive resolution can be extracted from the URL itself.
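As a sketch of what that could look like: the mp4 URL quoted earlier in this thread has a path segment right after /mp4/ ("medium-wide") that reads like a quality label. The helper below just pulls that segment out. That the segment really is a quality label, and that HLS URLs follow the same layout, are both assumptions; quality_label_from_url is a made-up name.

```python
import re


def quality_label_from_url(media_url):
    """Return the path segment following '/mp4/', or None if absent.

    Assumption: this segment (e.g. 'medium-wide') describes the
    variant's quality. No mapping to pixel sizes is attempted here.
    """
    m = re.search(r'/mp4/(?P<label>[^/?]+)/', media_url)
    return m.group('label') if m else None


example = ('https://cdn-v.rtr-vesti.ru/_cdn_auth/secure/v/vh/mp4/'
           'medium-wide/002/740/306.mp4?auth=mh&vid=2740306')
label = quality_label_from_url(example)
```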

OT: historically the word "Ukraine", which is etymologically equivalent to Borders or Marches, was used in English for the region of the Russian Tsarist/Soviet empire where the modern independent country Ukraine is situated; that is the significance of "the Ukraine". A place might be inside the RF and not in Ukraine, but still be in "the Ukraine".

betterthanever2 commented 1 year ago

A place might be inside the RF and not in Ukraine, but still be in "the Ukraine".

Sorry, what?

Anyway, the word "Ukraine" has an etymology, that is true; the same goes for almost any country name. What you mentioned about "borders or marches" is one hypothesis in this case (not an established fact, mind you), but more importantly, contemporary usage of words is very rarely tied to their etymology. The world would be a weird place if it were. So why don't we just agree to call the countries whatever their respective people have implicitly agreed upon, and leave it at that?

This is hardly a place for such discussions.

Vangelis66 commented 1 year ago

This is hardly a place for such discussions.

I agree 👍 ; be that as it may, "I" did not "invent" the "term" "the Ukraine" out of the blue, nor did I use it consciously to denote something demeaning about Ukraine's state of independence and/or citizens 😉 ...

After these two latest comments, I "searched" and Wikipedia (despite, at times, not being 100% credible) has an entry just for that, under "Etymology and orthography":

In the English-speaking world during most of the 20th century, Ukraine (whether independent or not) was referred to as "the Ukraine". This is because the word ukraina means "borderland", so the definite article would be natural in the English language; this is similar to "Nederlanden", which means "low lands" and is rendered in English as "the Netherlands".

However, since Ukraine's declaration of independence in 1991, this usage has become politicised and is now rarer, and style guides advise against its use. US ambassador William Taylor said that using "the Ukraine" implies disregard for Ukrainian sovereignty. The official Ukrainian position is that "the Ukraine" is both grammatically and politically incorrect.

Thus, I now understand why someone inside Ukraine today would feel "sensitive" about such an inclusion of the "definite article" (be it purely inadvertent 😄 ) ...

dirkf commented 1 year ago

That is, the "modern independent country whose name is Ukraine", glory to it.

The established usage and distinction in English that I described (more accurately than Wikipedia IMO) isn't a hypothesis. The use of "the Ukraine" is becoming rarer, if only and horrifically because the invasion has led English speakers to focus on the nation rather than the general area. Use of "the" doesn't apply in the main languages of the region, so the distinction might appear strange. An answer to "what?" could be Belgorod which has fallen under both Ukrainian and Russian sovereignty or administration within a few generations.

Although historical usage had "the Sudan" and "the Yemen", "the" is not used with a country name unless the actual name is some sort of phrase: the United States of America, the People's Republic of China, even the Nether-lands (nether being a quaint synonym for lower, as in be-neath); but actually we call it Holland, as an American might say England for Britain.

dirkf commented 1 year ago

no

betterthanever2 commented 1 year ago

The established usage and distinction in English that I described (more accurately than Wikipedia IMO) isn't a hypothesis.

Well, it is a hypothesis, albeit a commonly accepted one. I just don't think there's such a thing as a 100% fact with these things. It's not a matter of record, after all.

The use of "the Ukraine" is becoming rarer, if only and horrifically because the invasion has led English speakers to focus on the nation rather than the general area.

Quite honestly, I have never heard anybody referring to the 'general area' that way, not in English, not in Russian, and not in Ukrainian, so this "becoming rarer" feels... 🤨 How old are you, exactly? 😄

dirkf commented 1 year ago

From the UK, all the variants player.rutv.ru, together with player.smotrim.ru and player.vgtrk.com (same IP), resolve but respond to pings only occasionally, if at all, at least until the forthcoming Untergang.

The patched code can be modified at the 4th line of the _real_extract() method:

        player = 'http://player.smotrim.ru'

Possibly --geo-verification-proxy ... works?

Vangelis66 commented 1 year ago

Possibly --geo-verification-proxy ... works?

It doesn't:

yt-dl -vF --geo-verification-proxy "localhost:8080" "https://smotrim.ru/video/2568723" => 

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--ffmpeg-location', '.\\FFmpeg', '--external-downloader-args', '-v 8 -stats', '-vF', '--geo-verification-proxy', 'localhost:8080','https://smotrim.ru/video/2568723']
[debug] Encodings: locale cp1253, fs mbcs, out cp737, pref cp1253
[debug] youtube-dl version 2023.02.25.334
[debug] Python version 3.4.4 (CPython) - Windows-Vista-6.0.6003-SP2
[debug] exe versions: ffmpeg n5.2-dev-2245-N-109649-gab8cde6, ffprobe n5.2-dev-2245-N-109649-gab8cde6, phantomjs 2.1.1, rtmpdump 2.4
[debug] Proxy map: {}
[RUTV] 2568723: Downloading JSON
ERROR: Unable to download JSON metadata: <urlopen error [WinError 10061] No connection could be made because the target machine actively refused it> (caused by URLError(ConnectionRefusedError(10061, 'No connection could be made because the target machine actively refused it', None, 10061, None),))
  File "common.py", line 635, in _request_webpage
  File "YoutubeDL.py", line 2300, in urlopen
  File "C:\hostedtoolcache\windows\Python\3.4.4\x86\lib\urllib\request.py", line 464, in open
  File "C:\hostedtoolcache\windows\Python\3.4.4\x86\lib\urllib\request.py", line 482, in _open
  File "C:\hostedtoolcache\windows\Python\3.4.4\x86\lib\urllib\request.py", line 442, in _call_chain
  File "D:\a\youtube-dl\youtube-dl\youtube_dl\utils.py", line 2634, in http_open
  File "C:\hostedtoolcache\windows\Python\3.4.4\x86\lib\urllib\request.py", line 1185, in do_open

... and the reason it doesn't work I have explained in the past (in the context of a French TV InfoExtractor that I'm too lazy to dig up right now): the InfoExtractor it's used with has to explicitly support that switch... Supporting it inside the IE means passing once (or multiple times), in the "right" place(s), an additional request header in the form of

headers=self.geo_verification_headers()

E.g. theplatformIE, a "framework" used in many other (mainly American) IEs, has this part of code:

https://github.com/ytdl-org/youtube-dl/blob/f7ce98a21e15cb094c772e9082796d009c61578b/youtube_dl/extractor/theplatform.py#L37-L40

--geo-verification-proxy is particular to IEs where: a) the playlist/manifest-generating API is geo-blocked and can't be fooled by an X-Forwarded-For header (which is what the --geo-bypass and --geo-bypass-country switches rely on); b) the stream CDN(s) do not geo-block.

In this case, the webpage itself is being blocked for me, so I expect the "extra" code has to be applied in the webpage fetch itself, not just for the "manifest" API... 😞
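For reference, this is roughly what geo_verification_headers() does, as far as I recall from common.py; treat it as a simplified stand-alone sketch rather than the actual source, and the header name Ytdl-request-proxy as something to verify against the code. Here params is a plain dict standing in for the downloader's options.

```python
def geo_verification_headers(params):
    """Return the extra headers for a geo-verified request.

    Simplified model: if a geo-verification proxy is configured, it is
    passed along as a special per-request header, which the HTTP layer
    is then expected to consume in order to route only that request
    through the proxy.
    """
    headers = {}
    proxy = params.get('geo_verification_proxy')
    if proxy:
        # Header name as recalled from youtube-dl; an assumption here.
        headers['Ytdl-request-proxy'] = proxy
    return headers


with_proxy = geo_verification_headers(
    {'geo_verification_proxy': 'https://localhost:8080'})
without_proxy = geo_verification_headers({})
```

Because the proxy applies per request, only calls that are explicitly given these headers go through it. That would also be consistent with the "Proxy map: {}" line in the log above: no global proxy is configured in that run.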

Vangelis66 commented 1 year ago

OK, based on my analysis above, and a lot of trial-and-error, I concocted a version of rutv.py that works for me with the --geo-verification-proxy switch and my HTTPS-only Russian proxy:

142    def _real_extract(self, url):
143        mobj = re.match(self._VALID_URL, url)
144        video_id = mobj.group('id')
145        video_path = mobj.group('path')
146-        player = 'http://player.smotrim.ru'
146+        player = 'https://player.smotrim.ru'
147        
148        if video_path.startswith('iframe'):
149            video_type = mobj.group('type')
150            if video_type == 'swf':
151                video_type = 'video'
152        elif video_path.startswith('index/iframe/cast_id'):
153            video_type = 'live'
154        else:
155            video_type = 'video'
156
157        is_live = video_type == 'live'
158
159        json_data = self._download_json(
160            '%s/iframe/data%s/id/%s' % (player, 'live' if is_live else 'video', video_id),
161-            video_id, 'Downloading JSON')
161+            video_id, 'Downloading JSON',
162+            headers=self.geo_verification_headers())

I had to explicitly set the HTTPS version of the player, else my proxy couldn't handle the redirection from HTTP to HTTPS (???):

yt-dl -vF --geo-verification-proxy "https://localhost:8080" "https://smotrim.ru/video/2568723" => 

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--ffmpeg-location', '.\\FFmpeg', '--external-downloader-args', '-v 8 -stats', '-vF', '--geo-verification-proxy', 'https://localhost:8080', 'https://smotrim.ru/video/2568723']
[debug] Encodings: locale cp1253, fs mbcs, out cp737, pref cp1253
[debug] youtube-dl version 2023.02.25.334
[debug] Python version 3.4.4 (CPython) - Windows-Vista-6.0.6003-SP2
[debug] exe versions: ffmpeg n5.2-dev-2245-N-109649-gab8cde6, ffprobe n5.2-dev-2245-N-109649-gab8cde6, phantomjs 2.1.1, rtmpdump 2.4
[debug] Proxy map: {}
[RUTV] 2568723: Downloading JSON
[RUTV] 2568723: Downloading m3u8 information
[info] Available formats for 2568723:
format code  extension  resolution note
hls-400      mp4        unknown     400k
hls-800      mp4        unknown     800k
hls-1200     mp4        unknown    1200k
hls-1800     mp4        unknown    1800k
hls-4050     mp4        unknown    4050k
http-1080    mp4        1920x1080
http-234     mp4        1920x1080
http-360     mp4        1920x1080
http-540     mp4        1920x1080
http-720     mp4        1920x1080  (best)

and then:

yt-dl --geo-verification-proxy "https://localhost:8080" -f http-540 "https://smotrim.ru/video/2568723" => 

[RUTV] 2568723: Downloading JSON
[RUTV] 2568723: Downloading m3u8 information
[download] Destination: Кто против Одно из самых масштабных посланий президента Федеральному Собранию. Эфир от 21.02.2023-2568723.mp4
[download]   4.7% of 528.15MiB at 693.60KiB/s ETA 12:22

NB that a dl speed of ca. 700KiB/s is due to the DIRECT connection 😜 ...