Closed Thalia500 closed 1 year ago
Looks like that's a new type of URL. We can base64-decode the html basename...
>>> import base64
>>> base64.urlsafe_b64decode('MjAyMzA2MTQvbjYwMTMxNTE5Mi5zaHRtbA==').decode()
'20230614/n601315192.shtml'
...and get a regular Sohu url path.
A patch like this should handle it:
diff --git a/yt_dlp/extractor/sohu.py b/yt_dlp/extractor/sohu.py
index a8f1e4623..5f0fb6192 100644
--- a/yt_dlp/extractor/sohu.py
+++ b/yt_dlp/extractor/sohu.py
@@ -1,3 +1,4 @@
+import base64
 import re
 
 from .common import InfoExtractor
@@ -9,6 +10,7 @@
     ExtractorError,
     int_or_none,
     try_get,
+    urljoin,
 )
 
 
@@ -196,3 +198,12 @@ def _fetch_data(vid_id, mytv=False):
             }
 
         return info
+
+
+class SohuVIE(InfoExtractor):
+    _VALID_URL = r'(?P<base>https?://tv\.sohu\.com/)v/(?P<id>[\w=-]+)\.html(?:$|[#?])'
+
+    def _real_extract(self, url):
+        base_url, encoded_id = self._match_valid_url(url).group('base', 'id')
+        path = base64.urlsafe_b64decode(encoded_id).decode()
+        return self.url_result(urljoin(base_url, path), SohuIE)
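For anyone who wants to check the decode-and-join step without patching yt-dlp, here is a minimal standalone sketch. It reuses the pattern from the patch's `_VALID_URL`, but swaps in the standard library's `urllib.parse.urljoin` for yt-dlp's own `urljoin` helper, and the function name `decode_sohu_v_url` is made up for illustration, so treat it as an approximation of what the extractor does rather than the actual code path.

```python
import base64
import re
from urllib.parse import urljoin  # stdlib stand-in for yt_dlp.utils.urljoin

# Same pattern as _VALID_URL in the patch above
_V_URL_RE = re.compile(
    r'(?P<base>https?://tv\.sohu\.com/)v/(?P<id>[\w=-]+)\.html(?:$|[#?])')


def decode_sohu_v_url(url):
    """Return the regular Sohu URL for a /v/<base64>.html URL, or None if it does not match.

    Hypothetical helper, not part of yt-dlp.
    """
    mobj = _V_URL_RE.match(url)
    if mobj is None:
        return None
    base_url, encoded_id = mobj.group('base', 'id')
    # The html basename is the URL-safe base64 encoding of the regular path
    path = base64.urlsafe_b64decode(encoded_id).decode()
    return urljoin(base_url, path)


print(decode_sohu_v_url(
    'https://tv.sohu.com/v/MjAyMzA2MTQvbjYwMTMxNTE5Mi5zaHRtbA==.html'))
# prints: https://tv.sohu.com/20230614/n601315192.shtml
```

A URL that is already in the regular form does not match the pattern, so the helper returns None for it.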
I'm geo-blocked, but it should work for someone who's not:
$ yt-dlp -F "https://tv.sohu.com/v/MjAyMzA2MTQvbjYwMTMxNTE5Mi5zaHRtbA==.html"
[SohuV] Extracting URL: https://tv.sohu.com/v/MjAyMzA2MTQvbjYwMTMxNTE5Mi5zaHRtbA==.html
[Sohu] Extracting URL: https://tv.sohu.com/20230614/n601315192.shtml
[Sohu] 601315192: Downloading webpage
[Sohu] 601315192: Downloading JSON data for 8484094
ERROR: [Sohu] 601315192: Sohu said: The video is only licensed to users in Mainland China.
You might want to use a VPN or a proxy server (with --proxy) to workaround.
I think someone just needs to find out whether there are my.tv.sohu.com links like this too; if so, we could match them as well.
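If such links do turn up, matching them could be as small as an optional host prefix in the pattern. A rough sketch of that idea follows; note that base64-style `/v/` URLs on my.tv.sohu.com are purely an assumption here (nobody in this thread has confirmed one), and both the `(?:my\.)?` group and the helper name `matches_v_url` are hypothetical.

```python
import re

# Hypothetical variant of the patch's _VALID_URL.
# ASSUMPTION: my.tv.sohu.com serves /v/<base64>.html URLs too; this is unverified.
_V_URL_WITH_MY_RE = re.compile(
    r'(?P<base>https?://(?:my\.)?tv\.sohu\.com/)v/(?P<id>[\w=-]+)\.html(?:$|[#?])')


def matches_v_url(url):
    """Return True if the URL looks like a base64-style /v/ link on either host."""
    return _V_URL_WITH_MY_RE.match(url) is not None
```

The `base` group would then carry whichever host was matched, so the decoded path would be joined onto that same host.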
Thank you! But it seems like there are other issues.
yt-dlp -vU -F "https://tv.sohu.com/v/MjAyMzA2MTQvbjYwMTMxNTE5Mi5zaHRtbA==.html"
[debug] Command-line config: ['-vU', '-F', 'https://tv.sohu.com/v/MjAyMzA2MTQvbjYwMTMxNTE5Mi5zaHRtbA==.html']
[debug] Encodings: locale cp936, fs utf-8, pref cp936, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version stable@2023.06.22 [812cdfa06] (pip)
[debug] Lazy loading extractors is disabled
[debug] Python 3.10.9 (CPython AMD64 64bit) - Windows-10-10.0.22621-SP0 (OpenSSL 1.1.1t 7 Feb 2023)
[debug] exe versions: ffmpeg N-110972-gbaa9fccf8d-20230601 (setts), ffprobe N-110972-gbaa9fccf8d-20230601, phantomjs 2.1.1
[debug] Optional libraries: Cryptodome-3.18.0, brotli-None, certifi-2023.05.07, mutagen-1.46.0, sqlite3-2.6.0, websockets-11.0.3
[debug] Proxy map: {}
[debug] Loaded 1851 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Available version: stable@2023.06.22, Current version: stable@2023.06.22
yt-dlp is up to date (stable@2023.06.22)
[generic] Extracting URL: https://tv.sohu.com/v/MjAyMzA2MTQvbjYwMTMxNTE5Mi5zaHRtbA==.html
[generic] MjAyMzA2MTQvbjYwMTMxNTE5Mi5zaHRtbA==: Downloading webpage
WARNING: [generic] Falling back on generic information extractor
[generic] MjAyMzA2MTQvbjYwMTMxNTE5Mi5zaHRtbA==: Extracting information
[debug] Looking for embeds
ERROR: Unsupported URL: https://tv.sohu.com/v/MjAyMzA2MTQvbjYwMTMxNTE5Mi5zaHRtbA==.html
Traceback (most recent call last):
  File "C:\python_conda_package\Anaconda3\lib\site-packages\yt_dlp\YoutubeDL.py", line 1555, in wrapper
    return func(self, *args, **kwargs)
  File "C:\python_conda_package\Anaconda3\lib\site-packages\yt_dlp\YoutubeDL.py", line 1631, in __extract_info
    ie_result = ie.extract(url)
  File "C:\python_conda_package\Anaconda3\lib\site-packages\yt_dlp\extractor\common.py", line 708, in extract
    ie_result = self._real_extract(url)
  File "C:\python_conda_package\Anaconda3\lib\site-packages\yt_dlp\extractor\generic.py", line 2568, in _real_extract
    raise UnsupportedError(url)
yt_dlp.utils.UnsupportedError: Unsupported URL: https://tv.sohu.com/v/MjAyMzA2MTQvbjYwMTMxNTE5Mi5zaHRtbA==.html
@Thalia500 I've applied the patch by bashonly to this branch (the new IE class also needs to be imported in _extractors.py). You can take a look at it here:
https://github.com/c-basalt/yt-dlp/tree/sohu-fix
I checked some old my.tv.sohu.com links in the test cases and they are redirected as well, though the redirected domain is also tv.sohu.com.
~The Multipart video link in the test case appears to be broken. Might want to fix that before starting a PR.~ The multipart video URL is now fixed; it just needs some testing and feedback.
Duplicate of #1667 (thanks for finding that @c-basalt)
Region
China
Provide a description that is worded well enough to be understood
Can't download videos from Sohu. Example URL: https://tv.sohu.com/v/MjAyMzA2MTQvbjYwMTMxNTE5Mi5zaHRtbA==.html