legolegs opened this issue 2 years ago (status: Open)
The Twitter extractor doesn't have a twitter:(\d+) URL pattern that would let you force it to be used, but that wouldn't help here anyway, since the objective is to route all access to Twitter resources via Tor.
See https://github.com/yt-dlp/yt-dlp/issues/3053.
I installed tor and torsocks and applied the patch below.
--- old/youtube-dl/youtube_dl/extractor/twitter.py
+++ new/youtube-dl/youtube_dl/extractor/twitter.py
@@ -9,6 +9,7 @@
     compat_parse_qs,
     compat_urllib_parse_unquote,
     compat_urllib_parse_urlparse,
+    compat_urlparse,
 )
 from ..utils import (
     dict_get,
@@ -30,8 +31,11 @@
 
 
 class TwitterBaseIE(InfoExtractor):
-    _API_BASE = 'https://api.twitter.com/1.1/'
-    _BASE_REGEX = r'https?://(?:(?:www|m(?:obile)?)\.)?twitter\.com/'
+    _API_BASE_TMPL = 'https://api.%s/1.1/'
+    _TOR_BASE = 'twitter3e4tixl4xyajtrzo62zg5vztmjuricljdp2c5kshju4avyoid.onion'
+    _PLAIN_API_BASE = _API_BASE_TMPL % ('twitter.com', )
+    _TOR_API_BASE = _API_BASE_TMPL % (_TOR_BASE, )
+    _BASE_REGEX = r'https?://(?:(?:www|m(?:obile)?)\.)?(?:twitter\.com|%s)/' % (re.escape(_TOR_BASE), )
     _GUEST_TOKEN = None
 
     def _extract_variant_formats(self, variant, video_id):
@@ -98,6 +102,13 @@
                     e.cause.read().decode(),
                     video_id)['errors'][0]['message'], expected=True)
             raise
+
+    # Derived classes should call this super if the API is to be used
+    def _real_extract(self, url):
+        self._API_BASE = (
+            self._TOR_API_BASE
+            if compat_urlparse.urlparse(url).hostname.endswith('.onion')
+            else self._PLAIN_API_BASE)
 
 
 class TwitterCardIE(InfoExtractor):
@@ -427,9 +438,14 @@
         # poll4choice_video card
         'url': 'https://twitter.com/SouthamptonFC/status/1347577658079641604',
         'only_matching': True,
+    }, {
+        # Tor site
+        'url': 'https://twitter3e4tixl4xyajtrzo62zg5vztmjuricljdp2c5kshju4avyoid.onion/SouthamptonFC/status/1347577658079641604',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
+        super(TwitterIE, self)._real_extract(url)
         twid = self._match_id(url)
         status = self._call_api(
             'statuses/show/%s.json' % twid, twid, {
@@ -650,6 +666,7 @@
     }
 
     def _real_extract(self, url):
+        super(TwitterBroadcastIE, self)._real_extract(url)
         broadcast_id = self._match_id(url)
         broadcast = self._call_api(
             'broadcasts/show.json', broadcast_id,
Result (even after disabling the plain API URL):
$ torsocks python -m youtube_dl -v -F 'https://twitter3e4tixl4xyajtrzo62zg5vztmjuricljdp2c5kshju4avyoid.onion/freethenipple/status/643211948184596480'
[debug] System config: [u'--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'-F', u'https://twitter3e4tixl4xyajtrzo62zg5vztmjuricljdp2c5kshju4avyoid.onion/freethenipple/status/643211948184596480']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: a631e79b3
[debug] Python version 2.7.17 (CPython) - Linux-4.4.0-210-generic-i686-with-Ubuntu-16.04-xenial
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[twitter] 643211948184596480: Downloading guest token
[twitter] 643211948184596480: Downloading JSON metadata
[twitter] 643211948184596480: Downloading m3u8 information
[info] Available formats for 643211948184596480:
format code extension resolution note
hls-320 mp4 240x240 320k , avc1.420015, mp4a.40.2
http-320 mp4 240x240 320k
hls-832 mp4 480x480 832k , avc1.42001f, mp4a.40.2
http-832 mp4 480x480 832k (best)
$
The media link found in this way:
https://video.twitterhbmit57bzbcjnujedrn7uk73geo4ackio4lxdj6t7w6f4zsid.onion/ext_tw_video/643211870443208704/pu/vid/480x480/2a49dLeT5eSHhMhe.mp4
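For anyone who wants to check the host-switching logic of the patch outside youtube-dl, here is a minimal standalone sketch of it in Python 3. It is not youtube-dl code: api_base_for and BASE_RE are hypothetical names, urllib.parse stands in for youtube-dl's compat_urlparse, and the `or ''` guard (absent from the patch) only matters for inputs that never passed the extractor's URL regex.

```python
import re
from urllib.parse import urlparse

# Same constants as in the patch above.
API_BASE_TMPL = 'https://api.%s/1.1/'
TOR_BASE = 'twitter3e4tixl4xyajtrzo62zg5vztmjuricljdp2c5kshju4avyoid.onion'
PLAIN_API_BASE = API_BASE_TMPL % ('twitter.com', )
TOR_API_BASE = API_BASE_TMPL % (TOR_BASE, )

# Accept both the clearnet host and the onion host; re.escape() protects the dots.
BASE_REGEX = r'https?://(?:(?:www|m(?:obile)?)\.)?(?:twitter\.com|%s)/' % (
    re.escape(TOR_BASE), )
BASE_RE = re.compile(BASE_REGEX)


def api_base_for(url):
    """Hypothetical helper: pick the API base that matches the page's host."""
    hostname = urlparse(url).hostname or ''  # hostname is None for non-URLs
    return TOR_API_BASE if hostname.endswith('.onion') else PLAIN_API_BASE


if __name__ == '__main__':
    for url in (
            'https://twitter.com/SouthamptonFC/status/1347577658079641604',
            'https://%s/SouthamptonFC/status/1347577658079641604' % TOR_BASE):
        print(bool(BASE_RE.match(url)), api_base_for(url))
```

Running it prints True together with the clearnet API base for the first URL and the onion API base for the second, i.e. the page host alone decides which API host is used, which is what keeps every API call inside Tor when an onion URL is given.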
Description
Twitter recently got a .onion (Tor) address; see https://help.twitter.com/en/using-twitter/twitter-supported-browsers. The address is https://twitter3e4tixl4xyajtrzo62zg5vztmjuricljdp2c5kshju4avyoid.onion/ (yes, the domain name is that long), and it even has a proper SSL certificate. Example video: https://twitter3e4tixl4xyajtrzo62zg5vztmjuricljdp2c5kshju4avyoid.onion/natchan1984/status/1502504750054461442. Example command line and output (you need Tor Browser or a standalone tor daemon running)
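The length comes from how version-3 onion names are built: per Tor's rend-spec-v3, the name is base32(PUBKEY | CHECKSUM | VERSION), i.e. a 32-byte ed25519 public key, a 2-byte SHA3-256 checksum and the version byte 0x03, which is 35 bytes or 56 base32 characters. Below is a minimal standard-library sketch that builds and checks such names; the function names are hypothetical (not from youtube-dl or Tor), and it verifies only the format and checksum, not that the key is a valid ed25519 point or that the service really belongs to Twitter.

```python
import base64
import hashlib


def onion_v3_checksum(pubkey, version=b'\x03'):
    # CHECKSUM = SHA3-256(".onion checksum" | PUBKEY | VERSION)[:2]
    return hashlib.sha3_256(b'.onion checksum' + pubkey + version).digest()[:2]


def make_onion_v3(pubkey):
    """Build a v3 onion name from a 32-byte public key."""
    if len(pubkey) != 32:
        raise ValueError('a v3 onion public key is exactly 32 bytes')
    raw = pubkey + onion_v3_checksum(pubkey) + b'\x03'  # 35 bytes
    return base64.b32encode(raw).decode('ascii').lower() + '.onion'


def is_valid_onion_v3(host):
    """Return True if `host` (optionally with subdomains) is a well-formed v3 onion name."""
    host = host.lower()
    if not host.endswith('.onion'):
        return False
    # Keep only the 56-character label, dropping subdomains such as "video.".
    label = host[:-len('.onion')].rsplit('.', 1)[-1]
    if len(label) != 56:
        return False
    try:
        raw = base64.b32decode(label.upper())
    except ValueError:  # binascii.Error (bad base32) subclasses ValueError
        return False
    pubkey, checksum, version = raw[:32], raw[32:34], raw[34:]
    return version == b'\x03' and checksum == onion_v3_checksum(pubkey, version)
```

Because the version byte is always 0x03, every v3 name ends in "d" before ".onion", as both Twitter addresses quoted in this thread do. Passing is_valid_onion_v3 only shows that a name is well-formed; whether it is Twitter's is something only Twitter's own help page can confirm.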
I think the fix should be applied somewhere around these lines: https://github.com/ytdl-org/youtube-dl/blob/6508688e88c83bb811653083db9351702cd39a6a/youtube_dl/extractor/twitter.py#L34 I thought there might be a way to force youtube-dl to use a specific extractor, something like
--dont-look-at-url-i-promise-it-is-really "twitter"
but I found no such option. In any case, "twitter3e4tixl4xyajtrzo62zg5vztmjuricljdp2c5kshju4avyoid.onion" is now one of Twitter's genuine domain names.