mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[Twitter] Why doesn't it download this retweet? #1555

Closed nisehime closed 3 years ago

nisehime commented 3 years ago

https://twitter.com/morino_ya/status/1392763691599237121 (NSFW)

gallery-dl.exe -v https://twitter.com/morino_ya/status/1392763691599237121
[gallery-dl][debug] Version 1.17.4
[gallery-dl][debug] Python 3.7.9 - Windows-8.1-6.3.9600-SP0
[gallery-dl][debug] requests 2.25.1 - urllib3 1.25.11
[gallery-dl][debug] Starting DownloadJob for 'https://twitter.com/morino_ya/status/1392763691599237121'
[twitter][debug] Using TwitterTweetExtractor for 'https://twitter.com/morino_ya/status/1392763691599237121'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): twitter.com:443
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/api/2/timeline/conversation/1392763691599237121.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_ext_alt_text=true&include_quote_count=true&include_reply_count=1&tweet_mode=extended&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&send_error_codes=true&simple_quoted_tweet=true&count=100&ext=mediaStats%2ChighlightedLabel HTTP/1.1" 200 3825

The result is just empty. The direct link to the retweeted tweet works fine, but when downloading it from another user's timeline, it is ignored.

mikf commented 3 years ago

Because Twitter is very inconsistent with its results. The data for tweet 1392763691599237121 does not contain any media entries (*), even though it should and retweets usually do. It works with "retweets": "original", though.

(*) There should be an `extended_entities` entry here, but there isn't.

``` json
"1392763691599237121": {
    "created_at": "Thu May 13 08:48:35 +0000 2021",
    "id_str": "1392763691599237121",
    "full_text": "RT @marunika: 【宣伝】新作読切です!一人暮らしを始めた男の娘の性欲が暴走していく主人公視点の漫画です。ファンザさんはモザイク、他白ヌキです。46頁500円です。\nDLsite→https://t.co/4pi74gWvF9\nFANZA→https://t.co/y…",
    "display_text_range": [0, 140],
    "entities": {
        "user_mentions": [
            {
                "screen_name": "marunika",
                "name": "かにまる🥷",
                "id_str": "129260683",
                "indices": [3, 12]
            }
        ],
        "urls": [
            {
                "url": "https://t.co/4pi74gWvF9",
                "expanded_url": "https://dlsite.jp/mawot/RJ327260/?utm_content=RJ327260",
                "display_url": "dlsite.jp/mawot/RJ327260…",
                "indices": [95, 118]
            }
        ]
    },
    "source": "Twitter Web App",
    "user_id_str": "1321392586066595842",
    "retweeted_status_id_str": "1392756582925049867",
    "retweet_count": 15,
    "favorite_count": 0,
    "reply_count": 0,
    "quote_count": 0,
    "conversation_id_str": "1392763691599237121",
    "possibly_sensitive_editable": true,
    "lang": "ja"
}
```

nisehime commented 3 years ago

I see. Well, twMediaDownloader doesn't have this issue. As far as I can tell, it uses slightly different API URLs, and the responses contain the media links for the tweet.

nisehime commented 3 years ago

After setting "retweets": "original", I see that the request URL hasn't changed, and there are no additional requests either. Does it mean it gets the metadata from the same response?
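For readers following along, here is a minimal sketch of where that setting sits in a gallery-dl configuration. It assumes the usual nested `extractor` / `twitter` layout of gallery-dl's JSON config. The script only prints the fragment, so merging it into an existing config file is left to the reader.

```python
import json

# Sketch only: builds the config fragment discussed in this thread.
# The nesting extractor -> twitter -> retweets is assumed from
# gallery-dl's JSON configuration layout.
config = {
    "extractor": {
        "twitter": {
            # true: use the Retweet entry; "original": use the original Tweet
            "retweets": "original",
        }
    }
}

# Serialize the fragment so it can be merged into a config file by hand.
config_text = json.dumps(config, indent=4)
print(config_text)
```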

mikf commented 3 years ago

twMediaDownloader uses the official Twitter API; gallery-dl only uses the "site-internal" API (what your browser uses while on Twitter). The official API needs a consumer_key and consumer_secret, and I wouldn't want to publish the credentials associated with my Twitter account. That's fine for sites like DeviantArt, but maybe not Twitter. If I do implement support for the official API, I'd want to at least wait until they're done with API v2. See also https://github.com/mikf/gallery-dl/issues/980.

> Does it mean it gets the metadata from the same response?

Yep, Twitter returns both the Retweet and the original Tweet (as well as potential replies). "retweets": true uses the Retweet entry, "retweets": "original" uses the original Tweet. You could argue that "original" should be the default, but changing it would break backwards compatibility (i.e. someone would complain if anything changed).
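The selection described above can be sketched roughly as follows. This is not gallery-dl's actual code: `select_entry` is a hypothetical helper, the `tweets` dict is a simplified stand-in for the tweet objects in the response, and the media URL is a placeholder.

```python
def select_entry(tweets, tweet_id, retweets=True):
    """Pick the Retweet entry or the original Tweet entry (hypothetical helper)."""
    entry = tweets[tweet_id]
    original_id = entry.get("retweeted_status_id_str")
    # With "original", follow the reference to the original Tweet,
    # provided it is present in the same response.
    if retweets == "original" and original_id in tweets:
        return tweets[original_id]
    # Otherwise use the entry that was requested (the Retweet itself).
    return entry


# Simplified stand-in data; the media URL is a placeholder, not real data.
tweets = {
    "1392763691599237121": {
        "id_str": "1392763691599237121",
        "retweeted_status_id_str": "1392756582925049867",
        "entities": {},  # no extended_entities, as in this issue
    },
    "1392756582925049867": {
        "id_str": "1392756582925049867",
        "extended_entities": {
            "media": [{"media_url_https": "https://example.invalid/image.jpg"}]
        },
    },
}

as_retweet = select_entry(tweets, "1392763691599237121", retweets=True)
as_original = select_entry(tweets, "1392763691599237121", retweets="original")
print("extended_entities" in as_retweet, "extended_entities" in as_original)
```

With the Retweet entry missing `extended_entities`, only the `"original"` selection ends up with media to download, which matches the behavior reported in this issue.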

nisehime commented 3 years ago

> twMediaDownloader uses the official Twitter API

Wasn't it using it only to download videos? Unless it has changed recently.

> You could argue that "original" should be the default

Not really. Setting it to "original" would make it impossible to save retweets to the folder of the user who retweeted them, wouldn't it? Couldn't the program just check the original Tweet entry for media files when it finds none in the Retweet entry?
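A rough sketch of the fallback suggested here, under the assumption that both entries arrive in the same response. `media_with_fallback` is a hypothetical name, not an existing gallery-dl function, and the sample data is a simplified placeholder.

```python
def media_with_fallback(tweets, tweet_id):
    """Return (entry, media): metadata from the Retweet, media from whichever has it."""
    entry = tweets[tweet_id]
    media = entry.get("extended_entities", {}).get("media", [])
    if not media:
        # The Retweet entry has no media: look at the original Tweet instead.
        original = tweets.get(entry.get("retweeted_status_id_str"), {})
        media = original.get("extended_entities", {}).get("media", [])
    # The Retweet entry is returned unchanged, so its user id can still
    # decide which user's folder the files are saved to.
    return entry, media


# Simplified placeholder data mirroring the situation in this issue.
sample = {
    "111": {"id_str": "111", "user_id_str": "42", "retweeted_status_id_str": "222"},
    "222": {
        "id_str": "222",
        "user_id_str": "7",
        "extended_entities": {"media": [{"media_url_https": "https://example.invalid/a.jpg"}]},
    },
}

entry, media = media_with_fallback(sample, "111")
print(entry["user_id_str"], len(media))
```

The point of returning the Retweet entry alongside the media is that the retweeting user's metadata stays available, which is what "original" alone would lose.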

mikf commented 3 years ago

> Wasn't it using it only to download videos? Unless it has changed recently.

It seems it is using the official API for regular timelines and https://api.twitter.com/2/timeline/media/ (same as gallery-dl) for media timelines.