mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

Gallery-dl missed an image while scraping a Twitter account, curious to know how to fix this for future attempts #4974

Open rarelygoeshere opened 10 months ago

rarelygoeshere commented 10 months ago

Hello there, while I was checking into the content of this account, I noticed that it was missing an image.

I found that, for whatever reason, it didn't scrape this tweet from their account, despite it clearly being present: https://twitter.com/mamezurushiki/status/328853786019917825 I even checked the output, which I placed below, and searched for the tweet ID (328853786019917825), but nothing can be found, indicating that it didn't fail but never even scraped it in the first place.
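
The check described here (searching the saved output for a tweet ID) can be automated for several IDs at once. This is a minimal sketch, assuming the output was saved as text and that each downloaded file path contains the tweet ID, as in gallery-dl's default Twitter filenames. `find_missing_ids` is a hypothetical helper, not part of gallery-dl, and the second ID in the example is a made-up placeholder.

```python
import re


def find_missing_ids(log_text, expected_ids):
    """Return the expected tweet IDs that never appear in the log text."""
    # Collect every run of 15 or more digits; this is a heuristic that
    # assumes tweet IDs are the only long digit runs in the output.
    seen = set(re.findall(r"\d{15,}", log_text))
    return [tweet_id for tweet_id in expected_ids if tweet_id not in seen]


# One output line in the style gallery-dl prints for a downloaded file.
log_text = r"* .\gallery-dl\twitter\mamezurushiki\328853786019917825_1.jpg"

# The second ID is a placeholder used only to show a "missing" result.
print(find_missing_ids(log_text, ["328853786019917825", "111111111111111111"]))
```

Any ID the helper returns never appeared in the output at all, which matches the "never scraped" case rather than a failed download.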

I don't know if this is because of Twitter's recent update or what, but I hope you can fix this so I can be reassured that my gallery-dl is scraping all, or as much of Twitter's content as it is capable of. Thank you.

mamezurushiki gallery-dl output.txt

jadedgnome commented 10 months ago

It downloads when given the tweet URL directly:

[gallery-dl][debug] Version 1.26.4
[gallery-dl][debug] Python 3.8.3 - Windows-10-10.0.17763-SP0
[gallery-dl][debug] requests 2.31.0 - urllib3 2.1.0
[gallery-dl][debug] Configuration Files []
[gallery-dl][debug] Starting DownloadJob for 'https://twitter.com/mamezurushiki/status/328853786019917825'
[twitter][debug] Using TwitterTweetExtractor for 'https://twitter.com/mamezurushiki/status/328853786019917825'
[twitter][info] Requesting guest token
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): api.twitter.com:443
[urllib3.connectionpool][debug] https://api.twitter.com:443 "POST /1.1/guest/activate.json HTTP/1.1" 200 63
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): twitter.com:443
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/api/graphql/2ICDjqPd81tulZcYrtpTuQ/TweetResultByRestId?variables=%7B%22tweetId%22%3A%22328853786019917825%22%2C%22withCommunity%22%3Afalse%2C%22includePromotedContent%22%3Afalse%2C%22withVoice%22%3Afalse%7D&features=%7B%22creator_subscriptions_tweet_preview_api_enabled%22%3Atrue%2C%22tweetypie_unmention_optimization_enabled%22%3Atrue%2C%22responsive_web_edit_tweet_api_enabled%22%3Atrue%2C%22graphql_is_translatable_rweb_tweet_is_translatable_enabled%22%3Atrue%2C%22view_counts_everywhere_api_enabled%22%3Atrue%2C%22longform_notetweets_consumption_enabled%22%3Atrue%2C%22responsive_web_twitter_article_tweet_consumption_enabled%22%3Afalse%2C%22tweet_awards_web_tipping_enabled%22%3Afalse%2C%22freedom_of_speech_not_reach_fetch_enabled%22%3Atrue%2C%22standardized_nudges_misinfo%22%3Atrue%2C%22tweet_with_visibility_results_prefer_gql_limited_actions_policy_enabled%22%3Atrue%2C%22longform_notetweets_rich_text_read_enabled%22%3Atrue%2C%22longform_notetweets_inline_media_enabled%22%3Atrue%2C%22responsive_web_graphql_exclude_directive_enabled%22%3Atrue%2C%22verified_phone_label_enabled%22%3Afalse%2C%22responsive_web_media_download_video_enabled%22%3Afalse%2C%22responsive_web_graphql_skip_user_profile_image_extensions_enabled%22%3Afalse%2C%22responsive_web_graphql_timeline_navigation_enabled%22%3Atrue%2C%22responsive_web_enhance_cards_enabled%22%3Afalse%7D&fieldToggles=%7B%22withArticleRichContentState%22%3Afalse%7D HTTP/1.1" 200 1573
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): pbs.twimg.com:443
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/BJBSxqtCcAM668X?format=jpg&name=orig HTTP/1.1" 200 47835
* .\gallery-dl\twitter\mamezurushiki\328853786019917825_1.jpg

Try adding "/media" to the username URL to see if it skips it again.

mikf commented 10 months ago

Tweet 328853786019917825 is very old - over 10 years - so the only way to get it from just the username (that I know of and what gallery-dl uses) is via search, and search results are unreliable and incomplete.
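
The tweet's age can be read from its ID. This is a minimal sketch, assuming the ID is a Twitter "snowflake" (the ID scheme in use since late 2010), whose upper bits store milliseconds since the Twitter epoch 1288834974657. `tweet_id_to_datetime` is a hypothetical helper, not part of gallery-dl.

```python
from datetime import datetime, timezone

# Twitter's snowflake epoch in milliseconds since the Unix epoch.
TWITTER_EPOCH_MS = 1288834974657


def tweet_id_to_datetime(tweet_id):
    """Decode the creation time embedded in a snowflake tweet ID."""
    # The lower 22 bits hold worker and sequence numbers; the remaining
    # upper bits hold the millisecond offset from the Twitter epoch.
    timestamp_ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


print(tweet_id_to_datetime(328853786019917825).date())
```

For the tweet in question this gives a date in April 2013, consistent with "over 10 years" old.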

rarelygoeshere commented 10 months ago

> Tweet 328853786019917825 is very old - over 10 years - so the only way to get it from just the username (that I know of and what gallery-dl uses) is via search, and search results are unreliable and incomplete.

Hmmmm I see. So does that mean gallery-dl will be hard pressed to scrape that tweet and similar tweets, even if I try adding "/media" to the username URL like the comment above suggests? So I guess that means this problem probably can't be solved on gallery-dl's side?

Edit: Does the recent Twitter update impact my usage of gallery-dl in any capacity? Do I need to change my twitter config to deal with it?

Twi-Hard commented 10 months ago

In my experience, the search results seem to change over time. Twitter has this to say about what appears in search:

> Do your posts contribute to the conversation in a meaningful way? We strive to show the most relevant, credible, and safe content in search.

They don't allow NSFW tweets to be in it either.

https://help.twitter.com/en/using-x/x-search-not-working

Edit: I have no idea why my post duplicated when I edited it. I was using the mobile app so I couldn't preview the markdown

mikf commented 10 months ago

> They don't allow NSFW tweets to be in it either.

There is an option to show/hide "sensitive" search results (under "Settings" -> "Privacy and safety" -> "Content you see" -> "Search settings").

rarelygoeshere commented 10 months ago

> > Tweet 328853786019917825 is very old - over 10 years - so the only way to get it from just the username (that I know of and what gallery-dl uses) is via search, and search results are unreliable and incomplete.
>
> Hmmmm I see. So does that mean gallery-dl will be hard pressed to scrape that tweet and similar tweets, even if I try adding "/media" to the username URL like the comment above suggests? So I guess that means this problem probably can't be solved on gallery-dl's side?
>
> Edit: Does the recent Twitter update impact my usage of gallery-dl in any capacity? Do I need to change my twitter config to deal with it?

Sorry to bother folks, but would anyone mind answering my inquiry? I'm curious to know if Twitter's new update necessitates changing my config to make sure gallery-dl scrapes as much as possible.

Fukitsu commented 9 months ago

I think it depends on what you include to download. For example, I'm re-scraping some profiles to test and it downloaded new pictures using "include": ["timeline", "media", "replies"], in my config file. Though I'm still not very sure about the timeline.strategy option.

rarelygoeshere commented 9 months ago

> I think it depends on what you include to download. For example, I'm re-scraping some profiles to test and it downloaded new pictures using "include": ["timeline", "media", "replies"], in my config file. Though I'm still not very sure about the timeline.strategy

Ok, well here's my config for Twitter. I'm quite certain there should be nothing amiss about it and it should be able to download all the media tweets of a profile.

```
"twitter":
{
    "username": "null",
    "password": "null",
    "filename": "{author[name]}-{author[id]}({author[date]:%Y%m%d_%H%M%S})-{tweet_id}({date:%Y%m%d_%H%M%S})-{num}.{extension}",
    "cards": false,
    "conversations": false,
    "pinned": false,
    "quoted": false,
    "replies": true,
    "retweets": false,
    "strategy": null,
    "text-tweets": false,
    "twitpic": false,
    "unique": true,
    "users": "timeline",
    "videos": true
}
```

jadedgnome commented 9 months ago

> > I think it depends on what you include to download. For example, I'm re-scraping some profiles to test and it downloaded new pictures using "include": ["timeline", "media", "replies"], in my config file. Though I'm still not very sure about the timeline.strategy
>
> Ok, well here's my config for Twitter. I'm quite certain there should be nothing amiss about it and it should be able to download all the media tweets of a profile.
>
> ```
> "twitter":
> {
>     "username": "null",
>     "password": "null",
>     "filename": "{author[name]}-{author[id]}({author[date]:%Y%m%d_%H%M%S})-{tweet_id}({date:%Y%m%d_%H%M%S})-{num}.{extension}",
>     "cards": false,
>     "conversations": false,
>     "pinned": false,
>     "quoted": false,
>     "replies": true,
>     "retweets": false,
>     "strategy": null,
>     "text-tweets": false,
>     "twitpic": false,
>     "unique": true,
>     "users": "timeline",
>     "videos": true
> }
> ```

Is there any way to pass all of these on the command line via flags?
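
One possible approach, offered as a sketch rather than an answer confirmed in this thread: gallery-dl's `-o`/`--option KEY=VALUE` flag sets a single option from the command line. Assuming values are interpreted as JSON where possible, a config dictionary can be turned into a list of such flags. `to_option_flags` is a hypothetical helper, and the option subset shown is only an example.

```python
import json
import shlex


def to_option_flags(options):
    """Turn a dict of options into a flat list of '-o KEY=VALUE' arguments."""
    flags = []
    for key, value in options.items():
        # json.dumps renders True/False/None as true/false/null, which is
        # the form assumed here for values passed on the command line.
        flags += ["-o", f"{key}={json.dumps(value)}"]
    return flags


# A subset of the config posted above, used purely for illustration.
options = {"replies": True, "retweets": False, "videos": True}
command = ["gallery-dl", *to_option_flags(options), "https://twitter.com/mamezurushiki"]

# shlex.join quotes arguments for a POSIX shell; on Windows, pass the
# list to subprocess.run instead of pasting the printed string.
print(shlex.join(command))
```

The filename format contains braces and parentheses that need shell quoting, so it is probably easier to keep that one in the config file.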