mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[Twitter] Search page returns 404 even for logged in user #4264

Closed TheSandBoxMKG closed 11 months ago

TheSandBoxMKG commented 12 months ago

Search pages return 404, even though downloading from the media tab still works normally.

[Screenshots: Screenshot_20230703-095500, Screenshot_20230703-095521]

flanpowered commented 12 months ago

Same error here, although the URL I use is a profile page, not a search. After changing the config.json file a couple of times, I found that it only happens when extractor.twitter.retweets is set to true.

Edit: Never mind, it happened with another tweet even with extractor.twitter.retweets set to false :(

ClosedPort22 commented 12 months ago

-o search-endpoint=graphql works. They've switched to the GraphQL endpoint on the web frontend.
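For anyone who prefers the config file over the command line, here is a minimal sketch of the same setting. It assumes the option lives under extractor.twitter (the path used in the config.set call at the end of this thread); the helper name and the sample config are hypothetical and only for illustration.

```python
import json


def enable_graphql_search(conf: dict) -> dict:
    """Set extractor.twitter.search-endpoint to "graphql" in a config dict."""
    # Create the nested sections if they are missing, keep existing keys intact.
    twitter = conf.setdefault("extractor", {}).setdefault("twitter", {})
    twitter["search-endpoint"] = "graphql"
    return conf


# Hypothetical existing config; only the "retweets" key is taken from this thread.
existing = {"extractor": {"twitter": {"retweets": False}}}
updated = enable_graphql_search(existing)

# Print JSON that can be merged into an existing config.json by hand.
print(json.dumps(updated, indent=4))
```

The sketch only builds the JSON; it does not read or write the real config file, so merge the result manually.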

flanpowered commented 12 months ago

That seems to work for me, thanks!

MrJmpl3 commented 12 months ago

Now it doesn't work because Twitter applies the rate limit u.u

TheSandBoxMKG commented 12 months ago

> Now it doesn't work because Twitter applies the rate limit u.u

Same here: I was instantly rate-limited for using gallery-dl and am now waiting for the Termux process to resume the next day.

That also means that I cannot use the official Twitter Web client until the Termux process finishes.

TheSandBoxMKG commented 12 months ago

I can confirm that using range still counts against the number of tweets you can view.

[Screenshots: Screenshot_20230704-130538-664, Screenshot_20230704-130658]

buratsy commented 11 months ago

i'm getting a 404 error today even with the /media link. my last download was 2 days ago and it was working fine, i only downloaded /media links then, though

ClosedPort22 commented 11 months ago

> i'm getting a 404 error today even with the /media link. my last download was 2 days ago and it was working fine, i only downloaded /media links then, though

Could you post the verbose log?

buratsy commented 11 months ago

> > i'm getting a 404 error today even with the /media link. my last download was 2 days ago and it was working fine, i only downloaded /media links then, though
>
> Could you post the verbose log?

here. i think the issue might be related to "Requesting guest token" but i have a login in my config. do you need to put cookies now?

[gallery-dl][debug] Starting DownloadJob for 'https://twitter.com/home'
[twitter][debug] Using TwitterTimelineExtractor for 'https://twitter.com/home'
[twitter][info] Requesting guest token
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): api.twitter.com:443
[urllib3.connectionpool][debug] https://api.twitter.com:443 "POST /1.1/guest/activate.json HTTP/1.1" 200 63
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): twitter.com:443
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/api/graphql/XA6F1nJELYg65hxOC2Ekmg/UserByScreenName?variables=%7B%22screen_name%22%3A%22home%22%2C%22withSafetyModeUserFields%22%3Atrue%7D&features=%7B%22hidden_profile_likes_enabled%22%3Afalse%2C%22responsive_web_graphql_exclude_directive_enabled%22%3Atrue%2C%22verified_phone_label_enabled%22%3Afalse%2C%22subscriptions_verification_info_verified_since_enabled%22%3Atrue%2C%22highlights_tweets_tab_ui_enabled%22%3Atrue%2C%22creator_subscriptions_tweet_preview_api_enabled%22%3Atrue%2C%22responsive_web_graphql_skip_user_profile_image_extensions_enabled%22%3Afalse%2C%22responsive_web_graphql_timeline_navigation_enabled%22%3Atrue%7D HTTP/1.1" 404 0
[twitter][error] 404 Not Found ()
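The variables parameter of the failing GraphQL request in the log above is percent-encoded JSON, so decoding it shows which user was being looked up. A minimal sketch using only the standard library; the request string is copied from the log, with the features parameter left out for brevity.

```python
import json
from urllib.parse import parse_qs, urlsplit

# Request path copied from the debug log (features parameter omitted).
request = (
    "/i/api/graphql/XA6F1nJELYg65hxOC2Ekmg/UserByScreenName"
    "?variables=%7B%22screen_name%22%3A%22home%22%2C"
    "%22withSafetyModeUserFields%22%3Atrue%7D"
)

# parse_qs percent-decodes the values and returns a list per key.
query = parse_qs(urlsplit(request).query)
variables = json.loads(query["variables"][0])

print(variables["screen_name"])
```

The decoded screen name is home, which matches the log line saying TwitterTimelineExtractor was used for https://twitter.com/home: the URL appears to have been treated as the profile of a user called "home", and the lookup was made after requesting a guest token.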

mikf commented 11 months ago

This particular issue is fixed with commit https://github.com/mikf/gallery-dl/commit/f86fdf64a64091e10176bfcd87fd07db635e7b93


> [twitter][info] Requesting guest token
> [twitter][error] 404 Not Found ()

You are not logged in when you get these messages and errors.

buratsy commented 11 months ago

> This particular issue is fixed with commit f86fdf6

will this be an automatic update or do i have to manually implement this into my gallery-dl? if it's the latter, how do i do it?

mikf commented 11 months ago

> will this be an automatic update or do i have to manually implement this into my gallery-dl?

The next release will contain the change from this commit. Until then, use the fix from https://github.com/mikf/gallery-dl/issues/4264#issuecomment-1618363443.

buratsy commented 11 months ago

> > will this be an automatic update or do i have to manually implement this into my gallery-dl?
>
> The next release will contain the change from this commit. Until then, use the fix from #4264 (comment).

i tried that out but it still gives me the same errors. i'll just wait for the next release.

as a side note, how many tweets can be scraped from /media/? i read somewhere that it's 1000 tweets max from a single page and i was wondering if it's true.

a84r7a3rga76fg commented 11 months ago

@buratsy No, it varies. Yesterday I was able to download about 2500 pictures from one profile and about 3000 from another. I've never been capped at 1000 pictures. I'm using a free account.

Twi-Hard commented 11 months ago

The limit is about 3200. It's the same for the official API.

buratsy commented 11 months ago

> 2500 pictures from one profile and about 3000 from another. I've never been capped at 1000 pictures

each tweet can hold 4 images at once so it's not exactly a 1:1 ratio between tweets and pics but admittedly, i never even got close to that many pics from only scraping /media/. at some point, i have to switch to the search function. usually when the number of "Photos & videos" listed on the /media/ page is 1000+.

> The limit is about 3200. It's the same for the official api

if that's so, then it's pretty weird that it stops scraping for me way earlier than that

anonymous721 commented 11 months ago

> -o search-endpoint=graphql works. They've switched to the GraphQL endpoint on the web frontend.

Maybe a dumb question, but if I'm running gallery-dl through a Python script (so something like job.DownloadJob(URL).run()) what would be the equivalent?

mikf commented 11 months ago

> I'm running gallery-dl through a Python script (so something like job.DownloadJob(URL).run()) what would be the equivalent?

from gallery_dl import config
# Equivalent of -o search-endpoint=graphql; call this before running the job.
config.set(("extractor", "twitter"), "search-endpoint", "graphql")