mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0
12.01k stars 978 forks

[twitter] search now requires login #3942

Open ClosedPort22 opened 1 year ago

ClosedPort22 commented 1 year ago

It seems that the search endpoint now returns 403 when not logged in.

Related issues: https://github.com/JustAnotherArchivist/snscrape/issues/846 https://github.com/trevorhobenshield/twitter-api-client/issues/22

Edit: Nitter has switched to the newer GraphQL-based search API, which doesn't seem to be login-gated yet: https://github.com/zedeus/nitter/commit/1ac389e7c75ced1762aa9c698ca2899cde668424

Twi-Hard commented 1 year ago

I'm surprised it hasn't been mentioned yet. It's been working again for a couple of days now. I don't have any form of authentication.

❯ gdl --version
1.25.3-dev

ClosedPort22 commented 1 year ago

It still shows 403 Forbidden for me.

Twi-Hard commented 1 year ago

I tried running it in Docker with a VPN in 25 different cities and it worked fine there too. I'm not sure why it's only working for me. The syndication API returns 404, but I assume that was already brought up. If any logs or anything else would help, let me know.

Edit: I didn't use a VPN when running outside Docker.

rosswillett commented 1 year ago

I'm also running within Docker, and it returns 403 with or without a VPN.

@Twi-Hard can you attempt a connection using this command and/or share a similar command that I could test against?

gallery-dl -o videos=true https://twitter.com/search?q=from%3Aelonmusk%20since%3A2023-04-10&src=typed_query&f=video

Here's my output:

root@c7a8ce023c2b:/app# gallery-dl -o videos=true https://twitter.com/search?q=from%3Aelonmusk%20since%3A2023-04-10&src=typed_query&f=video
[1] 23987
[2] 23988
root@c7a8ce023c2b:/app# [twitter][error] 403 Forbidden (Forbidden.)

Adding --verbose narrows it down: the first request to graphql comes back 200, but the follow-up request to adaptive.json returns 403.
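A note on the shell output above, for anyone reproducing this: the `[1] 23987` and `[2] 23988` job lines indicate that the shell treated each unquoted `&` in the URL as a background operator, so gallery-dl most likely only received the URL up to `since%3A2023-04-10`, and the `src` and `f=video` parameters were dropped. The 403 itself is still real, but quoting the URL avoids the truncation. Below is a minimal Python sketch that builds the same command with the URL safely quoted; `build_search_command` is a hypothetical helper for illustration, not part of gallery-dl.

```python
import shlex
from urllib.parse import quote, urlencode


def build_search_command(query, extra_params=None):
    """Return a shell-safe gallery-dl command line for a Twitter search.

    Hypothetical helper: only the URL layout and the `-o videos=true`
    option are taken from the command posted in this thread.
    """
    params = {"q": query}
    params.update(extra_params or {})
    # quote_via=quote percent-encodes spaces as %20 (not "+") and ":" as %3A.
    url = "https://twitter.com/search?" + urlencode(params, quote_via=quote)
    # shlex.quote wraps the URL in single quotes, so "&" stays part of the
    # argument instead of sending the command to the background.
    return "gallery-dl -o videos=true " + shlex.quote(url)


command = build_search_command(
    "from:elonmusk since:2023-04-10",
    {"src": "typed_query", "f": "video"},
)
print(command)
```

Quoting the URL by hand in the terminal (single quotes around the whole URL) has the same effect.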

Twi-Hard commented 1 year ago

Oh, I completely missed that this only affects searches. I only download users and didn't realize the search part wasn't working. I'm sorry about this.

mikf commented 1 year ago

> Edit: Nitter has switched to the newer GraphQL-based search API, which doesn't seem to be login-gated yet: https://github.com/zedeus/nitter/commit/1ac389e7c75ced1762aa9c698ca2899cde668424

I tried implementing this, but it does not work anymore as is also being reported on Nitter's issue tracker. Here's the patch in case someone wants it:

```patch
diff --git a/gallery_dl/extractor/twitter.py b/gallery_dl/extractor/twitter.py
index 5e68f138..0eb126f3 100644
--- a/gallery_dl/extractor/twitter.py
+++ b/gallery_dl/extractor/twitter.py
@@ -1053,6 +1053,8 @@ class TwitterAPI():
             cookies.set("ct0", csrf_token, domain=cookiedomain)
 
         auth_token = cookies.get("auth_token", domain=cookiedomain)
+        if not auth_token:
+            self.search_adaptive = self.search_graphql
 
         self.headers = {
             "Accept": "*/*",
@@ -1265,6 +1267,18 @@ class TwitterAPI():
             params["spelling_corrections"] = "1"
         return self._pagination_legacy(endpoint, params)
 
+    def search_graphql(self, query):
+        endpoint = "/graphql/gkjsKepM6gl_HmFWoWKfgg/SearchTimeline"
+        variables = {
+            "rawQuery": query,
+            "count": 20,
+            "product": "Latest",
+            "withDownvotePerspective": False,
+            "withReactionsMetadata": False,
+            "withReactionsPerspective": False
+        }
+        return self._pagination_tweets(endpoint, variables)
+
     def live_event_timeline(self, event_id):
         endpoint = "/2/live_event/timeline/{}.json".format(event_id)
         params = self.params.copy()
```
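The core of the patch is its two-line fallback: when no `auth_token` cookie is present, the instance's `search_adaptive` attribute is rebound to `search_graphql`, so existing callers end up on the GraphQL endpoint without being changed. Here is a stripped-down sketch of that rebinding pattern. `SearchAPI` and its stub return values are hypothetical stand-ins for the real `TwitterAPI` class and its pagination helpers, and no network requests are made.

```python
class SearchAPI:
    """Hypothetical stand-in for the TwitterAPI class touched by the patch."""

    def __init__(self, auth_token=None):
        self.auth_token = auth_token
        if not auth_token:
            # Same trick as the patch: an instance attribute shadows the
            # class-level method, so calls to search_adaptive() on this
            # object are routed to search_graphql() instead.
            self.search_adaptive = self.search_graphql

    def search_adaptive(self, query):
        # Stub for the legacy adaptive.json search (login-gated per this thread).
        return ("adaptive", {"q": query})

    def search_graphql(self, query):
        # Stub for the GraphQL search; the keys mirror a subset of the patch.
        variables = {"rawQuery": query, "count": 20, "product": "Latest"}
        return ("graphql", variables)


guest = SearchAPI()
logged_in = SearchAPI(auth_token="placeholder-token")

print(guest.search_adaptive("from:example")[0])
print(logged_in.search_adaptive("from:example")[0])
```

The rebinding only affects the one instance, so a logged-in session created alongside it keeps using the legacy search.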
github-userx commented 1 year ago

> Edit: Nitter has switched to the newer GraphQL-based search API, which doesn't seem to be login-gated yet: zedeus/nitter@1ac389e
>
> I tried implementing this, but it does not work anymore as is also being reported on Nitter's issue tracker. Here's the patch in case someone wants it:
>
> patch

Thanks.

Looks like Twitter isn't the only one going downhill when it comes to open (API) access and scraping. Soon we probably won't be able to download Reddit content anymore.

https://old.reddit.com/r/reddit/comments/12qwagm/an_update_regarding_reddits_api/

https://old.reddit.com/r/apolloapp/comments/12ram0f/had_a_few_calls_with_reddit_today_about_the/

https://old.reddit.com/r/redditsync/comments/12qwwjh/an_update_regarding_reddits_api_changes_to_how/

mikf commented 1 year ago

Searching without login should work again with https://github.com/mikf/gallery-dl/commit/54cf1fa3e75a3836097f2752b164cc49eb353a6f. Let's see how long it will last.