the-convocation / twitter-scraper

A port of n0madic/twitter-scraper to Node.js.
https://the-convocation.github.io/twitter-scraper/
MIT License

Twitter API throttling requests (`429 Too Many Requests`) #11

Open ToniPortal opened 1 year ago

ToniPortal commented 1 year ago

Hello, when I use certain scraper functions, I get this error:

      status: 429,
      statusText: 'Too Many Requests',

I mostly wanted to ask: do you know when it's going to stop returning that? Or will it be fixed?

karashiiro commented 1 year ago

I haven't seen that error from this before, but I'm currently working on porting over a month's worth of commits (#9, #6), so once that's done this might be fixed.

karashiiro commented 1 year ago

I just released v0.3.0 which ports over the most critical changes, let me know if that helps.

karashiiro commented 1 year ago

Actually, I just noticed the repo tests are failing with 429, looks like a regular rate limit. I'll need to set up a backoff for this.

karashiiro commented 1 year ago

Should be fixed with v0.3.1 (#12).

karashiiro commented 1 year ago

Made a minor adjustment in v0.3.2 btw, v0.3.1's backoff can get stuck in a retry loop for a while sometimes, which the latest version fixes.

karashiiro commented 1 year ago

Apparently that wasn't what was happening: the API actually just rate-limits the client for 14 minutes after a certain point. I updated the throttling mechanism to account for this, but I'm not sure there's any real way to handle it beyond the delay.
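The 14-minute lockout described above suggests a wait-and-retry approach. A minimal sketch, assuming the API reports its reset time via the standard `x-rate-limit-reset` header (epoch seconds) and falling back to a fixed window when it doesn't; this is an illustration of the idea, not the library's actual throttling code:

```typescript
// Assumed fallback: the observed ~14-minute lockout, rounded up to the
// usual 15-minute rate-limit window.
const FALLBACK_WINDOW_MS = 15 * 60 * 1000;

// Given the x-rate-limit-reset header value (epoch seconds) and the current
// time in ms, return how long to sleep before retrying.
function computeBackoffMs(resetHeader: string | null, nowMs: number = Date.now()): number {
  if (resetHeader !== null) {
    const resetMs = Number(resetHeader) * 1000;
    if (Number.isFinite(resetMs) && resetMs > nowMs) return resetMs - nowMs;
  }
  return FALLBACK_WINDOW_MS;
}

// Retry loop over any request function that exposes the status code and the
// reset header; on 429, sleep until the advertised reset time and try again.
async function withThrottle<T extends { status: number; rateLimitReset: string | null }>(
  doRequest: () => Promise<T>,
): Promise<T> {
  for (;;) {
    const res = await doRequest();
    if (res.status !== 429) return res;
    await new Promise((resolve) => setTimeout(resolve, computeBackoffMs(res.rateLimitReset)));
  }
}
```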

ToniPortal commented 1 year ago

Oh, thanks for the clarification! It's really not cool that Twitter put so many limitations in place; I hope they change them one day...

ImTheDeveloper commented 1 year ago

Is the restriction on the IP address, or just the tokens being used? If it's the token, can you regenerate the auth and carry on? Given that we approximately know each endpoint's limits (50 requests per 15 minutes on tweets/replies), we could count a token's usage and regenerate it before hitting the limit. I see the code already takes token validity over time into account, so this would now be based on usage too.

Edit: This is specifically in regard to guest token usage. I expect using a fully authed account would limit that consumer instead.

karashiiro commented 1 year ago

I just gave that a try, and it doesn't work - as soon as you try to get a new guest token you get rate-limited.

diff for reference: diff.txt

ethos-vitalii commented 4 days ago

I added a request timeout, but when it got rate-limited, it hung for a long time and ignored the timeout. This is how I added the timeout:

const REQUEST_TIMEOUT = 10_000; // example value in ms

const scraper = new Scraper({
  transform: {
    request(input: RequestInfo | URL, init: RequestInit = {}) {
      init.signal = AbortSignal.timeout(REQUEST_TIMEOUT);

      return [input, init];
    },
  },
});

I also noticed that there are two types of 429: one hangs for ~13-14 minutes, and the other throws immediately.

  1. I think the first 429 is related to the endpoint I'm hitting (getProfile); this one is retried.
  2. The other is for requesting a new guest token; this one fails immediately. I've noticed that creating a new Scraper instance clears the rate limit on requesting a profile. So if we request a new guest token whenever we get a 429, that should fix the rate limit. But if requesting the guest token is itself rate-limited, that won't help for 14 minutes.

Would it be possible to refresh the guest token when the profile request (or probably any other request) gets rate-limited, instead of waiting 13-14 minutes?
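The recreate-on-429 workaround described above can be sketched generically. `withFreshClientOn429`, the client factory, and the error shape (`{ status: 429 }`, matching the error at the top of this issue) are assumptions for illustration, not the library's API:

```typescript
// Hypothetical sketch: run an operation against a client; on a 429-like
// error, rebuild the client once (which, for Scraper, would mean a fresh
// guest token) and retry. A second 429 is rethrown, since the guest token
// endpoint itself may be rate-limited for ~14 minutes.
async function withFreshClientOn429<C, R>(
  makeClient: () => C,
  op: (client: C) => Promise<R>,
): Promise<R> {
  let client = makeClient();
  try {
    return await op(client);
  } catch (err: any) {
    if (err?.status !== 429) throw err;
    client = makeClient(); // new instance -> new guest token (assumption)
    return await op(client);
  }
}
```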