the-convocation / twitter-scraper

A port of n0madic/twitter-scraper to Node.js.
https://the-convocation.github.io/twitter-scraper/
MIT License

Let the user choose how to handle rate limits #81

Open catdevnull opened 7 months ago

catdevnull commented 7 months ago

Hi! I've been enjoying this library a lot, as you can see :)

Something I'd like for my project is the ability to choose how rate limits are handled. Specifically, I want to implement behavior similar to twscrape's: if one account hits a rate limit, the request is retried with another account that isn't limited. However, because of how twitter-scraper currently works, there's no way to choose; it just waits until the rate limit resets, which can take a long time.

For my fork, I'll probably go with the simplest patch: throwing a special error (something like RateLimitError) from requestApi so I can handle it in my own code. Still, I'd like to find a solution that can be applied upstream.
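A minimal sketch of the RateLimitError idea, written as a standalone TypeScript check rather than as a patch to requestApi (whose internals aren't shown in this thread). The `ResponseLike` interface and the `throwIfRateLimited` helper are hypothetical, and reading the reset time from an `x-rate-limit-reset` header holding Unix epoch seconds is an assumption about the server's responses.

```typescript
// Hypothetical error type along the lines proposed above. Throwing it,
// instead of sleeping, lets the caller decide how to react.
class RateLimitError extends Error {
  readonly resetAt: Date | undefined;

  constructor(resetAt?: Date) {
    super(resetAt ? `Rate limited until ${resetAt.toISOString()}` : "Rate limited");
    this.name = "RateLimitError";
    this.resetAt = resetAt;
  }
}

// Only the parts of a fetch Response that this check reads.
interface ResponseLike {
  status: number;
  headers: { get(name: string): string | null };
}

// Throws RateLimitError on HTTP 429 and returns normally otherwise.
// ASSUMPTION: the reset time arrives in an `x-rate-limit-reset` header as
// Unix epoch seconds; resetAt is left undefined if the header is missing
// or not a number.
function throwIfRateLimited(res: ResponseLike): void {
  if (res.status !== 429) return;
  const raw = res.headers.get("x-rate-limit-reset");
  const seconds = raw === null ? NaN : Number(raw);
  throw new RateLimitError(isFinite(seconds) ? new Date(seconds * 1000) : undefined);
}
```

A caller can then catch the error, check `err instanceof RateLimitError`, and retry with another account instead of blocking until the limit resets.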

Thanks <3

karashiiro commented 7 months ago

Definitely open to this in general, though I'm a little less sure about how to actually go about it (still need to think it over).

I'm leaning towards making it part of the TwitterAuth interface and generalizing that interface a bit, since it's already responsible for providing the fetch implementation along with the request/response transforms. Implementation-wise, it could just be a wrapper around the fetch implementation itself. On the other hand, nothing would clearly tell users that this is where the functionality lives, so it might need a more explicit API around it.

At any rate, the fetch provider API could actually work today for your use case, so that might be worth a try in the meantime 👀
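The "wrapper around the fetch implementation" idea could look roughly like this sketch, which assumes nothing about the library's actual types: the user passes a handler that either resolves (retry the same request) or throws (give up), so the choice of policy stays with the user. `withRateLimitHandler`, `waitForReset`, and `failFast` are hypothetical names, and the `x-rate-limit-reset` header in epoch seconds is again an assumption.

```typescript
// Only the parts of a fetch Response that the wrapper reads.
interface ResponseLike {
  status: number;
  headers: { get(name: string): string | null };
}

// User-supplied policy, called on each HTTP 429. Resolving means "retry the
// same request"; throwing aborts it (e.g. so the caller can switch accounts).
type RateLimitHandler<R> = (res: R, attempt: number) => Promise<void>;

// Wraps any fetch-like function so that rate limits go through the handler.
// After maxAttempts tries, the last 429 response is returned unchanged.
function withRateLimitHandler<A extends unknown[], R extends ResponseLike>(
  baseFetch: (...args: A) => Promise<R>,
  onRateLimit: RateLimitHandler<R>,
  maxAttempts = 3,
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    for (let attempt = 1; ; attempt++) {
      const res = await baseFetch(...args);
      if (res.status !== 429 || attempt >= maxAttempts) return res;
      await onRateLimit(res, attempt);
    }
  };
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Policy 1: wait for the reset, then retry.
// ASSUMPTION: `x-rate-limit-reset` holds Unix epoch seconds. Falls back to a
// 1 s wait if the header is missing, and never waits longer than maxWaitMs.
function waitForReset(maxWaitMs = 60_000): RateLimitHandler<ResponseLike> {
  return async (res) => {
    const raw = res.headers.get("x-rate-limit-reset");
    const resetMs = raw === null ? NaN : Number(raw) * 1000;
    const waitMs = isFinite(resetMs) ? resetMs - Date.now() : 1_000;
    await sleep(Math.min(Math.max(waitMs, 0), maxWaitMs));
  };
}

// Policy 2: fail fast so the caller can retry elsewhere.
const failFast: RateLimitHandler<ResponseLike> = async () => {
  throw new Error("Rate limited");
};
```

For example, `withRateLimitHandler(fetch, waitForReset())` waits and retries, much like the current behavior described in the opening comment, while `withRateLimitHandler(fetch, failFast)` surfaces the limit immediately. Whether the wrapped function can be handed directly to the scraper's fetch provider option depends on that option's exact signature, which this sketch doesn't assume.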

catdevnull commented 6 months ago

Okay, I've implemented a PoC of account login as a request interceptor in this branch: https://github.com/catdevnull/milei-twitter/blob/scraper-resiliente-cuentas-interceptar/scraper-manzana/scraper.ts

It's a bit hacky, and it doesn't actually switch accounts on errors yet, but that would be easy to add (EDIT: now implemented :). I'll probably try this out in my staging environment soon.

It would probably be cleaner if the cookie jar were accessible from outside the library.
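The account switching described above (the twscrape-style behavior from the opening comment) can be sketched independently of the library. This is not the code in the linked branch: `Account`, `AccountPool`, `withAccountRotation`, and the 15-minute fallback cooldown are hypothetical, and each task is assumed to throw a `RateLimitError` when its account is limited.

```typescript
// Hypothetical error, as proposed in the issue; resetAt is epoch ms if known.
class RateLimitError extends Error {
  readonly resetAt: number | undefined;

  constructor(resetAt?: number) {
    super("Rate limited");
    this.name = "RateLimitError";
    this.resetAt = resetAt;
  }
}

interface Account {
  username: string;
  // Epoch ms before which this account should not be used.
  limitedUntil: number;
}

class AccountPool {
  private readonly accounts: Account[];

  constructor(usernames: string[]) {
    this.accounts = usernames.map((username) => ({ username, limitedUntil: 0 }));
  }

  // First account whose rate limit has expired, or undefined if none is free.
  next(now: number = Date.now()): Account | undefined {
    return this.accounts.find((account) => account.limitedUntil <= now);
  }
}

// Runs task with a free account. On RateLimitError, benches that account
// until resetAt (or for cooldownMs when resetAt is unknown or already past)
// and retries with the next free one; any other error propagates unchanged.
// cooldownMs must be positive, or a limited account could be picked again
// immediately. The 15-minute default is an assumption, not a known value.
async function withAccountRotation<T>(
  pool: AccountPool,
  task: (account: Account) => Promise<T>,
  cooldownMs = 15 * 60_000,
): Promise<T> {
  for (;;) {
    const account = pool.next();
    if (!account) throw new Error("All accounts are rate limited");
    try {
      return await task(account);
    } catch (err) {
      if (!(err instanceof RateLimitError)) throw err;
      const now = Date.now();
      account.limitedUntil =
        err.resetAt !== undefined && err.resetAt > now ? err.resetAt : now + cooldownMs;
    }
  }
}
```

Benching accounts by timestamp, instead of removing them, means an account becomes usable again on its own once its limit has passed.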

karashiiro commented 6 months ago

(ignore) referencing #84 for discoverability