vladkens / twscrape

2024! X / Twitter API scrapper with authorization support. Allows you to scrape search results, User's profiles (followers/following), Tweets (favoriters/retweeters) and more.
https://pypi.org/project/twscrape/
MIT License

Ability to manual toggle between accounts in .db #138

Open Rouge-Trader opened 8 months ago

Rouge-Trader commented 8 months ago

I could be missing something, but I don't believe there is a way to choose which account sends a given request. For my use case, I want one of my accounts to read my timeline while a different account checks hashtags. It seems like a useful feature to add, and I'd be happy to contribute if I could get some pointers. Thanks

davinkevin commented 8 months ago

+1, and round-robin as well, to prevent rate limiting

BonifacioCalindoro commented 8 months ago

That feature is crucial for debugging accounts and tracing whether a specific limit is set on one of them. (Yes, there are some limits that twscrape is not aware of yet. For example, when fetching tweet_details for a tweet_id from a shadowbanned account, some accounts fetch the info and others get None. I believe this happens on Twitter's server side, so we need a way to debug which account was used each time.)

vladkens commented 7 months ago

Hi. There is a function get_for_queue in AccountsPool, so you can control which account is used, but you need to write a SQL query for this.

Example:

import asyncio

from twscrape import API, AccountsPool

class MyPool(AccountsPool):
    def get_for_queue(self, queue: str):
        # for search timeline always use acc1
        if queue == "SearchTimeline":
            return self._get_and_lock(queue, "acc1")

        # for retweeters use acc2 or acc3
        if queue == "Retweeters":
            qs = "SELECT username FROM accounts WHERE username IN ('acc2', 'acc3') ORDER BY RANDOM() LIMIT 1"
            return self._get_and_lock(queue, qs)

        # for all other queries use the default method
        return super().get_for_queue(queue)

async def main():
    pool = MyPool()
    api = API(pool)

    async for tw in api.search("foo", limit=10):
        print(tw)

if __name__ == "__main__":
    asyncio.run(main())
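
The example above passes either a plain username or a SQL subquery to _get_and_lock. As a rough sketch of how that choice could be kept in one place, the routing can be pulled out into a small helper driven by a table. The names ROUTES and condition_for are hypothetical and not part of twscrape; the account names are the placeholders from the example above.

```python
from typing import Optional

# Hypothetical routing table (not part of twscrape):
# queue name -> usernames allowed to serve that queue.
ROUTES = {
    "SearchTimeline": ["acc1"],
    "Retweeters": ["acc2", "acc3"],
}


def condition_for(queue: str, routes: dict = ROUTES) -> Optional[str]:
    """Return a username, a SQL subquery, or None when no route is configured."""
    names = routes.get(queue)
    if not names:
        # No custom route: the caller should fall back to the default method.
        return None
    if len(names) == 1:
        # A single account: return the plain username.
        return names[0]
    # Several accounts: build a subquery that picks one of them at random.
    # Single quotes are doubled so a username cannot break the SQL string.
    quoted = ", ".join("'" + n.replace("'", "''") + "'" for n in names)
    return (
        "SELECT username FROM accounts "
        f"WHERE username IN ({quoted}) ORDER BY RANDOM() LIMIT 1"
    )


print(condition_for("SearchTimeline"))
print(condition_for("Retweeters"))
print(condition_for("UserTweets"))
```

Inside a custom get_for_queue, the result would be passed to self._get_and_lock(queue, ...) when it is not None, and super().get_for_queue(queue) would be used otherwise, mirroring the example above.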

@davinkevin For a simple "round robin", you can use a random account order (no custom pool required):

api.pool._order_by = "RANDOM()"
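
To illustrate what a random order does at the SQL level, here is a minimal sketch using the standard-library sqlite3 module. It assumes a simplified, hypothetical accounts table with only a username column (the real twscrape table has more columns), and pick_random_username is an illustrative helper, not a twscrape function.

```python
import sqlite3


def pick_random_username(conn: sqlite3.Connection) -> str:
    """Pick one username using ORDER BY RANDOM() LIMIT 1."""
    row = conn.execute(
        "SELECT username FROM accounts ORDER BY RANDOM() LIMIT 1"
    ).fetchone()
    return row[0]


# Simplified stand-in for the accounts table (assumption: username only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (username TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO accounts (username) VALUES (?)",
    [("acc1",), ("acc2",), ("acc3",)],
)

# Draw many times and count how often each account is chosen.
picks = [pick_random_username(conn) for _ in range(300)]
counts = {name: picks.count(name) for name in ("acc1", "acc2", "acc3")}
print(counts)
```

Random ordering spreads requests across accounts on average, but it does not guarantee strict alternation, so the same account can be picked twice in a row.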

Possible queue names here: https://github.com/vladkens/twscrape/blob/main/twscrape/api.py#L11

davinkevin commented 7 months ago

Thank you for the solution, it works like a charm!
