vladkens / twscrape

2024! X / Twitter API scraper with authorization support. Allows you to scrape search results, user profiles (followers/following), tweets (favoriters/retweeters) and more.
https://pypi.org/project/twscrape/
MIT License

15 minutes' limit #102

Closed koszzz closed 5 months ago

koszzz commented 6 months ago

I found the following problem when using it:

2023-12-31 13:24:31.710 | INFO | twscrape.accounts_pool:get_for_queue_or_wait:260 - No account available for queue "UserTweetsAndReplies". Next available at 13:39:31

I can use trevorhobenshield/twitter-api-client without the 15-minute limit problem. Can this problem be solved?

vladkens commented 6 months ago

@koszzz These limits come from the Twitter protocol (depending on the endpoint, a certain amount of data is available for retrieval every 15 minutes). twscrape automatically switches to another account when the current one reaches its rate limit.

So you can use more accounts if you need to retrieve a larger amount of data.

If you provide an example query in both libraries, I can check whether Twitter's behavior has changed.
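
For illustration, a minimal sketch of running with several accounts so twscrape can rotate between them when one hits the limit. It follows the pool API shown in the twscrape README; the usernames, passwords, and emails below are placeholders:

import asyncio

from twscrape import API

async def main():
    api = API()

    # placeholder credentials – add as many real accounts as you need
    await api.pool.add_account("user1", "pass1", "u1@example.com", "mail_pass1")
    await api.pool.add_account("user2", "pass2", "u2@example.com", "mail_pass2")
    await api.pool.login_all()

    # twscrape switches to the next account when the current one is rate limited
    async for tw in api.user_tweets_and_replies(2244994945, limit=200):
        print(tw.date, tw.id)

if __name__ == "__main__":
    asyncio.run(main())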

koszzz commented 6 months ago

@vladkens In trevorhobenshield/twitter-api-client, I used Scraper.tweets_and_replies([id]) to retrieve data. In twscrape, I used api.user_tweets_and_replies(user_id) to retrieve data. I want to get the user timeline. Please check. Thank you.

vladkens commented 5 months ago

Hi, @koszzz.

Tested both libs with code like this:

import asyncio
import time
from threading import Thread

from twitter.scraper import Scraper

from twscrape import API
from twscrape.logger import set_log_level
from twscrape.models import Tweet, parse_tweets

set_log_level("TRACE")

async def main():
    api = API(debug=True)
    acc = await api.pool.get_account("XXX")
    assert acc is not None

    user_id = 2244994945
    limit = 5000

    # twitter-api-client is synchronous, so run it in a separate thread
    def run():
        st = time.time()
        scraper = Scraper(cookies=acc.cookies)
        res = scraper.tweets_and_replies([user_id], limit=limit)
        tws: list[Tweet] = []
        for x in res:
            for tw in parse_tweets(x):
                tws.append(tw)

        for tw in tws:
            print(tw.date, tw.id, tw.user.id)

        dt = time.time() - st
        print(f">> count 1: {len(tws)} - {dt:.2f}s")

    t = Thread(target=run)
    t.start()
    t.join()

    print("-" * 50)
    # same request via the twscrape async API
    st = time.time()
    count = 0
    async for tw in api.user_tweets_and_replies(user_id, limit=limit):
        print(tw.date, tw.id, tw.user.id)
        count += 1

    dt = time.time() - st
    print(f">> count 2: {count} - {dt:.2f}s")

if __name__ == "__main__":
    asyncio.run(main())

I get a Cannot parse JSON response 'NoneType' object has no attribute 'json' error when the rate limit is hit in twitter-api-client. twscrape works as expected – it switches accounts and waits for the API limit to reset. In other words, both libraries are restricted by Twitter.

twscrape v0.9 had outdated code for detecting banned accounts, and such a banned account was marked as rate-limited for a long period, like 4 or 12 hours (depending on the case). As a result, the library kept saying indefinitely that no accounts were available.

I updated the ban detection code in master, so there should no longer be an infinite "No account available for queue" loop.

vladkens commented 5 months ago

v0.10 is released with the new ban detection policy:

pip install --upgrade twscrape

See other changes here: https://github.com/vladkens/twscrape/releases/tag/v0.10.0

You can also check accounts stats with:

twscrape stats

Also, if sessions for some accounts have expired, it's possible to re-login those accounts: https://github.com/vladkens/twscrape?tab=readme-ov-file#re-login-accounts
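
Per that README section, the re-login commands look roughly like this (usernames are placeholders):

twscrape relogin user1 user2
# or re-login all accounts that failed
twscrape relogin_failed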