koszzz closed this issue 5 months ago
@koszzz These limits come from the Twitter API itself: depending on the endpoint, only a certain amount of data can be retrieved every 15 minutes. twscrape automatically switches to another account when the current one hits its rate limit.
So you can add more accounts if you need to retrieve a larger amount of data.
If you provide an example query in both libraries, I can check if twitter's behavior has changed.
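To illustrate the account switching described above, here is a minimal toy sketch. It is not twscrape's actual implementation: `ToyPool`, `mark_rate_limited`, `get_for_queue`, the account names and the timestamp are all made up for illustration, and the 15-minute window is taken from the limit mentioned above.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=15)  # assumed length of a rate-limit window


class ToyPool:
    """Toy account pool (hypothetical): hands out an account that is not rate limited."""

    def __init__(self, usernames):
        # Map each username to the time at which it becomes usable again.
        self.unlock_at = {name: datetime.min for name in usernames}

    def mark_rate_limited(self, name, now):
        # Lock the account until the current window has passed.
        self.unlock_at[name] = now + WINDOW

    def get_for_queue(self, now):
        """Return (account, None) if one is free, else (None, earliest unlock time)."""
        for name, unlock in self.unlock_at.items():
            if unlock <= now:
                return name, None
        return None, min(self.unlock_at.values())


now = datetime(2023, 12, 31, 13, 24, 31)  # example timestamp
pool = ToyPool(["acc1", "acc2"])

first, _ = pool.get_for_queue(now)         # acc1 is free
pool.mark_rate_limited(first, now)
second, _ = pool.get_for_queue(now)        # acc1 is locked, so switch to acc2
pool.mark_rate_limited(second, now)
nobody, next_at = pool.get_for_queue(now)  # every account is locked
print(first, second, nobody, next_at.time())
```

With only two accounts the pool runs dry after two rate limits and has to wait for the earliest unlock time. Adding more accounts pushes that point further out, which is why more accounts let you pull more data per window.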
@vladkens In trevorhobenshield/twitter-api-client, I used Scraper.tweets_and_replies([id]) to retrieve data. In twscrape, I used api.user_tweets_and_replies(user_id) to retrieve data. I want to get the user timeline. Please check. Thank you.
Hi, @koszzz.
Tested both libs with code like this:
import asyncio
import time
from threading import Thread

from twitter.scraper import Scraper
from twscrape import API
from twscrape.logger import set_log_level
from twscrape.models import Tweet, parse_tweets

set_log_level("TRACE")


async def main():
    api = API(debug=True)
    acc = await api.pool.get_account("XXX")
    assert acc is not None
    user_id = 2244994945
    limit = 5000

    # 1) twitter-api-client, run in a separate thread
    def run():
        st = time.time()
        scraper = Scraper(cookies=acc.cookies)
        res = scraper.tweets_and_replies([user_id], limit=limit)
        tws: list[Tweet] = []
        for x in res:
            for tw in parse_tweets(x):
                tws.append(tw)
        for tw in tws:
            print(tw.date, tw.id, tw.user.id)
        dt = time.time() - st
        print(f">> count 1: {len(tws)} - {dt:.2f}s")

    t = Thread(target=run)
    t.start()
    t.join()
    print("-" * 50)

    # 2) twscrape
    st = time.time()
    count = 0
    async for tw in api.user_tweets_and_replies(user_id, limit=limit):
        print(tw.date, tw.id, tw.user.id)
        count += 1
    dt = time.time() - st
    print(f">> count 2: {count} - {dt:.2f}s")


if __name__ == "__main__":
    asyncio.run(main())
When the rate limit is hit, twitter-api-client fails with the error "Cannot parse JSON response 'NoneType' object has no attribute 'json'". twscrape works as expected: it switches accounts and waits for the API limit to reset. In other words, both libraries are restricted by Twitter.
twscrape v0.9 has outdated code for detecting banned accounts, so a banned account was marked as rate limited for a long period, such as 4 or 12 hours depending on the case. As a result, the library kept reporting indefinitely that no accounts were available.
I have updated the ban detection code in master, so there should no longer be an infinite "No account available for queue" loop.
v0.10 has been released with the new ban detection policy:
pip install --upgrade twscrape
See other changes here: https://github.com/vladkens/twscrape/releases/tag/v0.10.0
You can also check accounts stats with:
twscrape stats
Also, if the sessions for some accounts have expired, it's possible to re-login those accounts: https://github.com/vladkens/twscrape?tab=readme-ov-file#re-login-accounts
I found the following problem when using it:
2023-12-31 13:24:31.710 | INFO | twscrape.accounts_pool:get_for_queue_or_wait:260 - No account available for queue "UserTweetsAndReplies". Next available at 13:39:31
I can use trevorhobenshield/twitter-api-client without the 15-minute limit problem. Can this problem be solved?