iSarabjitDhiman / TweeterPy

TweeterPy is a Python library to extract data from Twitter. The TweeterPy API lets you scrape data from a user's profile, such as username, user ID, bio, followers/followings list, profile media, tweets, etc.
MIT License
119 stars, 17 forks

Got code 88, Rate limit exceeded just by twitter = TweeterPy() #53

Open Unayung opened 3 months ago

Unayung commented 3 months ago

hey pal, it's me again.

I've been running into this since last Tuesday whenever I try to initialize the crawler script, which had been running perfectly for about a month.

from tweeterpy import TweeterPy
twitter = TweeterPy()

it says

2024-03-19 02:41:31,223 [ERROR] :: Couldn't generate a new session.
'guest_token'
Traceback (most recent call last):
  File "/home/yitinglin/metacrm-twitter-crawler/tweeterpy/tweeterpy.py", line 159, in generate_session
    guest_token = make_request(
KeyError: 'guest_token'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/yitinglin/metacrm-twitter-crawler/tweeterpy/tweeterpy.py", line 29, in __init__
    self.generate_session()
  File "/home/yitinglin/metacrm-twitter-crawler/tweeterpy/tweeterpy.py", line 159, in generate_session
    guest_token = make_request(
KeyError: 'guest_token'

When I dug into the source code and printed the result of the make_request call for the guest token, I found that I got:

{'code': 88, 'message': 'Rate limit exceeded.'}

I'm wondering whether you have run into this before on the guest token request.
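A minimal sketch of why the traceback shows a KeyError rather than the rate-limit message: the error payload printed above has no `guest_token` key, so indexing it directly fails. The helper name `extract_guest_token` is hypothetical and is not part of TweeterPy, and the success payload shape is an assumption.

```python
def extract_guest_token(payload):
    """Return the guest token, or raise a readable error if the request was refused."""
    if "guest_token" in payload:
        return payload["guest_token"]
    # Error payloads in this thread look like {'code': 88, 'message': 'Rate limit exceeded.'}
    code = payload.get("code")
    message = payload.get("message", "unknown error")
    raise RuntimeError(f"Guest token request failed (code {code}): {message}")


try:
    extract_guest_token({"code": 88, "message": "Rate limit exceeded."})
except RuntimeError as error:
    # The underlying reason is surfaced instead of a bare KeyError.
    print(error)
```

Checking for the key first keeps the real cause (code 88) visible in the logs, which makes it easier to tell a rate limit apart from a parsing bug.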

iSarabjitDhiman commented 3 months ago

Hmm, that's strange. Have you tried doing it from some other machine with a new IP address? I think the limit is tied to the IP address (if you are scraping continuously). Try connecting to some other network, or use something like a VPN/proxy.

Let me know how it goes.

Edit: Wait, you should not be getting a KeyError. I fixed that in commit 9ab4846. Try updating the package.
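For the proxy suggestion above, a rough sketch of building a proxies mapping in the same `http`/`https` shape that a later comment in this thread assigns to `config.PROXY`. The helper `build_proxies` and the address are hypothetical; how TweeterPy consumes the setting is not confirmed here.

```python
def build_proxies(host, port, user=None, password=None):
    """Build an http/https proxies mapping for a single HTTP proxy."""
    credentials = f"{user}:{password}@" if user and password else ""
    proxy_url = f"http://{credentials}{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


# 203.0.113.10 is a documentation-only placeholder address.
proxies = build_proxies("203.0.113.10", 8080)
print(proxies)

# Assumed usage, mirroring the config.PROXY assignment shown later in this thread:
# config.PROXY = proxies
# twitter = TweeterPy()
```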

ilonabehn2 commented 3 months ago

Hello, I am facing a problem with your code when using threads. I tried multiple thread types and tried to keep each TweeterPy instance separate, but it's not working for me. With 1 thread it works completely fine, but with multiple threads, even if each instance has its own proxy rotating on every request, it still throws 403 Forbidden when calling your login function. Is there a solution? Here's my code:

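import concurrent.futures
import itertools

from tweeterpy import TweeterPy
# The import paths for config and RateLimitError are assumed; the original post does not show them.
from tweeterpy import config
from tweeterpy.util import RateLimitError

list_tokens = open('twitter_tokens.txt', 'r', encoding='utf-8').read().splitlines()
profiles_to_scan = open('profiles_to_scan.txt', 'r', encoding='utf-8').read().splitlines()
all_proxies = open('proxies_crack.txt', 'r', encoding='utf-8').read().splitlines()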

def getProxy():
    return next(proxy_pool)

class Proxy:
    # Expects a proxy string in the form "ip:port:user:password".
    def __init__(self, proxy):
        parts = proxy.split(':')
        self.ip = parts[0]
        self.port = parts[1]
        self.user = parts[2]
        self.password = parts[3]

proxy_pool = itertools.cycle(all_proxies)  # Create a cycle iterator from the proxy list
accounts_pool = itertools.cycle(list_tokens)

def get_account():
    return next(accounts_pool).strip()

class Profile:

    def __init__(self, profile):
        self.profile = profile

    def get_followers(self):
        proxy_object = Proxy(getProxy())
        proxy_url = f"http://{proxy_object.user}:{proxy_object.password}@{proxy_object.ip}:{proxy_object.port}/"

        proxies_formatted = {
            "http": proxy_url,
            "https": proxy_url,
        }

        # config.PROXY is a module-level attribute, so this assignment is visible to every thread.
        config.PROXY = proxies_formatted

        twitter = TweeterPy()
        twitter.generate_session(auth_token=get_account())
        self.profile = self.profile.strip()
        has_more = True
        cursor = None
        while has_more:
            try:
                response = twitter.get_friends(self.profile, follower=True, end_cursor=cursor, pagination=False)
                # Append each follower's screen name to <profile>.txt.
                with open(self.profile + '.txt', 'a', encoding='utf-8') as save_followers:
                    for follower in response['data']:
                        screen_name = follower['content']['itemContent']['user_results']['result']['legacy']['screen_name']
                        save_followers.write(screen_name + '\n')

                has_more = response.get('has_next_page')
                api_rate_limits = response.get('api_rate_limit')
                limit_exhausted = api_rate_limits.get('rate_limit_exhausted')
                if has_more:
                    cursor = response.get('cursor_endpoint')
                ## YOUR CUSTOM CODE HERE (DATA HANDLING, REQUEST DELAYS, SESSION SHUFFLING ETC.)
                ## time.sleep(random.uniform(7,10))
                if limit_exhausted:
                    raise RateLimitError
            except Exception as error:
                print(error)
                # Switch to the next account and retry the same cursor.
                twitter.generate_session(auth_token=get_account())
                config.UPDATE_API  # bare attribute access: this line has no effect as written

def create_and_launch_threads(profile):
    profile_client = Profile(profile)
    profile_client.get_followers()
    return

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    # Submit one task per profile to the executor
    futures = [executor.submit(create_and_launch_threads, profile) for profile in profiles_to_scan]
    # Wait for all futures to complete and get their results
    results = [future.result() for future in concurrent.futures.as_completed(futures)]
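
One thing that may be worth checking in the code above, though nothing in this thread confirms it as the cause of the 403 errors: `config.PROXY` is assigned on a module, and module attributes are shared by all threads in a process. The sketch below uses a stand-in object (`shared_config`, hypothetical) instead of TweeterPy itself, and only illustrates that a per-thread assignment to shared state does not stay per-thread.

```python
import threading
from types import SimpleNamespace

# Stand-in for a module such as `config`: one object shared by every thread.
shared_config = SimpleNamespace(PROXY=None)

WORKERS = 2
barrier = threading.Barrier(WORKERS)
observed = {}


def worker(name, proxy):
    shared_config.PROXY = proxy            # each thread "sets its own" proxy
    barrier.wait()                         # wait until every thread has written
    observed[name] = shared_config.PROXY   # read back what is configured now


threads = [
    threading.Thread(target=worker, args=(f"thread-{i}", f"proxy-{i}"))
    for i in range(WORKERS)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# Every thread reports the same proxy: whichever assignment happened last.
print(observed)
```

If the library reads the shared setting when a session is created, two threads could end up using the same proxy with different accounts. Whether that applies here depends on TweeterPy internals not shown in this thread.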

iSarabjitDhiman commented 3 months ago

Hey @ilonabehn2, could you please attach the logs? Make sure to strip out any sensitive data. I will take a look at the code in the meantime.

Thanks

iSarabjitDhiman commented 1 month ago

Hey @Unayung @ilonabehn2

You can use this beta build for the time being. I will release a stable version soon.