Closed zxj0302 closed 1 week ago
Hey,
You have to instantiate a new Twitter session once the rate limit is exceeded. I mean, of course you can wait for the rate limits to be renewed, but if you re-instantiate the session, it will bypass the limits. Take a look at this one: https://github.com/iSarabjitDhiman/TweeterPy/issues/20#issuecomment-1712035023
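The re-instantiation idea can be sketched roughly like this. This is a hypothetical, standalone sketch: StubClient stands in for TweeterPy so the pattern runs without the library, and the numbers 50 and 40 are the ones mentioned later in this thread, not documented limits.

```python
# Hypothetical sketch of session rotation. StubClient is a stand-in for
# TweeterPy: it pretends each session allows at most 50 requests.

class StubClient:
    LIMIT = 50  # requests allowed per session (illustrative number)

    def __init__(self):
        self.calls = 0

    def get_tweet(self, tweet_id):
        self.calls += 1
        if self.calls > self.LIMIT:
            raise RuntimeError("rate limit exceeded")
        return {"data": {"tweetResult": {"result": {"id": tweet_id}}}}


def fetch_many(tweet_ids, per_session=40):
    """Re-create the client every `per_session` calls, staying under the limit."""
    client, results = StubClient(), []
    for i, tid in enumerate(tweet_ids):
        if i and i % per_session == 0:
            client = StubClient()  # fresh session -> fresh rate-limit window
        results.append(client.get_tweet(tid))
    return results


out = fetch_many(list(range(120)), per_session=40)
print(len(out))  # 120 -- all requests succeed, no session exceeds 50 calls
```

With the real library you would replace StubClient with TweeterPy (or call something like generate_session(), which is mentioned later in this thread) at the rotation point.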
Let me know if it helps.
Thank you! I use twitter = TweeterPy() to re-instantiate frequently, but I still get the error in the picture sometimes. However, the frequency is acceptable for me.
Hmm strange. I will take a look at the code shortly and see if there is any bug.
Oh btw I think datacenter proxies work too, not sure but you can try.
Hey @zxj0302 Hopefully its fixed in this commit f2e2535
I forgot to remove _DEFAULT_SESSION from the config.py module. It was such a silly mistake, but never mind, it's fixed now. Please let me know once you have tested it.
Thanks
Edit: Make sure to update the package before testing.
Good morning! Thank you for your work! I tested it with my code and still got a similar error:
BTW, I have some questions that confuse me a lot:
After 'tweet = scraper.get_tweet(tweet_id)', len(tweet['data']['tweetResult']) is sometimes 0, i.e. an empty result.
Thank you!!!❤️❤️❤️
Hey,
Oh wait, I just checked the first picture you attached, and it says SSL error. By the way, where did you get the proxies from?
I will fix it all shortly, please keep me posted.
Thanks.
Thank you for your reply! For Q2, do you mean that I hit the rate limit? Actually, I re-instantiate TweeterPy before reaching the rate limit (50 requests in 15 minutes). That's why I'm confused that I get an unexpected return (no 'result' in 'tweetResult') occasionally, or frequently when scraping in bulk. For the proxy, I am using dynamic residential proxies provided by Smartproxy or Roxlabs. They are cheap compared with others :smile: Thank you again! SALUTE! :saluting_face:
Hey, can you test now? And please attach a screenshot of the error.
@iSarabjitDhiman Yes, I am testing now. No SSL error or 'couldn't get guest token' error has occurred so far. However, I still got this error:
Could you share the screenshot? BTW, I added a debug message in there: https://github.com/iSarabjitDhiman/TweeterPy/blob/d6fd64ce509104aeb53e0d9ccd4c20b340a08022/tweeterpy/tweeterpy.py#L167
Please check your log file.
Do you mean a screenshot of the log file, or something else?
Yes, a screenshot of the log file should be fine. Make sure to blur any sensitive data if there is any.
The log file is attached: tweeterpy.log
Hmmm I don't see the guest token error in there. Looks clean to me. Is this the correct log file?
If the last number in each row is the number of bytes received, then a 52-byte response contains nothing, only {'data': {'tweetResult': {}}}, I think.
So the guest token error is gone?
Do you want the log that shows the 'guest token error', or the current log to see whether it is fixed? The file I sent is the current log, and I haven't had that error until now.
The guest token error logs. I fixed the UnboundLocalError here in #63, but I am looking for the "Couldn't find guest token" error you got initially.
The log was overwritten 😅 I will send the log to you if I get the same error.
This is what I scraped. I set 'tweet_type' to 'deleted' if there is only {'data': {'tweetResult': {}}} and no 'result' under 'tweetResult'. However, most of them are not actually 'deleted' and do have content indeed. I wonder why this happens.
I checked the log and no rate limit was reached, because I re-instantiate TweeterPy() every 40 requests and the limit is 50.
Oh I see, I just tested it on my side. The reason it returns an empty dataset is that there is no tweet for the given tweet ID. Take this tweet ID for instance: 1327268169774374913. If I use the get_tweet() method, it returns {'data': {'tweetResult': {}}}.
But if I first log in with the login() method and then try to fetch the same tweet ID, it gives us some extra useful information.
In my case it throws this error:
Exception: _Missing: No status found with that ID.
It seems like Twitter doesn't give guest users any meaningful error messages, but if you are logged in, you get those messages/warnings/errors.
Let me know if there is still any confusion.
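The distinction discussed above can be sketched as a small check. This is a minimal, hypothetical sketch: the field names follow the payloads quoted in this thread, and the classify name is mine, not part of TweeterPy.

```python
# Sketch: treat a response whose 'tweetResult' lacks a 'result' key as
# "no data", rather than assuming the tweet is deleted -- as this thread
# shows, it could also be a transient miss or a proxy/session issue.

def classify(response: dict) -> str:
    tweet_result = response.get("data", {}).get("tweetResult", {})
    if "result" not in tweet_result:
        return "no_data"  # deleted, missing, or a transient empty response
    return "ok"


print(classify({"data": {"tweetResult": {}}}))                     # no_data
print(classify({"data": {"tweetResult": {"result": {"id": 1}}}}))  # ok
```

Only a logged-in request can tell the "deleted" and "transient" cases apart, since guest sessions get no error message.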
Yes, I know that may happen, but only a small number of them really cannot be accessed or are deleted. For example, row 1968: I can get the tweet without logging in, as the picture shows, but I still got an empty dataset. I checked many IDs manually, and lots of them do have content and can be accessed, but still come back empty. However, if I run get_retweet() a single time to get that tweet by ID, I get the expected non-empty return. So confusing.
Oh, the 'couldn't get guest token' error occurred again. Details are at the end of the attached log file: tweeterpy.log
You haven't updated the package; that is the reason you are still getting that guest token error. About the "no tweet data" part, please log in and try to fetch that particular tweet you attached above, and you will get the error message. Most probably the error is: Exception: _Missing: No status found with that ID. Let me know how it goes.
Hi!
And I still found an error like this in the log file:
The log file is attached:
You can search for '[ERROR]' near the end of the log.
I believe the SSL error is due to the proxies. Do you get this error without proxies? I will look into the "empty response" part soon.
Thank you for your reply! Haven't slept, bro? It is 5:41 am now :timer_clock: (maybe 3:41 am in your zone). Hard to say whether the SSL error occurs without a proxy; I cannot scrape in bulk without one, so it may not be easy to reproduce. Yes, the empty response part is more important. After scraping 1000 tweets, almost all tweets come back empty, while for the first 1000 the empty response only happens occasionally.
Hey man,
Please test tweeterpy-1.1.5-py3-none-any.zip; it should solve your problem. Extract the zip file and make sure the file is in your current working directory:
```shell
pip install tweeterpy-1.1.5-py3-none-any.whl
```

```python
# After you install, don't use the config file. Just pass the proxy directly
# into the constructor while creating an instance.
from tweeterpy import TweeterPy

twitter = TweeterPy(proxy={"http": proxy_here, "https": proxy_here})
```
Oh yeah, and don't worry, I stay up late at night. I am a full-time freelancer, so I am kind of used to it.
Let me know how it goes.
Hi bro, thank you! :heart: Tested. It is amazing: the empty rate is much lower, although some non-empty tweets still get an empty result sometimes. It is magic!! I am wondering how you found the cause and what you did to improve this?! In case you want them, the log file and the scraped tweets (so you can check correctness) are attached: version1.1.5-1.zip
Hey @zxj0302 So here is what happened. Remember that I was using a default session initially for some reason? In fact, I had barely touched the config.py module since I started this project. I built this tool with a "one user at a time" perspective and kept some global settings in the config.py module. But now people are using it for different purposes, and with multi-threading, obviously. So the global settings get overwritten every time a new instance is created. In your case, all the instances were using the most recently initialized instance's settings, i.e. its proxy. Well, now I have work to do to make the tool support concurrency (multi-threading).
Thanks for the update btw.
I will release a new build soon.
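The bug described above can be illustrated with a toy example. This is a hypothetical sketch, not TweeterPy's actual code: Config and Client stand in for the config.py module and the TweeterPy class to show why a module-level setting shared by all instances gets clobbered by the newest one.

```python
# Toy illustration: a module-level "global" setting shared by all
# instances is overwritten each time a new instance is created.

class Config:
    PROXY = None  # stands in for a global setting in config.py


class Client:
    def __init__(self, proxy):
        Config.PROXY = proxy  # every new instance overwrites the shared value

    def current_proxy(self):
        return Config.PROXY  # all instances read the same global


a = Client("proxy-A")
b = Client("proxy-B")
print(a.current_proxy())  # proxy-B -- instance A now silently uses B's proxy
```

Storing the proxy on the instance (self.proxy) instead of on the shared Config is the usual fix, which is consistent with the later advice in this thread to pass the proxy directly into the constructor.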
Thank you for your reply and great work! Hoping to see a new version with no 'empty result' for non-empty tweets, plus the parallel version. SALUTE! :saluting_face:
Fixed in e475813228e1a26a5373d9af57cb9a25207a7bfc
I generate a new session with twitter.generate_session() after every 50 calls of get_retweet() to avoid the 'rate limit exceeded' error. However, the error in the picture above still occurs sometimes, e.g. after 200-300 calls of get_retweet(). It stops me from scraping a large number of tweets at a time. I was wondering how I can fix this? Thanks a lot!
P.S.: I used residential proxies with a different IP for each request.