Open innocentius opened 3 years ago
Currently, on the main branch, the guest token is always retrieved without a proxy. As a result, our IP is still exposed to Twitter, which could then detect the frequent guest token requests during large-scale scraping. By retrieving the guest token through a proxy, we would be able to hide our IP and obtain guest tokens more reliably.
However, if our proxies are AWS IPs, this might not actually work well. Maybe we should consider implementing a separate proxy method for this.
A temporary fix has been implemented, which only supports HTTP proxies.
Scratch that, this fix is not actually working.
A temporary solution is in place for now. It significantly decreases scraping speed, because the token now has to be fetched through a proxy.
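For illustration, fetching the guest token through a proxy could look roughly like the sketch below. It is not the code on this branch. It assumes the token is embedded in the fetched page as `gt=<digits>;`, which may no longer hold after changes on Twitter's side, and `extract_guest_token`, `build_opener_for` and `fetch_guest_token` are hypothetical names.

```python
import re
import urllib.request

# Assumption: the guest token appears in the page as `gt=<digits>;`.
GT_PATTERN = re.compile(r"gt=(\d+);")


def extract_guest_token(html):
    """Return the guest token found in html, or None if there is none."""
    match = GT_PATTERN.search(html)
    return match.group(1) if match else None


def build_opener_for(proxy_url=None):
    """Build a urllib opener that routes through proxy_url when one is given."""
    if proxy_url is None:
        # An empty mapping disables proxies, including ones set in the environment.
        handler = urllib.request.ProxyHandler({})
    else:
        handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_guest_token(url, proxy_url=None, timeout=10):
    """Fetch url (optionally through an HTTP proxy) and extract the guest token."""
    opener = build_opener_for(proxy_url)
    with opener.open(url, timeout=timeout) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_guest_token(html)
```

Something like `fetch_guest_token("https://twitter.com", proxy_url="http://127.0.0.1:24000")` would then send only the token request through the local proxy port. The extra hop on every token request is also where the slowdown comes from.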
When I try running
twint -u username --followers
I get the following. How do I get around this? I simply want to scrape followers.
sleeping for 2 secs due to token failure
sleeping for 4 secs due to token failure
sleeping for 6 secs due to token failure
sleeping for 8 secs due to token failure
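The output above looks like a retry loop that waits a little longer after each failed token request (2, 4, 6, 8 seconds). A minimal sketch of such a loop, assuming a linear delay and a hypothetical `fetch` callable that returns the token or `None`; this is a guess at the behaviour, not the actual code of this version:

```python
import time


def get_token_with_backoff(fetch, retries=4, step=2.0, sleep=time.sleep):
    """Call fetch() until it returns a token, sleeping step * n after the n-th failure."""
    for attempt in range(1, retries + 1):
        try:
            token = fetch()
        except OSError:
            # Network-level failures (urllib errors are OSError subclasses) count as a miss.
            token = None
        if token:
            return token
        delay = step * attempt
        print(f"sleeping for {delay:.0f} secs due to token failure")
        sleep(delay)
    raise RuntimeError(f"could not get a guest token after {retries} attempts")
```

If the token request is refused because of the IP it comes from, waiting longer will not help. The route used for the request has to change.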
Sorry friend, this version of twint is not yet ready for public use. If you are already using proxies, make sure to direct your proxy port to 127.0.0.1:24000; that way you can use it properly. If you are not using proxies, I recommend you use the original version, although I did hear that the get-followers function there is also failing due to changes on Twitter's side.
During a long data retrieval session using AWS proxies, guest tokens can expire, leading to a token.refresh. However, since Twitter won't send guest tokens to AWS IPs, all requests made after the token expires will fail. There needs to be a way to retrieve the guest token using a viable IP (either your own or a separate residential proxy route) in order to achieve long-term stability.
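One possible shape for this, as a sketch only: keep the data requests on the AWS proxies and give the token its own route. `GuestTokenManager`, `fetch_token`, `token_proxy` and `max_age` below are hypothetical names, and the one-hour lifetime is an assumption, since the real expiry time is not known here.

```python
import time


class GuestTokenManager:
    """Caches a guest token and refreshes it over a dedicated route."""

    def __init__(self, fetch_token, token_proxy=None, max_age=3600.0, clock=time.monotonic):
        self._fetch_token = fetch_token  # callable(proxy_url) -> token string or None
        self._token_proxy = token_proxy  # None means: use your own IP directly
        self._max_age = max_age          # assumed lifetime in seconds
        self._clock = clock
        self._token = None
        self._fetched_at = None

    def refresh(self):
        """Request a new token over the token route, never over the data proxies."""
        token = self._fetch_token(self._token_proxy)
        if not token:
            raise RuntimeError("guest token request failed on the token route")
        self._token = token
        self._fetched_at = self._clock()
        return token

    def get(self):
        """Return the cached token, refreshing it first if it is missing or too old."""
        expired = (
            self._fetched_at is None
            or self._clock() - self._fetched_at >= self._max_age
        )
        return self.refresh() if expired else self._token
```

The scraper would call `get()` before each batch of requests. Because `fetch_token` receives only `token_proxy`, an expired token is never re-requested from an AWS IP.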