Closed milesrichardson closed 7 years ago
hi @milesrichardson , got here through tor.SE. Cheers for this pull.
Loop added to main execution of start.rb to reset circuit of each tor instance via control port using newnym.sh script
Not knowing much about Tor, does this reset a particular tor endpoint (i.e. one of the 100) and find a new endpoint/IP automatically? When looking at the HAProxy stats I'm often seeing ~5 endpoints that are considerably slower than the rest. Is this feature used to throw them out and get something fresh?
@gebrits Each tor instance listens on a "control port" and has a set of commands you can send over that port (read more here: https://www.thesprawl.org/research/tor-control-protocol/)
One of the commands is "newnym", which is simply a signal telling tor to "reset circuit"
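For illustration, here is a minimal Ruby sketch of that exchange. The PR itself does this from a shell script (newnym.sh), so this is not the PR's code: the method names, the host, port 9051, and the empty password are all assumptions. AUTHENTICATE and SIGNAL NEWNYM are the control-protocol commands, and tor answers with a line starting with 250 on success.

```ruby
require "socket"

# Sends AUTHENTICATE and then SIGNAL NEWNYM over an already-open control
# connection. Returns true only if tor replies "250 ..." to both commands.
# Assumption: the password contains no double quotes (no escaping is done).
def request_new_circuit(io, password: "")
  io.write(%(AUTHENTICATE "#{password}"\r\n))
  return false unless io.gets.to_s.start_with?("250")

  io.write("SIGNAL NEWNYM\r\n")
  ok = io.gets.to_s.start_with?("250")
  io.write("QUIT\r\n") # close the control session politely
  ok
end

# Hypothetical wrapper: host and port are placeholders, not values from the PR.
def newnym(host: "127.0.0.1", port: 9051, password: "")
  TCPSocket.open(host, port) do |sock|
    request_new_circuit(sock, password: password)
  end
end
```

With one control port per tor instance, you would call `newnym(port: ...)` once for each instance.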
start.rb is the script that is the "main" script of the Docker container, i.e. running in the foreground... First it sets up all the services, then it enters the main loop, which you can see here: https://github.com/mattes/rotating-proxy/pull/14/commits/8594519f51da7da2dc6baafaf244ee619a8c42cc#diff-27035d9712ff5cc90fca9ecb5c34c3f7R266
The main loop loops through EVERY tor proxy (proxies.each) and sends the newnym signal to it, then loops through every proxy and restarts it if it's not responding, then sleeps 60 seconds.
This is super janky, but it works. I get pretty good variety running https://github.com/milesrichardson/rotating-proxy . Over the course of an hour, I was able to get ~1000 different exit IPs.
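The loop described in this comment can be sketched roughly as follows. This is a simplified illustration, not the code from start.rb: the proxy objects are assumed to respond to `newnym`, `working?`, and `restart`, and `rotate_once` and `main_loop` are hypothetical names.

```ruby
# One pass over all proxies: first ask every tor instance for a new
# circuit, then restart any instance that is no longer responding.
# Assumption: each proxy responds to newnym, working?, and restart.
def rotate_once(proxies)
  proxies.each(&:newnym)
  proxies.each { |proxy| proxy.restart unless proxy.working? }
end

# Repeats the pass forever, sleeping between passes
# (60 seconds, per the description above).
def main_loop(proxies, interval: 60)
  loop do
    rotate_once(proxies)
    sleep interval
  end
end
```

Keeping a single pass in its own method makes it possible to exercise the rotation logic without running the infinite loop.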
Cheers. Much appreciated.
Thanks for the PR! Also loved reading your post here: http://tor.stackexchange.com/questions/9934/tor-is-only-assigning-circuits-from-a-very-limited-subset-of-exit-nodes
In my testing, I was getting a very limited subset of IP addresses from Tor. This pull request fixes that issue.
Major improvements:
- Loop added to main execution of start.rb to reset circuit of each tor instance via control port using newnym.sh script
- uncachable file including common IP checking websites so you can see IP is actually changing
- Proxy.working? includes a timeout so the whole script does not hang for 60 seconds (default timeout)

Changes you might not want: