Open ghost opened 2 years ago
You need to create circuits like this:
with tor.create_circuit(3) as circuit:
    ...
So I can only use one circuit? There is no way of using multiple circuits in parallel?
For now I worked around this with a wrapper script that launches the main script many times and puts each process in the background. I use torsocks -i, which uses 3 hops, while I only need one.
You can use more than one circuit in parallel. I spent quite some time working out logic that handles parallel processing, failed requests, and new circuit creation. If you're still interested, have a look at the source code of fast-instagram-scraper.
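To illustrate the general idea (this is not the code from fast-instagram-scraper), here is a minimal sketch of a circuit pool with retries. The names `scrape_parallel`, `open_circuit` and `fetch` are hypothetical: `open_circuit()` stands in for whatever creates a circuit (for example a wrapper around `tor.create_circuit(3)`), and `fetch(circuit, url)` for whatever sends a request over it. A failed request is assumed to mean the circuit is broken, so the circuit is replaced before retrying.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def scrape_parallel(urls, open_circuit, fetch, n_circuits=3, max_attempts=3):
    """Fetch urls over n_circuits circuits in parallel.

    open_circuit() and fetch(circuit, url) are hypothetical callables
    supplied by the caller; they are not part of any Tor library.
    """
    # Pool of idle circuits; each worker borrows one per request.
    pool = queue.Queue()
    for _ in range(n_circuits):
        pool.put(open_circuit())

    def worker(url):
        last_error = None
        for _ in range(max_attempts):
            circuit = pool.get()
            try:
                result = fetch(circuit, url)
            except Exception as exc:
                # Assume the circuit is broken: replace it, then retry.
                last_error = exc
                pool.put(open_circuit())
            else:
                # Success: hand the circuit back for the next request.
                pool.put(circuit)
                return url, result
        # All attempts failed; report the last error instead of a result.
        return url, last_error

    with ThreadPoolExecutor(max_workers=n_circuits) as executor:
        return dict(executor.map(worker, urls))
```

The return value maps each URL to either its result or the last exception raised for it. Closing replaced circuits is left out of the sketch and would need to be handled by the caller.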
I want to scrape a website asynchronously using a list of Tor circuits with different exit nodes, while making sure each exit node makes at most one request every 5 seconds.
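One way this requirement could be structured, as a sketch only: run one asyncio worker per circuit, all pulling from a shared queue, with each worker enforcing its own minimum interval between requests. The names `circuit_worker`, `scrape` and `fetch` are hypothetical, and `fetch(circuit, url)` is assumed to be a coroutine. The sketch also assumes each circuit in the list really has a distinct exit node.

```python
import asyncio
import time


async def circuit_worker(circuit, url_queue, fetch, min_interval, results):
    """Drain url_queue through one circuit, spacing requests by min_interval."""
    next_allowed = 0.0
    while True:
        try:
            url = url_queue.get_nowait()
        except asyncio.QueueEmpty:
            return  # no work left for this circuit
        # Wait until this circuit's cool-down has elapsed.
        delay = next_allowed - time.monotonic()
        if delay > 0:
            await asyncio.sleep(delay)
        # The interval is measured from the start of one request to the next.
        next_allowed = time.monotonic() + min_interval
        results[url] = await fetch(circuit, url)


async def scrape(urls, circuits, fetch, min_interval=5.0):
    """Spread urls across circuits; fetch is a hypothetical coroutine."""
    url_queue = asyncio.Queue()
    for url in urls:
        url_queue.put_nowait(url)
    results = {}
    # One worker per circuit, so each circuit is throttled independently.
    await asyncio.gather(
        *(circuit_worker(c, url_queue, fetch, min_interval, results)
          for c in circuits)
    )
    return results
```

If the underlying client is blocking rather than asynchronous, `fetch` could wrap the blocking call with `asyncio.to_thread` (Python 3.9+) so that one slow circuit does not stall the others.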
For testing purposes, I'm using the website https://books.toscrape.com/ and I'm lowering the sleep time, number of circuits and number of pages to scrape.
It works fine without Tor, but I'm getting the following error when I use Tor: