rejoiceinhope / scrapy-proxy-pool


Running spiders sequentially - Proxy Pool is already defined as a collector #5

Open samLozier opened 4 years ago

samLozier commented 4 years ago

I'm trying to run two spiders in sequence in a single file and am getting this error: proxyscrape.errors.CollectorAlreadyDefinedError: proxy-pool is already defined as a collector

I'm attempting to run my spiders like: spider1() followed by spider2()

Either spider will run as expected if I comment out the other line, but they will not run in sequence. Is there an easy workaround, or should I separate this behavior into two separate .py files? Editing to add: this is clearly related to the proxyscrape package that scrapy-proxy-pool uses, but I'm wondering whether the behavior triggering the error is coming from my own code, from scrapy-proxy-pool, or from proxyscrape.

Problem: Can't run two spiders in sequence using proxy pool from a single .py file.
Anticipated Results: The second spider runs after the first one concludes.
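
For context, here is a minimal sketch of the kind of single-file setup that can hit this, using the standard CrawlerRunner pattern from the Scrapy docs to run two crawls sequentially. The project and spider names (myproject, Spider1, Spider2) are placeholders, and the comment about where the error comes from is an inference: scrapy-proxy-pool appears to register a proxyscrape collector when its middleware is initialised, so a second crawl in the same process tries to register it again.

```python
from twisted.internet import reactor, defer
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings

# Hypothetical spider classes standing in for the two spiders in the report.
from myproject.spiders.spider1 import Spider1
from myproject.spiders.spider2 import Spider2

configure_logging()
runner = CrawlerRunner(get_project_settings())


@defer.inlineCallbacks
def crawl():
    # Each crawl builds a fresh middleware stack; with scrapy-proxy-pool
    # enabled, the second build re-registers the proxyscrape collector in
    # the same process, which seems to be what raises
    # CollectorAlreadyDefinedError.
    yield runner.crawl(Spider1)
    yield runner.crawl(Spider2)
    reactor.stop()


crawl()
reactor.run()
```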

panesarm commented 4 years ago

Did you ever resolve this issue?

samLozier commented 4 years ago

Did you ever resolve this issue?

No, I just worked around it by using another script to call two separate files. This is a hazy recollection, but I think it had to do with the crawler process already running. By calling the scripts independently I was able to control them more easily (run multiple crawler processes concurrently, or wait for one to finish before triggering the second).
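
A minimal sketch of that kind of driver script, assuming each spider lives in its own runner file (the file names below are placeholders, not taken from the thread); running each crawl in its own Python process means each one gets a fresh proxyscrape collector.

```python
import subprocess
import sys

# Hypothetical file names; each script is assumed to run a single spider
# (e.g. via CrawlerProcess) so every crawl starts in a clean process.
SPIDER_SCRIPTS = ["run_spider1.py", "run_spider2.py"]

# Sequential mode: wait for one crawl to finish before triggering the next.
for script in SPIDER_SCRIPTS:
    subprocess.run([sys.executable, script], check=True)
```

Swapping subprocess.run for subprocess.Popen and waiting on the handles afterwards would start the crawls concurrently instead, which is the other mode mentioned above.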