istresearch / scrapy-cluster

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
http://scrapy-cluster.readthedocs.io/
MIT License

runspider: error: Unable to load 'link_spider.py': attempted relative import with no known parent package #268

Closed: BeamoINT closed this issue 10 months ago

BeamoINT commented 11 months ago

Hello, I got this issue when trying to start the runspider on Scrapy Cluster so that I could feed URLs into it. I have everything set up properly: Kafka is good, Redis is good, Zookeeper is good, etc. I just don't know what could be causing this. Thanks so much!

root@crawler:~/scrapy-cluster/crawler/crawling/spiders# scrapy runspider link_spider.py

Usage
=====
  scrapy runspider [options] <spider_file>

runspider: error: Unable to load 'link_spider.py': attempted relative import with no known parent package

madisonb commented 11 months ago

The docs here should show you how to run your spider properly.

scrapy runspider crawling/spiders/link_spider.py
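
In other words, the relative imports inside link_spider.py only resolve when Scrapy can see the enclosing crawling package, so the command has to be run from the crawler project directory rather than from inside crawling/spiders/. A minimal sketch, assuming the repository is checked out at ~/scrapy-cluster as shown in your shell prompt:

# run from the crawler project root so the 'crawling' package can be imported
cd ~/scrapy-cluster/crawler
scrapy runspider crawling/spiders/link_spider.py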

BeamoINT commented 11 months ago

I have been looking through the docs and have not found the issue yet, but I have a few more questions. Does this command automatically start the crawler without anything having to be fed into it?

scrapy runspider crawling/spiders/link_spider.py

If so, is there a starting URL in the settings, and does it branch off from there to crawl multiple URLs from the seed URL? If you do have to feed a URL into it to start it, does it then automatically crawl other URLs from there? Sorry for so many questions, and thank you for your help.

madisonb commented 11 months ago

Scrapy Cluster runs on inbound requests via Kafka. Please see the API documentation on how to push requests into the cluster.
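
As a rough example (the exact topic names and request fields are defined in the Kafka Monitor API docs; the url, appid, and crawlid values below are placeholders), a crawl request can be fed from the kafka-monitor directory like this:

# push a crawl request onto the cluster's incoming Kafka topic via the Kafka Monitor
python kafka_monitor.py feed '{"url": "http://example.com", "appid": "testapp", "crawlid": "abc1234"}'

From there the request is queued in Redis, picked up by the crawlers, and the results come back out over Kafka.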

Please close this issue if the original request has been answered.

madisonb commented 10 months ago

Closing due to inactivity