scalingexcellence / scrapybook

Scrapy Book Code
http://scrapybook.com/
475 stars · 209 forks

Error when running scrapy shell commands examples as well as running spider code #43

Closed gtinjr closed 6 years ago

gtinjr commented 6 years ago

I am getting an error when running the following shell command in the docker scrapybook_dev_1 shell: scrapy shell http://web:9312/properties/property_000000.html

The same happens when running the following spider from the "A scrapy project" section of the book: scrapy crawl basic

```
2017-11-03 02:47:59 [boto] DEBUG: Retrieving credentials from metadata server.
2017-11-03 02:48:00 [boto] ERROR: Caught exception reading instance data
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/boto/utils.py", line 210, in retry_url
    r = opener.open(req, timeout=timeout)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
    raise URLError(err)
URLError:
2017-11-03 02:48:00 [boto] ERROR: Unable to read instance data, giving up
```

lookfwd commented 6 years ago

Is this error preventing the shell from starting? Going inside the properties folder should stop the boto errors. It was a problem with a few Scrapy versions.
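A sketch of the suggested workaround inside the dev container, assuming the book's layout where the project (and its settings.py) lives in a properties/ directory:

```shell
# Run the shell from inside the Scrapy project directory so that the
# project's settings.py (which mitigates the boto errors) is picked up.
cd properties
scrapy shell http://web:9312/properties/property_000000.html
```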


gtinjr commented 6 years ago

This error does not prevent the shell commands from finishing, nor does it prevent the spider from running. I also noticed that a similar, if not the same, issue was fixed in Scrapy 1.1. I just want to make sure this was a known issue with these Docker images. Mentioning it in the README.md or updating the images to the latest Scrapy may help.

lookfwd commented 6 years ago

Yes - it's known... it's a pity and it confuses people. All my settings.py files (e.g. this one) set those keys to empty, which mitigates the problem. I can't wait until the 2nd edition of the book is out early next year; all those problems will go away.
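The "set those keys to empty" mitigation can be sketched as a settings.py fragment. AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are standard Scrapy settings; leaving them empty is what stops boto from querying the EC2 metadata server for credentials, which is the request that fails outside AWS:

```python
# settings.py -- a minimal sketch of the mitigation described above.
# With the keys set to empty strings, boto does not attempt to fetch
# credentials from the EC2 metadata server, so the URLError above
# never occurs.
AWS_ACCESS_KEY_ID = ""
AWS_SECRET_ACCESS_KEY = ""
```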

gtinjr commented 6 years ago

Disabling the S3 download handler in settings.py solved the problem:

```python
DOWNLOAD_HANDLERS = {
    's3': None,
}
```