scalingexcellence / scrapybook

Scrapy Book Code
http://scrapybook.com/

urlopen error time out #15

Closed · mic0331 closed this 7 years ago

mic0331 commented 7 years ago

I noticed an error message when using scrapy inside the VM. I think it has no consequences for the examples. The error is linked to urllib2 hitting a timeout. Is this normal behavior, and will it be fixed in a later version?

Feel free to close this if it is not relevant. Thanks!

> root@dev:~# scrapy shell http://web:9312/properties/property_000000.html
> 2016-09-23 11:04:19 [scrapy] INFO: Scrapy 1.0.3 started (bot: scrapybot)
> 2016-09-23 11:04:19 [scrapy] INFO: Optional features available: ssl, http11, boto
> 2016-09-23 11:04:19 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
> 2016-09-23 11:04:19 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, CoreStats, SpiderState
> 2016-09-23 11:04:19 [boto] DEBUG: Retrieving credentials from metadata server.
> 2016-09-23 11:04:20 [boto] ERROR: Caught exception reading instance data
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/boto/utils.py", line 210, in retry_url
>     r = opener.open(req, timeout=timeout)
>   File "/usr/lib/python2.7/urllib2.py", line 404, in open
>     response = self._open(req, data)
>   File "/usr/lib/python2.7/urllib2.py", line 422, in _open
>     '_open', req)
>   File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
>     result = func(*args)
>   File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
>     return self.do_open(httplib.HTTPConnection, req)
>   File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
>     raise URLError(err)
> URLError: <urlopen error timed out>
> 2016-09-23 11:04:20 [boto] ERROR: Unable to read instance data, giving up
> 2016-09-23 11:04:20 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
> 2016-09-23 11:04:20 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
> 2016-09-23 11:04:20 [scrapy] INFO: Enabled item pipelines: 
> 2016-09-23 11:04:20 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
> 2016-09-23 11:04:20 [scrapy] INFO: Spider opened
> 2016-09-23 11:04:20 [scrapy] DEBUG: Crawled (200) <GET http://web:9312/properties/property_000000.html> (referer: None)
> [s] Available Scrapy objects:
> [s]   crawler    <scrapy.crawler.Crawler object at 0x7fa05db51b10>
> [s]   item       {}
> [s]   request    <GET http://web:9312/properties/property_000000.html>
> [s]   response   <200 http://web:9312/properties/property_000000.html>
> [s]   settings   <scrapy.settings.Settings object at 0x7fa05db51a90>
> [s]   spider     <DefaultSpider 'default' at 0x7fa05ca57b50>
> [s] Useful shortcuts:
> [s]   shelp()           Shell help (print this help)
> [s]   fetch(req_or_url) Fetch request (or URL) and update local objects
> [s]   view(response)    View response in a browser
lookfwd commented 7 years ago

Yes, this always happens if you start a new Scrapy project with scrapy startproject. It happens inside the VM, outside the VM... everywhere. Yes, it's inconsequential, and all of the book's projects have this simple fix in their settings.py:

# Disable S3
AWS_ACCESS_KEY_ID = ""
AWS_SECRET_ACCESS_KEY = ""
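
The timeout itself is just boto probing the EC2 instance-metadata service for credentials when none are configured; outside EC2 that address isn't reachable, so urllib2 gives up. Roughly speaking (just an illustration, not code from the book), the lookup boto attempts amounts to this, where 169.254.169.254 is the standard EC2 metadata endpoint:

import urllib2  # Python 2.7, matching the traceback above

# Standard EC2 instance-metadata endpoint; only reachable from inside EC2.
METADATA_URL = "http://169.254.169.254/latest/meta-data/"

try:
    urllib2.urlopen(METADATA_URL, timeout=1)
except urllib2.URLError as e:
    # Outside EC2 this typically fails the same way boto reports above,
    # e.g. <urlopen error timed out>
    print "metadata lookup failed: %s" % e

Setting the AWS keys to empty strings (as in the fix above) stops boto from going off to look for credentials, so the error disappears.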

Here are the related Scrapy issue and two Stack Overflow questions: [1][2]
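
If you don't need S3 at all, another commonly suggested option (not the book's fix, just an alternative that Scrapy supports) is to disable the s3 download handler in settings.py by assigning None to its scheme:

# Alternative: turn off the s3 download handler entirely,
# so the boto-backed handler is never set up.
DOWNLOAD_HANDLERS = {
    's3': None,
}

Either way, the [boto] error at startup is harmless for the book's examples.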