hesussavas / corners_stats

Statistical project on soccer's corners

Connection Refused #2

Closed u015216 closed 7 years ago

u015216 commented 7 years ago

I observed the following error, which caused the shell to exit:

[screenshot: shell traceback ending in a "Connection refused" error]

I'm fairly new to web scraping with Python. I'm running Ubuntu with Anaconda (Python 3.6). Any assistance with this would be greatly appreciated!

hesussavas commented 7 years ago

I've increased the sleep time for start_scraping (https://github.com/hesussavas/corners_stats/blob/master/Makefile#L25). Update your code and try again; this should help. If you hit the same issue again, double the sleep time.
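For reference, the start_scraping flow boils down to the shell commands below (reconstructed from the build log later in this thread; the exact Makefile recipe may differ). The sleep between the two docker run calls is the delay being increased here, because Postgres needs time to initialize before the scraper connects.

```
# Reconstructed from the log below; the actual Makefile recipe may differ.
# Start the database container in the background.
docker run -d \
    -e POSTGRES_PASSWORD=corners \
    -e POSTGRES_USER=corners \
    -e POSTGRES_DB=corners \
    -p 8432:5432 \
    --name corners-postgres \
    postgres:9.5

# Give Postgres time to finish initializing; if this delay is too short,
# the scraper's first DB connection fails with "Connection refused".
sleep 20

# Run the scraper against the now-ready database.
docker run --rm -i \
    --link corners-postgres \
    -e DEV_PSQL_URI=postgresql://corners:corners@corners-postgres:5432/corners \
    corners/bash:dev \
    ./start.sh
```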

u015216 commented 7 years ago

Okay, the additional sleep time did the trick for the connection issue. However, the application now seems to run without errors, yet I don't believe any results are returned; it may not be scraping the pages at all. Below is the command-line log (from running sudo make start_scraping). Would it be possible for you to review it and let me know whether this is just user error on my part?

```
docker build \
    --file=Dockerfile \
    -t corners/bash:dev \
    .
Sending build context to Docker daemon 64.43MB
Step 1/8 : FROM python:3.6
 ---> c5700ee6fe7b
Step 2/8 : RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends python-pip
 ---> Using cache
 ---> 6cbb5353ee2d
Step 3/8 : COPY requirements.txt /tmp/
 ---> Using cache
 ---> 866ef44e2a1d
Step 4/8 : ENV MPLBACKEND "agg"
 ---> Using cache
 ---> f42f5740388f
Step 5/8 : RUN pip install -r /tmp/requirements.txt
 ---> Using cache
 ---> 6b4e3881ce3a
Step 6/8 : COPY . /opt/corners
 ---> Using cache
 ---> e8ab01dbaa96
Step 7/8 : WORKDIR /opt/corners
 ---> Using cache
 ---> 5ec54d6c9858
Step 8/8 : CMD bash
 ---> Using cache
 ---> 6e4afe087133
Successfully built 6e4afe087133
Successfully tagged corners/bash:dev
docker run -d \
    -e POSTGRES_PASSWORD=corners \
    -e POSTGRES_USER=corners \
    -e POSTGRES_DB=corners \
    -p 8432:5432 \
    --name corners-postgres \
    postgres:9.5
0f2a214ff19d561f69725586d510323e323be704745eb8162199492990f18b1a
sleep 20
docker run --rm -i \
    --link corners-postgres \
    -e DEV_PSQL_URI=postgresql://corners:corners@corners-postgres:5432/corners \
    corners/bash:dev \
    ./start.sh
2017-07-19 18:15:06 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: corners442)
2017-07-19 18:15:06 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'corners442', 'CONCURRENT_REQUESTS': 4, 'CONCURRENT_REQUESTS_PER_DOMAIN': 2, 'CONCURRENT_REQUESTS_PER_IP': 2, 'DOWNLOAD_DELAY': 5, 'NEWSPIDER_MODULE': 'corners442.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['corners442.spiders'], 'USER_AGENT': 'Mozilla/5.0 (iPad; U; CPU OS 3_2_1 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Mobile/7B405'}
2017-07-19 18:15:07 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.logstats.LogStats']
2017-07-19 18:15:07 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware', 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-07-19 18:15:07 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-07-19 18:15:07 [scrapy.middleware] INFO: Enabled item pipelines: ['corners442.pipelines.LeaguePipeline']
2017-07-19 18:15:07 [scrapy.core.engine] INFO: Spider opened
2017-07-19 18:15:07 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-07-19 18:15:07 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-07-19 18:15:07 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.fourfourtwo.com/robots.txt> (referer: None)
2017-07-19 18:15:08 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.fourfourtwo.com/> (referer: None)
2017-07-19 18:15:08 [scrapy.core.engine] INFO: Closing spider (finished)
2017-07-19 18:15:08 [scrapy.statscollectors] INFO: Dumping Scrapy stats: {'downloader/request_bytes': 628, 'downloader/request_count': 2, 'downloader/request_method_count/GET': 2, 'downloader/response_bytes': 17119, 'downloader/response_count': 2, 'downloader/response_status_count/200': 2, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2017, 7, 19, 18, 15, 8, 263953), 'log_count/DEBUG': 3, 'log_count/INFO': 7, 'response_received_count': 2, 'scheduler/dequeued': 1, 'scheduler/dequeued/memory': 1, 'scheduler/enqueued': 1, 'scheduler/enqueued/memory': 1, 'start_time': datetime.datetime(2017, 7, 19, 18, 15, 7, 329204)}
2017-07-19 18:15:08 [scrapy.core.engine] INFO: Spider closed (finished)
2017-07-19 18:15:19 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: corners442)
2017-07-19 18:15:19 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'corners442', 'CONCURRENT_REQUESTS': 4, 'CONCURRENT_REQUESTS_PER_DOMAIN': 2, 'CONCURRENT_REQUESTS_PER_IP': 2, 'DOWNLOAD_DELAY': 5, 'NEWSPIDER_MODULE': 'corners442.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['corners442.spiders'], 'USER_AGENT': 'Mozilla/5.0 (iPad; U; CPU OS 3_2_1 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Mobile/7B405'}
2017-07-19 18:15:19 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.logstats.LogStats']
2017-07-19 18:15:19 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware', 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-07-19 18:15:19 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-07-19 18:15:19 [scrapy.middleware] INFO: Enabled item pipelines: ['corners442.pipelines.Corners442Pipeline']
2017-07-19 18:15:19 [scrapy.core.engine] INFO: Spider opened
2017-07-19 18:15:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-07-19 18:15:19 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-07-19 18:15:19 [scrapy.core.engine] INFO: Closing spider (finished)
2017-07-19 18:15:19 [scrapy.statscollectors] INFO: Dumping Scrapy stats: {'finish_reason': 'finished', 'finish_time': datetime.datetime(2017, 7, 19, 18, 15, 19, 517179), 'log_count/DEBUG': 1, 'log_count/INFO': 7, 'start_time': datetime.datetime(2017, 7, 19, 18, 15, 19, 500297)}
2017-07-19 18:15:19 [scrapy.core.engine] INFO: Spider closed (finished)
```
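The stats hint at why nothing comes back: neither run reports an item_scraped_count at all, the first spider fetches only robots.txt and the site's front page before closing, and the second spider closes without crawling a single page. That pattern usually means the crawl "succeeds" but the parse callbacks never find the links or data they expect, for example because the site's markup has changed. One way to check, assuming Scrapy's interactive shell is available inside the corners/bash:dev image (the URL is taken from the log above; the selector is purely illustrative):

```
# Open an interactive shell in the scraper image (same flags as the
# ./start.sh run in the log, but with a bash entrypoint instead).
docker run --rm -it \
    --link corners-postgres \
    -e DEV_PSQL_URI=postgresql://corners:corners@corners-postgres:5432/corners \
    corners/bash:dev \
    bash

# Inside the container, fetch the start page interactively and test
# whether the markup the spiders rely on is still there.
scrapy shell 'https://www.fourfourtwo.com/'
>>> response.status                                # expect 200
>>> response.css('a::attr(href)').extract()[:10]   # illustrative: do the expected links exist?
```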