olegario96 opened this issue 6 years ago
Hi Olegario,
I tried to just run the scrapy crawl quotes command in the /bin/ash shell and got this error:
Kurts-MacBook-Pro:tutorial kurtpeek$ docker run -it scraper-compose_scraper /bin/ash
/scraper/tutorial # scrapy crawl quotes
2018-09-14 03:19:58 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: tutorial)
2018-09-14 03:19:58 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.7.0 (default, Sep 12 2018, 02:07:16) - [GCC 6.4.0], pyOpenSSL 18.0.0 (OpenSSL 1.0.2o 27 Mar 2018), cryptography 2.3.1, Platform Linux-4.9.93-linuxkit-aufs-x86_64-with
2018-09-14 03:19:58 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'tutorial', 'NEWSPIDER_MODULE': 'tutorial.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['tutorial.spiders']}
Traceback (most recent call last):
File "/usr/local/bin/scrapy", line 11, in <module>
sys.exit(execute())
File "/usr/local/lib/python3.7/site-packages/scrapy/cmdline.py", line 150, in execute
_run_print_help(parser, _run_command, cmd, args, opts)
File "/usr/local/lib/python3.7/site-packages/scrapy/cmdline.py", line 90, in _run_print_help
func(*a, **kw)
File "/usr/local/lib/python3.7/site-packages/scrapy/cmdline.py", line 157, in _run_command
cmd.run(args, opts)
File "/usr/local/lib/python3.7/site-packages/scrapy/commands/crawl.py", line 57, in run
self.crawler_process.crawl(spname, **opts.spargs)
File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 170, in crawl
crawler = self.create_crawler(crawler_or_spidercls)
File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 198, in create_crawler
return self._create_crawler(crawler_or_spidercls)
File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 203, in _create_crawler
return Crawler(spidercls, self.settings)
File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 55, in __init__
self.extensions = ExtensionManager.from_crawler(self)
File "/usr/local/lib/python3.7/site-packages/scrapy/middleware.py", line 58, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "/usr/local/lib/python3.7/site-packages/scrapy/middleware.py", line 34, in from_settings
mwcls = load_object(clspath)
File "/usr/local/lib/python3.7/site-packages/scrapy/utils/misc.py", line 44, in load_object
mod = import_module(module)
File "/usr/local/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/usr/local/lib/python3.7/site-packages/scrapy/extensions/telnet.py", line 12, in <module>
from twisted.conch import manhole, telnet
File "/usr/local/lib/python3.7/site-packages/twisted/conch/manhole.py", line 154
def write(self, data, async=False):
^
SyntaxError: invalid syntax
It seems from https://github.com/scrapy/scrapy/issues/3143 that this is an issue with Scrapy itself on Python 3.7, in which async became a reserved keyword (so Twisted's use of it as a parameter name is a SyntaxError). You might want to try a different base image to downgrade the Python version; feel free to submit a PR if that works!
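For reference, the failure is reproducible without Scrapy or Twisted at all: compiling the offending line from manhole.py raises the same error on 3.7+. A minimal stdlib sketch:

```python
# Reproduces the error above without Scrapy or Twisted: 'async' became
# a reserved keyword in Python 3.7, so it can no longer be used as a
# parameter name (as in the manhole.py line shown in the traceback).
src = "def write(self, data, async=False):\n    pass\n"
try:
    compile(src, "<manhole-excerpt>", "exec")
    print("compiled fine (pre-3.7 interpreter)")
except SyntaxError as err:
    print("SyntaxError:", err.msg)
```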
By the way, this is a fairly 'special' implementation of anonymous scraping, which uses the Tor control port to periodically change your apparent IP address. If you don't need this functionality, you could use a simpler image like docker-tor-privoxy-alpine.
How can I downgrade to version 3.6?
I managed to change the Python version using this Dockerfile:
# Adapted from trcook/docker-scrapy
FROM python:3.6-alpine
# The base image already provides Python 3.6, so there is no need to
# apk-add another python3 package or alias python in ~/.bashrc
RUN apk --update add libxml2-dev libxslt-dev libffi-dev gcc musl-dev libgcc openssl-dev curl
RUN pip install scrapy scrapy-fake-useragent stem pyparsing python-dateutil requests
COPY tutorial /scraper/tutorial
COPY wait-for/wait-for /scraper/tutorial
WORKDIR /scraper/tutorial
CMD ["./wait-for", "tor:9050", "--", "scrapy", "crawl", "quotes"]
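A quick way to confirm the interpreter inside the container actually changed (an illustrative check, not from the original thread): async only registers as a reserved keyword from 3.7 onward, so on a correctly built python:3.6 image this should print False.

```python
# Sanity check for the interpreter inside the container:
# 'async' is only a reserved keyword from Python 3.7 onward, so on a
# python:3.6-based image this prints False; on 3.7+ it prints True.
import keyword
import sys

print(sys.version.split()[0], "iskeyword('async') =", keyword.iskeyword("async"))
```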
But the problem persists.
I removed the --silent flag from the curl command and it says:
Received HTTP code 500 from proxy after CONNECT
The torrc file needs to be updated for this to work. Add this line:
SOCKSPort 0.0.0.0:9050
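For context, binding the SOCKS listener to 0.0.0.0 makes Tor reachable from the other compose containers (like the scraper connecting to tor:9050) instead of only from inside the Tor container. A fuller torrc sketch for this kind of setup; the control-port lines are assumptions based on the IP-rotation feature described above, not taken from the repository:

```
# Listen on all interfaces so the scraper container can reach tor:9050
SOCKSPort 0.0.0.0:9050
# Control port used (e.g. via stem) to request a new identity periodically
# -- assumed settings, adjust to match the repository's actual config
ControlPort 0.0.0.0:9051
CookieAuthentication 1
```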
I cloned the repository and tried to execute the two steps from the README.md. The problem is that when I execute docker-compose up, the following messages are shown. What could the cause be?