Open theproffesor opened 6 years ago
Hey @theproffesor, if you can provide more details about the issue, that would help me understand the actual problem and work on it.
I'm running into the same problem.
ConnectionError: HTTPConnectionPool(host='localhost', port=6800): Max retries exceeded with url: /schedule.json (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fdc28b04a50>: Failed to establish a new connection: [Errno 111] Connection refused',))
[21/Mar/2018 22:58:27] "POST /api/projects/edu/spiders/explorecourses.stanford.edu/schedule HTTP/1.0" 500 15544
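For anyone else landing here: `[Errno 111] Connection refused` means nothing accepted the TCP connection at `localhost:6800`, which is Scrapyd's default port. Below is a minimal sketch for checking that from the same machine or container that runs the Portia server. The helper name `port_is_open` is made up for illustration, and the host and port are simply the ones from the error above. Keep in mind that inside a Docker container `localhost` is the container itself, so a Scrapyd listening on the host would not be reachable at that address.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError,
        # which is Errno 111 on Linux) when nothing is listening.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Host and port taken from the error message in this thread.
    host, port = "localhost", 6800
    state = "reachable" if port_is_open(host, port) else "NOT reachable"
    print("Scrapyd at {}:{} is {}".format(host, port, state))
```

If this reports "NOT reachable" from wherever Portia runs, the 500 on `/schedule` is probably just Portia failing to reach Scrapyd, not a problem with the spider itself.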
I created a spider in portia to scrape data from explorecourses.stanford.edu.
edu.zip
I have scrapyd, portia and scrapy-splash running when trying to run the spider from within the portia UI.
I can run the spider if I use the docker command (below), but then I don't get the monitoring of running it through Scrapyd (unless there's another way to run a portia spider through Scrapyd).
docker run -i -t --rm -v <PROJECTS_FOLDER>:/app/data/projects:rw -v <OUTPUT_FOLDER>:/mnt:rw -p 9001:9001 scrapinghub/portia \
portiacrawl /app/data/projects/PROJECT_NAME SPIDER_NAME -o /mnt/SPIDER_NAME.jl
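On the question of running the spider through Scrapyd without the Portia UI: Scrapyd exposes a `schedule.json` endpoint (the same one the failing request targets) that takes `project` and `spider` form fields. A rough sketch of calling it directly follows. It assumes the project has already been deployed to Scrapyd and that Scrapyd is listening on `http://localhost:6800`; neither is confirmed in this thread, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_schedule_request(base_url, project, spider):
    """Build a POST request for Scrapyd's schedule.json endpoint."""
    # Scrapyd expects form-encoded 'project' and 'spider' fields.
    data = urlencode({"project": project, "spider": spider}).encode("ascii")
    url = base_url.rstrip("/") + "/schedule.json"
    return Request(url, data=data, method="POST")


def schedule_spider(base_url, project, spider, timeout=10):
    """Send the request and return Scrapyd's decoded JSON reply."""
    req = build_schedule_request(base_url, project, spider)
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Project and spider names are the ones from the log line above;
    # the base URL is an assumption (Scrapyd's default port).
    try:
        print(schedule_spider("http://localhost:6800", "edu",
                              "explorecourses.stanford.edu"))
    except OSError as exc:
        # URLError is a subclass of OSError; this is the "connection
        # refused" case seen in the thread.
        print("Could not reach Scrapyd:", exc)
```

If Scrapyd is not reachable, this fails with the same connection-refused error as the Portia UI, which at least confirms where the problem lies.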
pip list: adblockparser (0.7) asn1crypto (0.24.0) attrs (17.4.0) autobahn (17.9.3) Automat (0.6.0) certifi (2018.1.18) cffi (1.11.5) chardet (3.0.4) constantly (15.1.0) cryptography (2.2.1) cssselect (1.0.3) dateparser (0.7.0) funcparserlib (0.3.6) hyperlink (18.0.0) idna (2.6) incremental (17.5.0) jsonschema (2.6.0) loginform (1.2.0) lxml (4.2.0) mock (2.0.0) ndg-httpsclient (0.4.3) numpy (1.12.1) page-finder (0.1.9) parse (1.8.2) parsel (1.4.0) pbr (3.1.1) Pillow (5.0.0) pip (9.0.2) psutil (5.4.3) pyasn1 (0.4.2) pyasn1-modules (0.2.1) pycparser (2.18) PyDispatcher (2.0.5) pyOpenSSL (17.5.0) qt5reactor (0.5) queuelib (1.5.0) requests (2.18.4) retrying (1.3.3) Scrapy (1.5.0) scrapy-splash (0.7.2) scrapyd (1.2.0) scrapyd-client (1.1.0) service-identity (17.0.0) setuptools (39.0.1) six (1.11.0) slybot (0.13.1) slyd (0.0.0) splash (2.3.3) Twisted (17.9.0) txaio (2.9.0) urllib3 (1.22) w3lib (1.19.0) wheel (0.30.0) xvfbwrapper (0.2.9) zope.interface (4.4.3)
I got the same error when trying to run Portia in Docker:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 41, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/base.py", line 187, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/base.py", line 185, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/django/views/decorators/csrf.py", line 58, in wrapped_view
    return view_func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/viewsets.py", line 95, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/lib/python3.5/contextlib.py", line 30, in inner
    return func(*args, **kwds)
  File "/app/portia_server/portia_api/resources/route.py", line 72, in dispatch
    return super(JsonApiRoute, self).dispatch(request, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 494, in dispatch
    response = self.handle_exception(exc)
  File "/app/portia_server/portia_api/resources/route.py", line 75, in handle_exception
    response = super(JsonApiRoute, self).handle_exception(exc)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 454, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 491, in dispatch
    response = handler(request, *args, **kwargs)
  File "/app/portia_server/portia_api/resources/spiders.py", line 80, in schedule
    data = Deployer(self.project).schedule(spider_id)
  File "/app/portia_server/portia_api/utils/deploy/scrapyd.py", line 73, in schedule
    data=schedule_data)
  File "/usr/local/lib/python3.5/dist-packages/requests/api.py", line 112, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/adapters.py", line 508, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=6800): Max retries exceeded with url: /schedule.json (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fa16d3772b0>: Failed to establish a new connection: [Errno 111] Connection refused',))
[28/Aug/2018 01:08:53] "POST /api/projects/timsach/spiders/timsach.top/schedule HTTP/1.0" 500 18397
When I attempt to run the spider, a notification pops up telling me to notify the dev team of an unexpected error.
Console:
[28/Jan/2018 23:20:25] "PATCH /api/projects/MayWes/spiders/www.maywes.com HTTP/1.0" 200 990
[28/Jan/2018 23:20:35] "PATCH /api/projects/MayWes/spiders/www.maywes.com HTTP/1.0" 200 991
Internal Server Error: /api/projects/MayWes/spiders/www.maywes.com/schedule
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/exception.py", line 39, in inner
    response = get_response(request)
  File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 187, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 185, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/django/views/decorators/csrf.py", line 58, in wrapped_view
    return view_func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/rest_framework/viewsets.py", line 90, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/django/utils/decorators.py", line 185, in inner
    return func(*args, **kwargs)
  File "/app/portia_server/portia_api/resources/route.py", line 74, in dispatch
    return super(JsonApiRoute, self).dispatch(request, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/rest_framework/views.py", line 489, in dispatch
    response = self.handle_exception(exc)
  File "/app/portia_server/portia_api/resources/route.py", line 77, in handle_exception
    response = super(JsonApiRoute, self).handle_exception(exc)
  File "/usr/local/lib/python2.7/dist-packages/rest_framework/views.py", line 449, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/usr/local/lib/python2.7/dist-packages/rest_framework/views.py", line 486, in dispatch
    response = handler(request, *args, **kwargs)
  File "/app/portia_server/portia_api/resources/spiders.py", line 82, in schedule
    request = requests.post(settings.SCHEDULE_URL, data=schedule_data)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 112, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 508, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='localhost', port=6800): Max retries exceeded with url: /schedule.json (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ffb2536c390>: Failed to establish a new connection: [Errno 111] Connection refused',))
[28/Jan/2018 23:24:31] "POST /api/projects/MayWes/spiders/www.maywes.com/schedule HTTP/1.0" 500 14921

System is Linux Mint, something or another version.
Any help would be greatly appreciated as I really want to check this out.