my8100 / scrapydweb

Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Auto packaging, Timer tasks, Monitor & Alert, and Mobile UI. DEMO: https://github.com/my8100/files
GNU General Public License v3.0

Impossible to start scrapydweb in Docker #20

Closed: jdespatis closed this issue 5 years ago

jdespatis commented 5 years ago

I'm testing scrapydweb in Docker, but it doesn't work; I must be missing something, I guess.

Indeed, I get an error 500: 'NoneType' object has no attribute 'group'

Basically, here is my Dockerfile:

FROM python:3.6-jessie

ENV TZ="Europe/Paris"

WORKDIR /app

RUN pip install scrapydweb

RUN cp /usr/local/lib/python3.6/site-packages/scrapydweb/default_settings.py /app/scrapydweb_settings_v7.py

EXPOSE 5000

CMD ["scrapydweb", "--disable_auth", "--disable_logparser", "--scrapyd_server=scrapyd:6800"]

And here are the logs from scrapydweb when I go to localhost:5000:

scrapydweb_1     | [2019-01-20 21:14:53,789] ERROR in flask.app: Exception on /1/dashboard/ [GET]
scrapydweb_1     | Traceback (most recent call last):
scrapydweb_1     |   File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 2292, in wsgi_app
scrapydweb_1     |     response = self.full_dispatch_request()
scrapydweb_1     |   File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1815, in full_dispatch_request
scrapydweb_1     |     rv = self.handle_user_exception(e)
scrapydweb_1     |   File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1718, in handle_user_exception
scrapydweb_1     |     reraise(exc_type, exc_value, tb)
scrapydweb_1     |   File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 35, in reraise
scrapydweb_1     |     raise value
scrapydweb_1     |   File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1813, in full_dispatch_request
scrapydweb_1     |     rv = self.dispatch_request()
scrapydweb_1     |   File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1799, in dispatch_request
scrapydweb_1     |     return self.view_functions[rule.endpoint](**req.view_args)
scrapydweb_1     |   File "/usr/local/lib/python3.6/site-packages/flask/views.py", line 88, in view
scrapydweb_1     |     return self.dispatch_request(*args, **kwargs)
scrapydweb_1     |   File "/usr/local/lib/python3.6/site-packages/scrapydweb/jobs/dashboard.py", line 57, in dispatch_request
scrapydweb_1     |     return self.generate_response()
scrapydweb_1     |   File "/usr/local/lib/python3.6/site-packages/scrapydweb/jobs/dashboard.py", line 98, in generate_response
scrapydweb_1     |     _url_items = re.search(r"href='(.*?)'>", row['items']).group(1)
scrapydweb_1     | AttributeError: 'NoneType' object has no attribute 'group'
scrapydweb_1     | [2019-01-20 21:14:53,816] INFO in werkzeug: 192.168.48.1 - - [20/Jan/2019 21:14:53] "GET /1/dashboard/ HTTP/1.1" 500 -

Any idea how to fix this?

Thanks

jdespatis commented 5 years ago

Well, I've also tried SpiderKeeper; its GUI works, but it wouldn't send the egg to scrapyd. I've fixed that by forcing scrapyd to use the exact same version of Python as my scrapers use, and SpiderKeeper now works completely.

And as a result, scrapydweb also works now, no more error 500.

I guess there's still a problem with scrapydweb, though: the GUI should at least work even if scrapyd is badly configured. But everything works now :)
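
For what it's worth, what I have in mind is something like the sketch below. This is not the actual scrapydweb code, just an illustration of how the dashboard parsing could fail softly instead of returning a 500 when the Scrapyd page doesn't contain the expected items link:

import re

def extract_items_url(items_cell):
    # None-safe version of the parsing done in scrapydweb/jobs/dashboard.py
    # line 98: re.search() returns None when the cell has no href link, so
    # guard before calling .group() instead of letting AttributeError bubble up.
    match = re.search(r"href='(.*?)'>", items_cell)
    return match.group(1) if match else None

# A cell with a link still parses; an empty or unexpected cell no longer raises.
print(extract_items_url("<a href='/items/demo/spider/job.jl'>Items</a>"))  # /items/demo/spider/job.jl
print(extract_items_url(""))  # None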

my8100 commented 5 years ago

The key is that you should split the argument you passed in, --scrapyd_server=scrapyd:6800, into two separate arguments: --scrapyd_server and the address itself.

This works for me:

Content of the Dockerfile

FROM python:3.6-jessie

ENV TZ="Europe/Paris"

WORKDIR /app

RUN pip install scrapydweb

RUN cp /usr/local/lib/python3.6/site-packages/scrapydweb/default_settings.py /app/scrapydweb_settings_v7.py

EXPOSE 5000

CMD ["scrapydweb", "--disable_auth", "--disable_logparser", "--scrapyd_server", "IP-OF-YOUR-SCRAPYD-SERVER:6800"]

Docker commands

ubuntu@ubuntu:~/docker$ sudo docker build -t scrapydweb:latest .

ubuntu@ubuntu:~/docker$ sudo docker run -d -p 5000:5000 scrapydweb
1da5a344b172f5e2d22f8e34a2ba0733c26e4e87be39c266c3ecc9a34eb41802

ubuntu@ubuntu:~/docker$ sudo docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                    NAMES
1da5a344b172        scrapydweb          "scrapydweb --disabl…"   16 seconds ago      Up 15 seconds       0.0.0.0:5000->5000/tcp   amazing_edison

ubuntu@ubuntu:~/docker$ sudo docker logs 1da
[2019-01-21 09:02:38,892] INFO in werkzeug:  * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
[2019-01-21 09:03:31,004] INFO in werkzeug: 172.17.0.1 - - [21/Jan/2019 09:03:31] "GET / HTTP/1.1" 302 -
[2019-01-21 09:03:32,143] INFO in werkzeug: 172.17.0.1 - - [21/Jan/2019 09:03:32] "GET /1/dashboard/ HTTP/1.1" 200 -
[2019-01-21 09:03:32,660] INFO in werkzeug: 172.17.0.1 - - [21/Jan/2019 09:03:32] "GET /static/v110/css/style.css HTTP/1.1" 200 -
my8100 commented 5 years ago

> Well, I've also tried SpiderKeeper; its GUI works, but it wouldn't send the egg to scrapyd. I've fixed that by forcing scrapyd to use the exact same version of Python as my scrapers use, and SpiderKeeper now works completely.
>
> And as a result, scrapydweb also works now, no more error 500.
>
> I guess there's still a problem with scrapydweb, though: the GUI should at least work even if scrapyd is badly configured. But everything works now :)

So, were you running another app on the same port 5000 when the error 500 was raised?

jdespatis commented 5 years ago

No, not another app on the same port 5000. In my setup, everything runs under docker-compose, with each microservice in its own container.

Thanks for the settings, but I tried them yesterday and the problem was indeed the same.

Never mind, everything works now that scrapyd is properly configured, thanks! I still need a few things, though: the log parser as another microservice, a reverse proxy in front of everything for auth / automatic SSL, etc.

my8100 commented 5 years ago

I guessed that you might be using docker-compose from the name 'scrapydweb_1'.

Actually, I was wondering why ScrapydWeb would raise the exception below. By the time the code reached line 98, it had already fetched the page content from somewhere like 'http://127.0.0.1:6800/jobs', so everything should have been working fine.

@jdespatis You can also pass in the argument '--verbose' for troubleshooting if needed.

scrapydweb_1     |   File "/usr/local/lib/python3.6/site-packages/scrapydweb/jobs/dashboard.py", line 98, in generate_response
scrapydweb_1     |     _url_items = re.search(r"href='(.*?)'>", row['items']).group(1)
scrapydweb_1     | AttributeError: 'NoneType' object has no attribute 'group'
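
If it ever comes back, a quick way to see what the dashboard is actually parsing is to fetch the Scrapyd jobs page yourself and run the same pattern that line 98 uses against it. This is just a sketch: the URL assumes a docker-compose service named 'scrapyd' listening on port 6800, so adjust it to your setup.

import re
import urllib.request

# Assumption: a Scrapyd instance reachable as 'scrapyd:6800' (the docker-compose
# service name used in the original Dockerfile); change the URL to match your setup.
jobs_url = 'http://scrapyd:6800/jobs'
html = urllib.request.urlopen(jobs_url, timeout=10).read().decode('utf-8')

# Same pattern as dashboard.py line 98; if it finds no links at all, the page
# is not what the dashboard expects (wrong service, wrong port, or an error page).
links = re.findall(r"href='(.*?)'>", html)
print('links found:', len(links))
for href in links[:10]:
    print(href)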