my8100 / scrapydweb

Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Auto packaging, Timer tasks, Monitor & Alert, and Mobile UI. DEMO: https://github.com/my8100/files

Timer tasks not working with auth on #181

Tobeyforce commented 3 years ago

With auth enabled, my timer tasks stop working. The response shown in the task result is a 401 error (screenshot omitted; the history log below shows the same message).

So Scrapyd is trying to send a request to ScrapydWeb, but with auth enabled ScrapydWeb expects basic-auth credentials, and they are never added to the request header. Is there any way to fix this? It's worth mentioning that I have deployed ScrapydWeb with gunicorn & nginx.
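
Presumably the fix is for that internal request to carry the same credentials ScrapydWeb itself is configured with. A rough sketch of the idea using requests (the helper name and call shape are made up for illustration, not ScrapydWeb's actual code):

    import requests
    from requests.auth import HTTPBasicAuth

    def request_with_auth(url, config):
        # Hypothetical helper: attach ScrapydWeb's own USERNAME/PASSWORD
        # when ENABLE_AUTH is on, so the internal timer-task request
        # passes the auth check instead of getting a 401.
        auth = None
        if config.get('ENABLE_AUTH', False):
            auth = HTTPBasicAuth(str(config.get('USERNAME', '')),
                                 str(config.get('PASSWORD', '')))
        return requests.get(url, auth=auth, timeout=60)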

Any advice would be helpful.

my8100 commented 3 years ago
  1. Click the history button on the timer tasks page, then post the related log.
  2. Run scrapydweb without gunicorn&nginx and try again.
Tobeyforce commented 3 years ago

History log:

[2021-04-08 16:20:05,034] WARNING in apscheduler: Fail to execute task #1 (upplandsbrohus sthlm 10min - edit) on node 1, would retry later: Request got {'status_code': 401, 'status': 'error', 'message': "<script>alert('Fail to login: basic auth for ScrapydWeb has been enabled');</script>"}
[2021-04-08 16:20:08,039] ERROR in apscheduler: Fail to execute task #1 (upplandsbrohus sthlm 10min - edit) on node 1, no more retries: Traceback (most recent call last):
  File "/var/www/html/scrapydweb/views/operations/execute_task.py", line 89, in schedule_task
    assert js['status_code'] == 200 and js['status'] == 'ok', "Request got %s" % js
AssertionError: Request got {'status_code': 401, 'status': 'error', 'message': "<script>alert('Fail to login: basic auth for ScrapydWeb has been enabled');</script>"}

[2021-04-08 16:20:40,519] WARNING in apscheduler: Shutting down the scheduler for timer tasks gracefully, wait until all currently executing tasks are finished
[2021-04-08 16:20:40,521] WARNING in apscheduler: The main pid is 1267. Kill it manually if you don't want to wait

Unfortunately, running ScrapydWeb with gunicorn & nginx has created all kinds of problems for me. I hope you one day add an official way to deploy ScrapydWeb so that we don't have to create workarounds :( Without a production server I've never had issues, so I know it would work otherwise.

My understanding is that each request goes through this before_request middleware in run.py:

    @app.before_request
    def require_login():
        if app.config.get('ENABLE_AUTH', False):
            auth = request.authorization
            USERNAME = str(app.config.get('USERNAME', ''))  # May be 0 from config file
            PASSWORD = str(app.config.get('PASSWORD', ''))
            if not auth or not (auth.username == USERNAME and auth.password == PASSWORD):
                return authenticate()

My only workaround so far is to change this check.
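
For instance, something along these lines (an untested sketch; exempting loopback traffic is my own idea, and it is unsafe if nginx itself connects from 127.0.0.1, because then every proxied request would bypass auth):

    # Modified hook in run.py (app, request and authenticate as above).
    @app.before_request
    def require_login():
        # Workaround sketch: let requests from the local machine through
        # so the scheduler's internal request is not rejected.
        # Caveat: behind nginx, request.remote_addr is typically 127.0.0.1
        # for ALL proxied requests, which would disable auth entirely.
        if request.remote_addr in ('127.0.0.1', '::1'):
            return
        if app.config.get('ENABLE_AUTH', False):
            auth = request.authorization
            USERNAME = str(app.config.get('USERNAME', ''))  # May be 0 from config file
            PASSWORD = str(app.config.get('PASSWORD', ''))
            if not auth or not (auth.username == USERNAME and auth.password == PASSWORD):
                return authenticate()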

my8100 commented 3 years ago

Could you debug with the following steps first?

  1. Run scrapydweb without gunicorn&nginx and try again.
  2. Run scrapydweb with gunicorn and try again.
  3. Run scrapydweb with nginx and try again.
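
If step 1 works but step 2 or 3 fails, one quick check is whether the proxy forwards the Authorization header at all. A minimal probe (assuming ScrapydWeb listens on http://127.0.0.1:5000; the credentials are placeholders):

    import requests
    from requests.auth import HTTPBasicAuth

    BASE = 'http://127.0.0.1:5000'  # assumption: adjust to your deployment

    # Expect 401 when ENABLE_AUTH is on and no credentials are sent.
    print(requests.get(BASE).status_code)

    # Expect 200 if the credentials reach ScrapydWeb; a 401 here suggests
    # gunicorn or nginx is dropping the Authorization header.
    print(requests.get(BASE, auth=HTTPBasicAuth('admin', 'admin')).status_code)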