scrapy / scrapyd-client

Command line client for Scrapyd server
BSD 3-Clause "New" or "Revised" License

Support for non-Basic Auth in scrapyd-deploy #15

Open jwebb-va opened 8 years ago

jwebb-va commented 8 years ago

My team is working on a set of scrapy spiders which we want to deploy to a scrapyd server. Our scrapyd server is configured to use an oauth2 proxy to authenticate traffic.

On all of our requests to our Scrapyd API, we need the following header to authenticate our requests:

Authorization: Bearer 1/AbC123

where 1/AbC123 is an OAuth2 access token.

Currently the scrapyd-deploy utility only supports using Basic auth.

jwebb-va commented 8 years ago

I foresee a few ways to solve this issue...

scrapyd-deploy could support some kind of oauth2_bearer_token key in the scrapy.cfg file. This could add the HTTP Authorization header for us automatically.

[deploy]
url = https://scrapyd.example.com/
project = example
oauth2_bearer_token = 1/AbC123
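
To make the proposal above concrete, here is a minimal sketch of how a deploy tool might turn such a key into a request header. It is not scrapyd-deploy's actual code: `oauth2_bearer_token` is the hypothetical key proposed in this comment, `build_request` is an illustrative helper, and `listprojects.json` is used only as an example Scrapyd endpoint.

```python
import configparser
import urllib.request

# Same target as in the proposal; the token key is hypothetical.
SAMPLE_CFG = """
[deploy]
url = https://scrapyd.example.com/
project = example
oauth2_bearer_token = 1/AbC123
"""


def build_request(cfg_text, endpoint="listprojects.json", section="deploy"):
    """Build a request for a Scrapyd endpoint, adding a Bearer header if configured."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    target = parser[section]

    request = urllib.request.Request(target["url"].rstrip("/") + "/" + endpoint)

    # Assumption: the token is sent verbatim as an OAuth2 Bearer credential.
    token = target.get("oauth2_bearer_token")
    if token:
        request.add_header("Authorization", "Bearer " + token)
    return request


request = build_request(SAMPLE_CFG)
print(request.full_url)
print(request.get_header("Authorization"))
```

A target without the key would simply be sent without an Authorization header, so existing configurations would keep working under this approach.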

A more generic solution might be to simply allow users to specify additional HTTP headers manually via command-line arguments. This would allow us to use OAuth2 authentication but would also allow other use cases which require custom headers.

scrapyd-deploy -h "Authorization: Bearer 1/AbC123", "Another-Header: foo"
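
A rough sketch of how the generic-header idea could be parsed, assuming an argparse-based command line. The `-H`/`--header` flag and the helper names are hypothetical; the sketch uses a repeatable `-H` rather than the `-h` shown above because argparse reserves `-h` for help by default.

```python
import argparse


def parse_header(raw):
    """Split a 'Name: value' string into a (name, value) pair."""
    name, sep, value = raw.partition(":")
    if not sep or not name.strip():
        raise argparse.ArgumentTypeError(f"expected 'Name: value', got {raw!r}")
    return name.strip(), value.strip()


def build_parser():
    # Hypothetical parser; not the real scrapyd-deploy option set.
    parser = argparse.ArgumentParser(prog="deploy-sketch")
    parser.add_argument(
        "-H", "--header",
        action="append",      # the flag may be given several times
        type=parse_header,
        metavar="HEADER",
        help="extra HTTP header in 'Name: value' form",
    )
    return parser


def headers_from_args(argv):
    """Return the extra headers from argv as a dict."""
    args = build_parser().parse_args(argv)
    return dict(args.header or [])


headers = headers_from_args([
    "-H", "Authorization: Bearer 1/AbC123",
    "-H", "Another-Header: foo",
])
print(headers)
```

The resulting dict could then be applied to each outgoing request, which would cover Bearer tokens as well as any other custom header a proxy requires.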

madzohan commented 8 years ago

Currently the scrapyd-deploy utility only supports using Basic auth.

Can you explain how it can be done?

Digenis commented 8 years ago

@VlaGrishenko, see https://github.com/scrapy/scrapyd-client/blob/47a7c7a209e52c4e1d2e35b79bbb3900a8c05ecc/scrapyd-client/scrapyd-deploy#L207-L210. You need to add a username and password to your target in scrapy.cfg. Scrapyd itself doesn't support authorization, but you can configure it to listen only on 127.0.0.1 and then use Apache to proxy connections from elsewhere, requiring those clients to provide credentials.
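
For readers unfamiliar with what the username and password end up as on the wire, the following is a minimal standard-library sketch of HTTP Basic auth. It is not the code at the linked lines; `add_basic_auth` is an illustrative helper, and the URL and credentials are placeholders.

```python
import base64
import urllib.request


def add_basic_auth(request, username, password=""):
    """Attach an HTTP Basic 'Authorization' header to a urllib request."""
    # Basic auth is base64("username:password") prefixed with "Basic ".
    credentials = f"{username}:{password}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    request.add_header("Authorization", "Basic " + encoded)
    return request


request = add_basic_auth(
    urllib.request.Request("https://scrapyd.example.com/listprojects.json"),
    "user",
    "pass",
)
print(request.get_header("Authorization"))
```

The proxy in front of Scrapyd would then check this header before forwarding the request to 127.0.0.1.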

You still, however, trust users of the same computer connecting on 127.0.0.1 without auth. What makes sense to me is enabling scrapyd to listen on a Unix domain socket (UDS) to which only the Apache user has access, and then proxying it as described above.