my8100 / scrapyd-cluster-on-heroku

Set up a free and scalable Scrapyd cluster for distributed web crawling with just a few clicks. DEMO: https://scrapydweb.herokuapp.com/
GNU General Public License v3.0

[Question] How do you protect spider from being easily accessed? #10

Closed by andreierdoss 4 years ago

andreierdoss commented 4 years ago

I noticed that if I visit the spider page, I see the following:

`Scrapyd
Available projects: ScrapydWeb_demo

Jobs
Items
Logs
Documentation

How to schedule a spider?
To schedule a spider you need to use the API (this web UI is only for monitoring)

Example using curl:

curl http://localhost:6800/schedule.json -d project=default -d spider=somespider

For more information about the API, see the Scrapyd documentation`

How can this be protected?

my8100 commented 4 years ago

Basic auth for Scrapyd is supported in https://github.com/scrapy/scrapyd/pull/326

  1. Deploy the git version of Scrapyd: https://github.com/my8100/scrapyd-cluster-on-heroku-scrapyd-app-git
  2. Set the ENABLE_AUTH, USERNAME, and PASSWORD arguments when deploying the Scrapyd app in the previous step.
  3. Your Scrapyd app is now protected with basic auth.
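
Once basic auth is enabled, API clients have to send credentials with every request. A minimal sketch of what that could look like in Python, using only the standard library: the function names, app URL, and credentials are placeholders made up for illustration, and `daemonstatus.json` is used simply because it is a read-only Scrapyd endpoint.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_status_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET for daemonstatus.json."""
    url = base_url.rstrip("/") + "/daemonstatus.json"
    headers = {"Authorization": basic_auth_header(username, password)}
    return urllib.request.Request(url, headers=headers)


def fetch_daemon_status(base_url: str, username: str, password: str, timeout: float = 10.0) -> str:
    """Send the authenticated request and return the response body as text."""
    request = build_status_request(base_url, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `fetch_daemon_status("https://my-scrapyd-app.herokuapp.com", "admin", "secret")` would query a hypothetical app with the values you chose for USERNAME and PASSWORD.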

andreierdoss commented 4 years ago

Thank you for the hard work you put into this project, and for the prompt reply. I also want to congratulate you on getting the auth code in, 4 years after the original request!

For context, I cloned your repo following these instructions because I have other requirements, such as SQLAlchemy: https://github.com/my8100/scrapyd-cluster-on-heroku#custom-deployment

I also added the 3 environment variables to Heroku (ENABLE_AUTH, USERNAME, and PASSWORD), but I was not prompted for a login and could still see this: https://imgur.com/a/NJ7CCS5

I then noticed some discrepancies between the scrapyd section of this repo: https://github.com/my8100/scrapyd-cluster-on-heroku and this one: https://github.com/my8100/scrapyd-cluster-on-heroku-scrapyd-app-git

I manually copied the changes over to update it, and now I get the login prompt.
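
For anyone who wants to check this without a browser, here is a rough sketch that sends a request with no credentials and looks at the status code. It assumes that an auth-enabled Scrapyd answers unauthenticated requests with 401 Unauthorized, which is the usual behavior for HTTP basic auth; the function names are made up for illustration.

```python
import urllib.error
import urllib.request


def is_protected(status_code: int) -> bool:
    """True when an unauthenticated request was rejected with 401 Unauthorized."""
    return status_code == 401


def probe_without_credentials(base_url: str, timeout: float = 10.0) -> int:
    """Send a GET with no Authorization header and return the HTTP status code."""
    url = base_url.rstrip("/") + "/daemonstatus.json"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is still available.
        return err.code
```

If `is_protected(probe_without_credentials(app_url))` is false for your own app URL, the app is probably still serving requests without asking for a login, as in the screenshot above.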