scrapy-plugins / scrapy-zyte-smartproxy

Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy

Support more secure ways to declare the APIKEY #88

Open BurnzZ opened 4 years ago

BurnzZ commented 4 years ago

BACKGROUND

As of version 1.6.0, there are two ways of adding the API key (a fuller settings.py sketch follows the list):

  1. via `settings.py`:

         CRAWLERA_APIKEY = 'apikey'

  2. via a spider attribute:

         class SampleSpider(scrapy.Spider):
             crawlera_apikey = 'apikey'
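
For context, here is a minimal settings.py sketch showing where the key sits relative to enabling the middleware. The middleware path and the priority value 610 follow the README conventions of the scrapy-crawlera era and should be treated as assumptions rather than authoritative values:

```python
# settings.py -- minimal sketch; middleware path and priority 610 are
# assumptions based on the scrapy-crawlera README of that time.
DOWNLOADER_MIDDLEWARES = {
    'scrapy_crawlera.CrawleraMiddleware': 610,
}

CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = 'apikey'  # hard-coded here purely for illustration
```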

When using Scrapy Cloud, we can also declare it in two more ways:

  3. via Spider/Project settings

     (screenshot)

  4. via the Scrapy Cloud Crawlera add-on

     (screenshot)

PROBLEM

In practice, however, the API keys end up written inside the code and committed to the repository.

The best practice is to keep sensitive keys decoupled from the code. Options 3 and 4 above already address this, since the keys can be declared only inside Scrapy Cloud.

However, this becomes a problem when running the spider locally during development, as the keys might not be available there.
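
One possible direction, sketched here only as an assumption rather than anything the middleware does today, is to read the key from an environment variable in settings.py, so that nothing sensitive ever lands in the repository:

```python
# settings.py -- sketch of an environment-variable approach (an assumption,
# not current behaviour): the key is never committed, and a local run simply
# exports CRAWLERA_APIKEY before `scrapy crawl`.
import os

CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = os.environ.get('CRAWLERA_APIKEY', '')
```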

OBJECTIVES

This issue aims to serve as a discussion ground for exploring better ways to handle this.

For starters, here are a couple of ways to approach it:

Either way, it should support different API keys per spider.
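
As a purely hypothetical illustration of that per-spider requirement (the naming convention below is assumed, not part of any proposal here), each spider could read its own environment variable through the existing crawlera_apikey attribute:

```python
import os

import scrapy


class SampleSpider(scrapy.Spider):
    name = 'sample'
    # Hypothetical convention: one variable per spider, e.g.
    # CRAWLERA_APIKEY_SAMPLE, so each spider can carry a different key.
    crawlera_apikey = os.environ.get('CRAWLERA_APIKEY_SAMPLE', '')
```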

Gallaecio commented 4 years ago

Option A is already doable through: https://docs.scrapy.org/en/latest/topics/settings.html#command-line-options
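
For illustration, assuming a spider named sample, that would look like `scrapy crawl sample -s CRAWLERA_APIKEY=apikey`.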