scrapy / scrapyd

A service daemon to run Scrapy spiders
https://scrapyd.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

unable to override custom settings #355

Closed: ChristianYeah closed this issue 5 years ago

ChristianYeah commented 5 years ago

I have the following settings in settings.py:

LUMINATI_PROXY_ENABLED = True
LUMINATI_PROXY_ADDRESS = 'http://127.0.0.1:24000'
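These values only matter if the middleware that reads them (the ProxyMiddleware in the follow-up comment) is enabled in settings.py. Here is a minimal sketch. It assumes the class lives in a hypothetical akc_spiders/middlewares.py module. The order value 350 is also an assumption, chosen only because it is lower than the 750 that Scrapy assigns to its built-in HttpProxyMiddleware, so this middleware sets meta['proxy'] first.

```python
# settings.py (sketch): enable the custom downloader middleware so Scrapy
# builds it via ProxyMiddleware.from_crawler() and calls process_request().
LUMINATI_PROXY_ENABLED = True
LUMINATI_PROXY_ADDRESS = 'http://127.0.0.1:24000'

DOWNLOADER_MIDDLEWARES = {
    # Hypothetical module path and order value; adjust to your project.
    # Lower numbers run process_request() earlier than HttpProxyMiddleware (750).
    'akc_spiders.middlewares.ProxyMiddleware': 350,
}
```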

In some cases, I'd like to disable the proxy by passing LUMINATI_PROXY_ENABLED=False, like this:

curl http://localhost:6800/schedule.json -d project=akc_spiders -d spider=jd-akc-2 -d _version=10.0.1 -d link=http://localhost:9876/static/js/1.json -d setting=LUMINATI_PROXY_ENABLED=False -d setting=DOWNLOAD_DELAY=2
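Building the same request in Python shows one detail clearly: the form body is plain text, so the override reaches Scrapyd as the string 'False' and not as a boolean. This is a minimal stdlib sketch. It only builds the request and does not send it. The field values are copied from the curl command above.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

# Form fields equivalent to the curl command. "setting" repeats once per
# KEY=VALUE override, so a list of pairs is used instead of a dict.
fields = [
    ("project", "akc_spiders"),
    ("spider", "jd-akc-2"),
    ("_version", "10.0.1"),
    ("link", "http://localhost:9876/static/js/1.json"),
    ("setting", "LUMINATI_PROXY_ENABLED=False"),
    ("setting", "DOWNLOAD_DELAY=2"),
]
body = urlencode(fields)

# Passing data makes urllib issue a POST, like curl -d.
req = Request("http://localhost:6800/schedule.json", data=body.encode("ascii"))
# urllib.request.urlopen(req) would submit the job; it is not called here.

# Every override is just text: "False" is a string, not the boolean False.
print(parse_qs(body)["setting"])
```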

However, according to the log, only the built-in settings are overridden:

2019-09-12 16:32:12 [scrapy.crawler] INFO: Overridden settings: {'AUTOTHROTTLE_MAX_DELAY': 2, 'AUTOTHROTTLE_START_DELAY': 1, 'BOT_NAME': 'akc_spiders', 'COOKIES_DEBUG': True, 'DOWNLOAD_DELAY': 1, 'LOG_FILE': 'logs/akc_spiders/jd-akc-2/d1471d7cd53711e99dc3acde48001122.log', 'LOG_LEVEL': 'INFO', 'NEWSPIDER_MODULE': 'akc_spiders.spiders', 'SPIDER_MODULES': ['akc_spiders.spiders', 'akc_spiders.baobeicang_spiders', 'akc_spiders.beidian_spiders', 'akc_spiders.jd_spiders', 'akc_spiders.vip_spiders']}
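This log line may be misleading here. As far as I understand Scrapy's behaviour, the "Overridden settings" message only compares names that exist in Scrapy's own default settings, so a project-specific key such as LUMINATI_PROXY_ENABLED would never appear in it, whether or not it was overridden. The sketch below is a simplified model of that filtering and not Scrapy's actual code. The DEFAULTS dict is a tiny hypothetical subset of the real defaults.

```python
# Simplified model (assumption) of how the "Overridden settings" line is built:
# only keys present in the framework defaults are considered at all.
DEFAULTS = {"BOT_NAME": "scrapybot", "DOWNLOAD_DELAY": 0, "LOG_LEVEL": "DEBUG"}


def overridden_settings(settings, defaults=DEFAULTS):
    """Return built-in settings whose effective value differs from the default."""
    return {
        name: settings[name]
        for name, default in defaults.items()
        if name in settings and settings[name] != default
    }


effective = {
    "BOT_NAME": "akc_spiders",
    "DOWNLOAD_DELAY": 1,
    "LOG_LEVEL": "INFO",
    "LUMINATI_PROXY_ENABLED": "False",  # custom key: applied, but never logged
}
print(overridden_settings(effective))
# → {'BOT_NAME': 'akc_spiders', 'DOWNLOAD_DELAY': 1, 'LOG_LEVEL': 'INFO'}
```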
ChristianYeah commented 5 years ago

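It turns out the fix is to use getbool instead of get. A value passed with -d setting=... reaches Scrapy as a string, and the string 'False' is truthy. getbool parses it into a real boolean: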

class ProxyMiddleware(object):
    """Downloader middleware that sends requests through the Luminati proxy."""

    def __init__(self, proxy_enabled, proxy_address):
        self.proxy_enabled = proxy_enabled
        self.proxy_address = proxy_address

    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            # get() returned the string 'False' here, which is truthy:
            # proxy_enabled=crawler.settings.get('LUMINATI_PROXY_ENABLED', False),
            # getbool() converts 'False'/'0' to False and 'True'/'1' to True:
            proxy_enabled=crawler.settings.getbool('LUMINATI_PROXY_ENABLED', False),
            proxy_address=crawler.settings.get('LUMINATI_PROXY_ADDRESS'),
        )

    def process_request(self, request, spider):
        if self.proxy_enabled:
            request.meta["proxy"] = self.proxy_address
        # Returning None lets Scrapy continue handling the request normally.
        return None
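You can check the fix without starting a crawl by running the middleware against a stand-in crawler. This is a minimal sketch. StubSettings, proxy_for and ADDRESS are hypothetical helpers for this demo only. StubSettings imitates the boolean rules documented for Scrapy's Settings.getbool; it is not Scrapy's implementation.

```python
from types import SimpleNamespace


class StubSettings:
    """Hypothetical stand-in for scrapy.settings.Settings (demo only)."""

    def __init__(self, values):
        self._values = dict(values)

    def get(self, name, default=None):
        return self._values.get(name, default)

    def getbool(self, name, default=False):
        # Follows the documented Settings.getbool rules: 1, '1', True and
        # 'True' are true; 0, '0', False, 'False' and None are false.
        value = self.get(name, default)
        if value in (True, 1, "1", "True"):
            return True
        if value in (False, 0, "0", "False", None):
            return False
        raise ValueError(f"unsupported boolean value for {name}: {value!r}")


class ProxyMiddleware:
    # Same logic as the middleware in the comment above.
    def __init__(self, proxy_enabled, proxy_address):
        self.proxy_enabled = proxy_enabled
        self.proxy_address = proxy_address

    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            proxy_enabled=crawler.settings.getbool("LUMINATI_PROXY_ENABLED", False),
            proxy_address=crawler.settings.get("LUMINATI_PROXY_ADDRESS"),
        )

    def process_request(self, request, spider):
        if self.proxy_enabled:
            request.meta["proxy"] = self.proxy_address
        return None


def proxy_for(values):
    """Return the proxy the middleware assigns under the given settings."""
    crawler = SimpleNamespace(settings=StubSettings(values))
    request = SimpleNamespace(meta={})
    ProxyMiddleware.from_crawler(crawler).process_request(request, spider=None)
    return request.meta.get("proxy")


ADDRESS = "http://127.0.0.1:24000"
# The override arrives as the string 'False'; get() treats it as truthy...
print(bool(StubSettings({"LUMINATI_PROXY_ENABLED": "False"}).get("LUMINATI_PROXY_ENABLED")))  # True
# ...while getbool() turns it into a real False, so no proxy is set.
print(proxy_for({"LUMINATI_PROXY_ENABLED": "False", "LUMINATI_PROXY_ADDRESS": ADDRESS}))  # None
print(proxy_for({"LUMINATI_PROXY_ENABLED": True, "LUMINATI_PROXY_ADDRESS": ADDRESS}))
```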