scrapinghub / scrapyrt

HTTP API for Scrapy spiders
BSD 3-Clause "New" or "Revised" License
824 stars 161 forks

Use environment variables #125

Closed: VikashKothary closed 3 years ago

VikashKothary commented 3 years ago

Why

Configuration using environment variables is very popular. This is especially useful when running an application in Docker.

This isn't a major issue since we can technically use environment variables to configure Scrapyrt. But personally I think it would be cool to have it out-of-the-box.

Worst case, this will document how I did it and might help someone else who wants to do the same.

How

This is how I do it currently. I run:

$ scrapyrt -i ${SCRAPYRT_HOST} -p ${SCRAPYRT_PORT} -S scrapyrt.settings

where scrapyrt.settings points to my scrapy spider's settings file:

# file: scrapyrt/settings.py
import os

# os.getenv returns strings when the variable is set, so booleans and
# numbers are parsed explicitly instead of being used as-is.
DEBUG = os.getenv('DEBUG', 'false').lower() in ('1', 'true', 'yes')

SERVICE_ROOT = os.getenv('SCRAPYRT_SERVICE_ROOT', 'scrapyrt.resources.RealtimeApi')
CRAWL_MANAGER = os.getenv('SCRAPYRT_CRAWL_MANAGER', 'scrapyrt.core.CrawlManager')
LOG_DIR = os.getenv('SCRAPYRT_LOG_DIR', 'logs')
TIMEOUT_LIMIT = int(os.getenv('SCRAPYRT_TIMEOUT_LIMIT', '1000'))
PROJECT_SETTINGS = os.getenv('SCRAPYRT_PROJECT_SETTINGS', None)
LOG_FILE = os.getenv('SCRAPYRT_LOG_FILE', None)
LOG_ENCODING = os.getenv('SCRAPYRT_LOG_ENCODING', 'utf-8')

What

My understanding is that if the above change were made to default_settings.py, this would be available out-of-the-box, which might be a nice feature to have.

Host and port could potentially be set here as defaults too, assuming you want CLI arguments to override environment variables.

P.S. My 2 cents is that host and port should also be configurable in settings.py. If you're happy to have the environment variable change, this could be a good time to make that one as well.
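The precedence described above (CLI argument first, then environment variable, then built-in default) can be sketched with argparse. This is only an illustration, not Scrapyrt's actual argument parser: the function name `parse_args` is hypothetical, the long option names and the fallback values `localhost` and `9080` are assumptions, and the variable names `SCRAPYRT_HOST` and `SCRAPYRT_PORT` are taken from the command earlier in this comment.

```python
import argparse
import os


def parse_args(argv=None, environ=None):
    """Resolve host and port: CLI flag, then environment variable, then default."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(description="precedence sketch (hypothetical)")
    # The environment only supplies the argparse default, so an explicit
    # -i or -p on the command line always takes priority.
    parser.add_argument(
        "-i", "--ip",
        default=environ.get("SCRAPYRT_HOST", "localhost"),  # assumed fallback
    )
    parser.add_argument(
        "-p", "--port",
        type=int,
        default=int(environ.get("SCRAPYRT_PORT", "9080")),  # assumed fallback
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"would listen on {args.ip}:{args.port}")
```

With this shape, running with `SCRAPYRT_PORT=8000` set and no flags would pick port 8000, while adding `-p 7000` would override it.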

pawelmhm commented 3 years ago

I'd say configuring everything via environment variables may make it a little more difficult to develop. With all settings written in the settings file, you just pull code and everyone can jump and start working on it. With environment variables, you have to share it with other developers somehow, and you have to maintain some configuration for people. So I'd say it is better to keep it simple, like it is now. You can always add configuration from environment in docker easily; it won't be a serious problem.

VikashKothary commented 3 years ago

Hi @pawelmhm, thank you for your time.

Like I mentioned, it's not a problem for me, since I already use the above as a workaround. My goal was mainly to share it with other people who might be interested in solving the same problem.

I understand if you think it's out of scope for the library, but I wanted to clarify a few things in case this ever comes up in the future.

  1. You should use .env for development

    configuring everything via environment variables may make it a little more difficult to develop

  2. Environment variables would have defaults

    you just pull code and everyone can jump and start working on it

  3. This is not for values that you want to share

    you have to share it with other developers somehow

  4. Config files and Docker don't go together

    add configuration from environment in docker

  5. This is a common convention
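
As a rough illustration of the first two points, a development-time .env file can be loaded with the standard library alone, while real environment variables and built-in defaults keep working when the file is absent. This is a minimal sketch under assumptions: the helper name `load_dotenv` is hypothetical here (it is not the python-dotenv package), and the parser handles only simple `KEY=VALUE` lines.

```python
import os


def load_dotenv(path=".env", environ=None):
    """Copy KEY=VALUE pairs from a file into environ without overriding existing keys."""
    environ = os.environ if environ is None else environ
    try:
        with open(path, encoding="utf-8") as handle:
            lines = handle.read().splitlines()
    except FileNotFoundError:
        # No .env file: nothing to load, so the settings defaults apply.
        return environ
    for line in lines:
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # setdefault keeps a variable that is already set in the real environment.
        environ.setdefault(key.strip(), value.strip().strip("'\""))
    return environ
```

Calling `load_dotenv()` at the top of a settings file like the one above, before the `os.getenv` lines, would let a developer keep local overrides in an untracked .env file while everyone else gets the defaults.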