move-coop / parsons

A python library of connectors for the progressive community.
https://www.parsonsproject.org/
Other
262 stars 132 forks source link

Addition: Accept Civis default naming conventions for environmental variables #727

Open thebbennett opened 2 years ago

thebbennett commented 2 years ago

Detailed Description

Parsons expects certain naming conventions for it's environmental variables. Take the Redshift class. It expects the following environmental variables: REDSHIFT_USERNAME, REDSHIFT_PASSWORD, REDSHIFT_HOST, REDSHIFT_DB, and REDSHIFT_PORT.

However, I am one of the many Civis users in this community. When you set up the "Redshift" environmental variable in Civis as a databse cred, it creates the following environmental variables: REDSHIFT_ID, REDSHIFT_NAME, REDSHIFT_HOST, REDSHIFT_CREDENTIAL_ID, REDSHIFT_CREDENTIAL_USERNAME, REDSHIFT_CREDENTIAL_PASSWORD., REDSHIFT_DATABASE, REDSHIFT_PORT

There are a few differences here between the environmental variables names that Parsons is expecting and the default names in Civis. Namely:

Parsons Expects This Civis Creates This
REDSHIFT_DB REDSHIFT_DATABASE
REDSHIFT_USERNAME REDSHIFT_CREDENTIAL_USERNAME
REDSHIFT_PASSWORD REDSHIFT_CREDENTIAL_PASSWORD

Context

Because of this, I have to write some weird code in my Parsons scripts that I expect to run as a container script in Civis. Namely:

# If running on container, load this env
try:
    os.environ["REDSHIFT_DB"] = os.environ["REDSHIFT_DATABASE"]
    os.environ["REDSHIFT_USERNAME"] = os.environ["REDSHIFT_CREDENTIAL_USERNAME"]
    os.environ["REDSHIFT_PASSWORD"] = os.environ["REDSHIFT_CREDENTIAL_PASSWORD"]
    os.environ["S3_TEMP_BUCKET"] = os.environ["S3_TEMP_BUCKET"]
    van_key = os.environ["VAN_PASSWORD"]

# If running locally, load this env
except KeyError:
    van_key = os.environ["VAN_API_KEY"]

Possible Implementation

We could edit the init method of Redshift here

https://github.com/move-coop/parsons/blob/main/parsons/databases/redshift/redshift.py

try:

self.username = username or os.environ['REDSHIFT_USERNAME'] or os.environ['REDSHIFT_CREDENTIAL_USERNAME'] 

self.password = password or os.environ['REDSHIFT_PASSWORD'] or os.environ['REDSHIFT_CREDENTIAL_PASSWORD'] 

self.host = host or os.environ['REDSHIFT_HOST']

self.db = db or os.environ['REDSHIFT_DB'] or os.environ['REDSHIFT_DATABASE']

self.port = port or os.environ['REDSHIFT_PORT']

Priority

This may only be important to me but it would make my code a lot cleaner. Thank you <3

Jason94 commented 2 years ago

I agree that this would be very convenient. If we use os.getenv we don't need to wrap in a try block, which could cause problems if some of the environmental vars aren't present.


self.username = username or os.getenv('REDSHIFT_USERNAME') or os.getenv('REDSHIFT_CREDENTIAL_USERNAME')

self.password = password or os.getenv('REDSHIFT_PASSWORD') or os.getenv('REDSHIFT_CREDENTIAL_PASSWORD') 

self.host = host or os.getenv('REDSHIFT_HOST')

self.db = db or os.getenv('REDSHIFT_DB') or os.getenv('REDSHIFT_DATABASE')

self.port = port or os.getenv('REDSHIFT_PORT')