singer-io / singer-python

Writes the Singer format from Python
https://singer.io
Apache License 2.0
537 stars 129 forks source link

Support for inline configuration strings #109

Open dcereijodo opened 4 years ago

dcereijodo commented 4 years ago

Sometimes when productionalizing taps and targets executions it gets inconvenient having to rely on an actual configuration file stored in the system, and instead it would be much easier to be able to pass such config as a JSON string in the command line parameters. So something like this

tap-mysql --config '{"host": "mysql-host.com", "port": "3306", "user": "$USR_PROD", "password": "$PWD_PROD"}'

One simple hack for supporting that would be changing this line to something like this

def load_json(path):
  try:
      inline_config = json.loads(myjson)
  except ValueError as e:
    with open(path) as fil:
      return json.load(fil)
  return inline_config
luandy64 commented 4 years ago

Couldn't you do:

echo '{"host": "mysql-host.com", \                                                                                                                                                                                   
       "port": "3306", \                                                                                                                                                                                             
       "user": "$USR_PROD", \                                                                                                                                                                                        
       "password": "$PWD_PROD"}'                                                                                                                                                                                     
     > /tmp/my_config; \                                                                                                                                                                                             
tap-mysql --config /tmp/my_config
dcereijodo commented 4 years ago

I totally can (at least in my use case). My suggestion was more about the convenience of it. When I have to do that inside an Airflow BashOperator for example, it starts looking odd

BashOperator(
  task_id='syn_animals',
  bash_command="""
    echo '{"host": "mysql-host.com", \                                                                                                                                                                                   
       "port": "3306", \                                                                                                                                                                                             
       "user": "$USR_PROD", \                                                                                                                                                                                        
       "password": "$PWD_PROD"}'                                                                                                                                                                                     
     > /tmp/my_config;                                                                                                                                          
   tap-mysql --config /tmp/my_config
  """
)

Also I might not have a good place (or a safe place) to put that temporal config file.

In this custom tap I experimented with an optional parameter --overrides which accepts a JSON string that gets merged into a default (or empty otherwise) configuration file. That would also work here.

BashOperator(
  task_id='syn_animals',
  bash_command="""
    tap-mysql --overrides '{"host": "mysql-host.com", "port": "3306", "user": "$USR_PROD", "password": "$PWD_PROD"}'
  """
)

But again it's just a matter of convenience :)

luandy64 commented 4 years ago

Understood, feel free to open a PR for the feature 👍

On another note, how do you handle state files? Or do you not use those?

dcereijodo commented 4 years ago

Great. Will do that :)

No, I do not need state files. But if I were to implement that, what's wrong with using the same approach? So

BashOperator(
  task_id='syn_animals',
  bash_command="""
    tap-mysql \
      --config ~/.singer.io/tap_mysql_defaults.json \ # staging and prod configuration is identical, only the host changes
      --overrides '{"host": "$MYSQL_HOST"}' \
      --state '{"type": "STATE",  "value": {"bookmarks": {"db-animals": {"version": 1509135204169, "replication_key_value": "{{ ts }}", "replication_key": "updated_at"}}, "currently_syncing": null}}' \
    | target-redshift ...
  """
)
luandy64 commented 4 years ago

Nothing wrong with it 😄 It occurred to me that either you are not using state at all or we would have to also implement this for state

dcereijodo commented 4 years ago

Right. I sent a PR that implements this behavior in the load_json function from the utils, so it should work for config, state and properties arguments.