treasure-data / digdag

Workload Automation System
https://www.digdag.io/
Apache License 2.0
1.31k stars 221 forks

_env cannot be set for multiple tasks #937

Open friendofasquid opened 5 years ago

friendofasquid commented 5 years ago

Scenario

I am using the mysql command-line executable with the sh> operator to perform various MySQL operations in Digdag. mysql automatically checks the environment for MYSQL_PWD and uses it as the password. I am trying to set that environment variable from a Digdag secret.

I'd like to set that once and have the environment set up for child tasks.

Complications

Root _env

Setting an _env at the .dig root gives an error.

_env:
  MYSQL_PWD: ${secret:mysql.password}

gives the error:

Workflow 'td_to_mysql' includes unknown keys: [_env] (model validation)

Parent Task _env

+many_mysql_comments:
  _env:
    MYSQL_PWD: ${secret:mysql.password}
  +command_1:
    sh>: mysql --execute "${SQL}"
  +command_2:
    sh>: mysql --execute "${MORE_SQL}"

gives the error:

+td_to_mysql+many_mysql_comments contains invalid keys: '_env': "{"_env":{"MYSQL_PWD":"${secret:mysql.password}"}}" (config)

Question

What is the best way to set up the environment so multiple tasks can use it? Thanks!

yoyama commented 5 years ago

I guess you can access secrets from the sh> operator as follows:

+many_mysql_comments:
  +command_1:
    sh>: MYSQL_PWD=${secret:mysql.password} mysql --execute "${SQL}"
  +command_2:
    sh>: MYSQL_PWD=${secret:mysql.password} mysql --execute "${MORE_SQL}"

friendofasquid commented 5 years ago

That would work, but it would emit the password in the logs. Using _env in each task is a little more secure and only slightly more verbose.

You can also use mysql's --password argument, but it has similar security concerns.
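
For reference, the per-task _env approach mentioned above might look like the sketch below. It reuses the task names and variables from the original post and assumes the sh> operator accepts an _env block next to the command, as the comment implies. The mapping has to be repeated in every task, which is the verbosity this issue is about.

```yaml
+many_mysql_comments:
  +command_1:
    sh>: mysql --execute "${SQL}"
    # _env is set on the operator task itself, not on the parent group
    _env:
      MYSQL_PWD: ${secret:mysql.password}
  +command_2:
    sh>: mysql --execute "${MORE_SQL}"
    # the same mapping must be repeated for each task
    _env:
      MYSQL_PWD: ${secret:mysql.password}
```

With this layout the secret is passed through the process environment rather than written into the command line, so it should not appear in the logged command.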

hiroyuki-sato commented 5 years ago

Hello, @friendofasquid

I'm thinking about another solution. I'll let you know if I find one.

BTW, I maintain the mysql> operator. It uses secrets, so you don't need to set the MYSQL_PWD environment variable.

sonots commented 5 years ago

I am also struggling with _env. I want to configure _env globally.

My scenario is:

  1. Set _export: docker: globally so that all tasks run in a Docker container.
  2. Read secrets from environment variables in docker-entrypoint.sh so that I can avoid embedding secrets in Docker images.
  3. This requires me to map every secret to an environment variable using _env in every task.

If _env could be configured globally, either with one giant parent task or with a global _export, this problem would go away.
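
The scenario above can be sketched roughly as follows. The image name, script names, and secret keys are hypothetical placeholders and do not come from the thread. The sketch assumes that _env on an sh> task is forwarded into the container, as the comment describes.

```yaml
_export:
  docker:
    # hypothetical image whose docker-entrypoint.sh reads API_TOKEN from the environment
    image: example/worker:latest

+extract:
  sh>: ./extract.sh
  # every task has to repeat the secret-to-environment mapping
  _env:
    API_TOKEN: ${secret:api.token}

+load:
  sh>: ./load.sh
  _env:
    API_TOKEN: ${secret:api.token}
```

The docker setting is inherited by every task through _export, while the _env block is not, so the duplication grows with the number of tasks and secrets.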

FredericoCoelhoNunes commented 3 years ago

I also need to configure _env globally... I have both my dev and prod projects running on the same server to save costs, which seems like a pretty common use case, but unless I'm missing something there isn't a straightforward way to do this.

So far it has been working because I've been setting the environment as a project secret for the dev and prod projects, but this is obviously not the intended use of secrets; it's just a hack. And now everything breaks when I try to add test DAGs, because these aren't part of a project but are standalone DAGs, so they don't have access to secrets.