treasure-data / digdag

Workload Automation System
https://www.digdag.io/
Apache License 2.0
1.3k stars 221 forks source link

Use EnvironmentFile instead of Environment #1762

Closed rariyama closed 1 year ago

rariyama commented 1 year ago

Overview

fix https://github.com/treasure-data/digdag/issues/1749

I made a change that environment file is used when passing the values of environment to the container instead of environment, that is the way to add them to the container one by one. This change will be effective to avoid failing an execution of ECS Run Task because a maximum size of characters of ECS container overrides is 8192 and this limit includes environments defined by the current implementation. it will be helpful especially when executing sh>: operator. In addition to variables defined in _env, variables defined in _export are also passed to the container unlike py>: and rb>: so the size of environment tends to be larger as the development progresses.

The change points

rariyama commented 1 year ago

I added the parameter for choosing the existing way in this commit https://github.com/treasure-data/digdag/pull/1762/commits/27d1ae3340fc00afa7b8e21bc6a6c62f8c31a268. Users can select the either way for passing environments to a container by setting it. The default value is false, so there will be no impact for existing users. If some users want to use environment file on most of tasks in workflow, they can use it easily by setting the parameter on the root task and override it on child ones as necessary.

yoyama commented 1 year ago

Sorry for very late reply. I don't check it in detail, but I have a few concerns.

rariyama commented 1 year ago

@yoyama Thanks for your reply.

I think providing an option to use file to pass environment variables is good and default is false is also good. But I think it is better to be configured by server properties because it is internal behavior.

It makes sense to me, so I modified my implementation with this commit https://github.com/treasure-data/digdag/pull/1762/commits/fe6f8bfca83ee15eaeee01dbff5b437562e9a7b9. I left setEcsContainerOverrideEnvironment and added a process about using environmentFiles to it.

Your change increases the code line of the method. I hope refactoring on your change.

I also dealt with this concern with the commit above. I extracted a process about uploading environmentFile to S3 as another method from setEcsContainerOverrideEnvironment.

If there are any other concerns, please let me know about those.

rariyama commented 1 year ago

@szyn Thanks for your review. I fixed the problems you commented so please confirm it when you are available.