m3dev / gokart

Gokart solves reproducibility, task dependencies, constraints of good code, and ease of use for Machine Learning Pipeline.
https://gokart.readthedocs.io/en/latest/
MIT License
305 stars 57 forks source link

only use luigi-style ${ENVVAR} #272

Closed nersonu closed 2 years ago

nersonu commented 2 years ago

When I want to use the local stack, I try to specify 'endpoint_url' in the 's3' section, but it fails because boto3.resources() reads the environment variable.

The following is an example of a param.ini.

[s3]
endpoint_url=http://localhost:4566

When I run it in this state, it outputs an error as follows. (<ENV> is the environment variable.)

[2022/02/01 06:59:38][luigi-interface][ERROR](s3.py:141) resource() got an unexpected keyword argument '<ENV>'
[2022/02/01 06:59:38][luigi-interface][WARNING](worker.py:651) Will not run sample.Sample() or any dependencies due to error in complete() method:
Traceback (most recent call last):
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/luigi/contrib/s3.py", line 135, in s3
  self._s3 = boto3.resource('s3',
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/boto3/__init__.py", line 102, in resource
     return _get_default_session().resource(*args, **kwargs)
TypeError: resource() got an unexpected keyword argument '<ENV>'

This is because ConfigParser reads all environment variables with read_environ() in utils.py.

Suggestion

~I created this PR as the first idea for this measure. My proposed change overrides the Interpolation in luigi's ConfigParser and expands the environment variables denoted by %()s first. I abolished read_environ() and applied the BasicEnvironmentInterpolation to ConfigParser in set_environ_interpolation().~

Alternatively, I would suggest doing away with read_environ() and only using luigi's style${ENVVAR}. Please review!

Only Use lugi-style ${ENVVAR} (Feb. 10, 22)

I did away with read_environ() and only using luigi-style${ENVVAR}. I removed read_environ() and also remove check_config(). In check_config(), it looks like to me the function to check that the %(envvar)s style environment variable is not set.

Thanks to @Hi-king for agreeing to this suggestion :)

Hi-king commented 2 years ago

Alternatively, I would suggest doing away with read_environ() and only using luigi's style${ENVVAR}.

Yes, I personally, absolutely agree to your suggestion. But plz let me care about backward compatibilities a little 🙇

nersonu commented 2 years ago

Sorry. There was a bug in multiple variable expansion, so I have fixed it a bit (https://github.com/m3dev/gokart/pull/272/commits/9cc41dd8ee3c339de36743a4d3cafa911e4741b5).

Hi-king commented 2 years ago

@nersonu

I've carefully checked how read_environment() currently used in our company, and found more risks than benefits. Following is my surveys and thoughts.

https://docs.google.com/presentation/d/1Vq9EvpvTlwtMoZsOLrSljLencxx2xqfyNmlGb42SOXg/edit?usp=sharing

image image
image image

Finally, I strongly agree to your suggestion.

Alternatively, I would suggest doing away with read_environ() and only using luigi's style${ENVVAR}.

Could you change this PRs to "only using luigi's style${ENVVAR}" ? We will make new release version after this breaking changes update :)

nersonu commented 2 years ago

@Hi-king

Thanks for your survey and cool choices!

Could you change this PRs to "only using luigi's style${ENVVAR}" ? We will make new release version after this breaking changes update :)

I'd like to do it. 👍 I'm going to remove read_environ() and also remove check_config() and commit, is there a problem? In check_config(), it looks like to me the function to check that the %(envvar)s style environment variable is not set. https://github.com/m3dev/gokart/blob/9524265d9b6c266b9ca49d440fb4ca3bfb3e3acd/gokart/utils.py#L14-L20

Hi-king commented 2 years ago

@nersonu I've merged this PR. Thx again for your careful debugging :)

hirosassa commented 2 years ago

@nersonu Thank you for your GREAT contribution!