Closed: MarcelBeining closed this issue 2 days ago
Hi @MarcelBeining. Thanks for raising this! I think it makes a lot of sense. Would you be interested in submitting a pull request for this? If not, the Kedro maintainers could consider adding it.
Hi @DimedS. I guess there were reasons why the Kedro maintainers designed it the way it is now. So before I trial-and-error different implementations until they suit the (to me unknown) design principles, I'd rather suggest the Kedro maintainers add it :-)
Can you explain what arguments you need? Maybe I don't understand the question, but isn't this available in hooks?
Sure, there are two use cases:
- We need parameters from the correct parameters.yml in settings.py to fill configurable email details (sender, recipient, etc.) into an EmailNotifier hook (https://gitlab.com/anacision/kedro-expectations#notification). But in settings.py it is currently not possible to get the correct Kedro parameters, as there is no way to find out which environment argument is used for the current run.
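For illustration, this is roughly the closest we get today (a sketch only; the exact OmegaConfigLoader arguments may differ between Kedro versions, and the EmailNotifier import path and keyword arguments are illustrative, see the kedro-expectations docs for its real API):

```python
# settings.py -- sketch of today's workaround: load the parameters ourselves
# with OmegaConfigLoader. The environment has to be hard-coded, because
# settings.py cannot know which --env the run was started with.
from kedro.config import OmegaConfigLoader

# "local" is hard-coded -- that is exactly the problem: `kedro run --env=prod`
# will still make this loader read conf/local.
_params = OmegaConfigLoader(conf_source="conf", default_run_env="local")["parameters"]

# Illustrative registration; the EmailNotifier import and argument names are assumptions.
# from kedro_expectations import EmailNotifier
# HOOKS = (
#     EmailNotifier(
#         sender=_params["notification"]["sender"],
#         recipients=_params["notification"]["recipients"],
#     ),
# )
```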
I don't have a clear-cut answer on how to fix this, but what you're trying to do here does go against Kedro's flow of execution. settings.py is used to instantiate all the components needed for a functioning Kedro project before running it; it's not meant to contain knowledge about runtime variables. The architecture diagram might help illustrate how the components are designed to interact with each other:
https://docs.kedro.org/en/stable/extend_kedro/architecture_overview.html
- Depending on the environment, some pipelines should not be available at all (i.e. not assembled in pipeline_registry.py), to avoid executing critical code in production. Here we would also need to know in pipeline_registry.py which env argument Kedro was run with. This happens even before before_pipeline_run, so there is no possibility to get the env argument from there. And even if there were, it would be kind of hacky, as one would have to use a global variable plus an extra hook that fills it.
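Our current workaround looks roughly like the following (a sketch; the pipeline module names are illustrative). It only sees the KEDRO_ENV environment variable and silently misses an --env passed on the command line, which is exactly the robustness problem:

```python
# pipeline_registry.py -- sketch: key the registered pipelines on KEDRO_ENV.
import os

from kedro.pipeline import Pipeline

from my_project.pipelines import critical, standard  # illustrative modules


def register_pipelines() -> dict[str, Pipeline]:
    # Not populated when the environment is passed as `kedro run --env=...`.
    env = os.environ.get("KEDRO_ENV", "local")

    pipelines = {
        "standard": standard.create_pipeline(),
        "__default__": standard.create_pipeline(),
    }
    if env != "prod":
        # Only expose the critical pipeline outside production.
        pipelines["critical"] = critical.create_pipeline()
    return pipelines
```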
For this second case, can't you use namespaces to filter what pipelines should be executed? https://docs.kedro.org/en/stable/nodes_and_pipelines/namespaces.html
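A minimal sketch of that idea, assuming the sensitive steps live in their own pipeline module (module names are illustrative): wrap them in a namespace and only register them in the combinations you want selectable via kedro run --pipeline=...:

```python
# pipeline_registry.py -- sketch: expose the critical steps only under explicit names.
from kedro.pipeline import Pipeline, pipeline

from my_project.pipelines import critical, standard  # illustrative modules


def register_pipelines() -> dict[str, Pipeline]:
    std = standard.create_pipeline()
    # Namespacing keeps the critical nodes clearly separated and filterable.
    crit = pipeline(critical.create_pipeline(), namespace="critical")

    return {
        "prod": std,          # run in production: kedro run --pipeline=prod
        "full": std + crit,   # everything, for dev/test
        "__default__": std,   # default stays on the safe side
    }
```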
Closing this due to inactivity. Feel free to re-open this to continue the conversation!
Description
We use Kedro pipelines a lot for our AI projects, and we stumble over this problem so often that it is time to open an issue about it. We regularly pass arguments such as the desired environment as run arguments to kedro run. We also need custom functionality that we implement in settings.py (e.g. custom hooks) and pipeline_registry.py (e.g. custom pipeline combinations). For this functionality we sometimes need extra information, such as the environment we are running in.
There is no simple and robust way to access run arguments in these functions! Possible solutions that have been suggested and tested by us so far:

- Parsing sys.argv ourselves: that seems rather error-prone if the env is handed over in some other way (e.g. KEDRO_ENV).
- get_current_session() was deprecated in 0.18, and it now seems completely impossible to access the session object in deeper functions.
- Instantiating an OmegaConfigLoader, which requires... guess what: defining the env :-D
- Using after_context_created, saving the env information from there in a global class/variable and reading it later: that seems very hacky and works only for pipeline_registry.py, not for settings.py, as that is called before after_context_created (see the sketch below).
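A minimal sketch of that last workaround, assuming a hooks.py in the project (class and attribute names are ours, not a Kedro API):

```python
# hooks.py -- capture the resolved environment via after_context_created and
# stash it in module-level state. Works for pipeline_registry.py at best,
# never for settings.py, which is imported before any context exists.
from kedro.framework.hooks import hook_impl


class EnvCaptureHooks:
    """Remembers the environment of the current run in a class attribute."""

    env = None  # crude global state -- this is the hacky part

    @hook_impl
    def after_context_created(self, context) -> None:
        # KedroContext knows the environment the session was created with.
        EnvCaptureHooks.env = context.env


# settings.py: HOOKS = (EnvCaptureHooks(),)
# pipeline_registry.py could then read EnvCaptureHooks.env -- if (and only if)
# it happens to be resolved after the hook has fired.
```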
The same problem arises, of course, if one tries to access any parameter from parameters.yml in these higher-level files.

Context
This should be important for anyone who extends Kedro pipeline functionality beyond its standard use.
Possible Implementation
Simply make it possible to import and access the Kedro context or session object (at least in some frozen, read-only state) from anywhere!
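To make the request concrete, something along these lines would already be enough (a purely hypothetical API; nothing like this exists in Kedro today, the module and function name are made up):

```python
# Hypothetical usage sketch -- NOT an existing Kedro API.
from kedro.framework.session import get_current_run_info  # made-up accessor

info = get_current_run_info()        # frozen, read-only snapshot of the current run
print(info.env)                      # e.g. "prod", as passed via --env or KEDRO_ENV
print(info.params["notification"])   # parameters resolved for that environment
```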