kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.53k stars 877 forks source link

`kedro run --params key=1,2,3` Kedro CLI does not take list as argument #2552

Open noklam opened 1 year ago

noklam commented 1 year ago

Description

Short description of the problem here. Some kedro cli like kedro run --tags accept argument with comma-separated list. This is not true with the --params as , get parsed as a separate parameters.

kedro run --params error:1,2 or kedro run --params error:[1,2] won't work`

You will get some error like this.

Error: Invalid format of `params` option: Item `b` must contain a key and a value separated by `:` or `=`.
(kedro_core) soft-runner main % kedro run --params error:a,b

The reason that I need this is I want to create my hooks, which takes CLI arguments and execute custom logic. The --params is the most intuitive argument I can find, and I can get it from before_pipeline_run's run_params.

The alternative is to add my custom CLI flag. This has 2 disadvantage.

  1. I need the cli.py
  2. I cannot get it from the hooks spec because run_params is hard-coded This creates a tight couple between my cli file and my hooks. Using params does not requires any of this, and the CLI already handle all the parsing (with OmegaConf) for me.

Context

How has this bug affected you? What were you trying to accomplish? For some reason, I remember this works in older versions (??) or maybe the only way is to use the --config flag but that flag wasn't well used either. I want my hooks reading the CLI, one of the easiest way is to use the --params. Even if I want to parse , myself, the CLI will not allow it. My alternative is to use other delimeter like &

Steps to Reproduce

kedro run --params error:a=1,2 should return a dictionary like {"a": [1,2]}

Expected Result

Tell us what should happen. Maybe it never worked before, or the documentation is missing.

Actual Result

Tell us what happens instead. I get an error

-- If you received an error, place it here.
-- Separate them if you have more than one.

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

antonymilne commented 1 year ago

Just a few quick comments, without trying it out myself...

Just to clarify: even though we should delegate to OmegaConf.from_dotlist for this, it shouldn't be tried to the config loader in any way. We're just using their syntax for parsing params from the CLI.

marrrcin commented 10 months ago

@antonymilne what if indeed this (parsing the --params) was offloaded to the config loader class 🤔? That would actually make this really customizable with sensible default.

FYI, in our plugins we're actually using JSON format to pass the params, e.g. https://github.com/getindata/kedro-azureml/blob/5ae7bcf430ec0a4824b6a198c4e919ae444e2589/kedro_azureml/cli.py#L236

noklam commented 10 months ago

That would actually make this really customizable with sensible default.

What do you mean? The proposal does not requires using ConfigLoader, it only use the OmegaConf cli parse feature. What extra feature is enabled to use config loader?

marrrcin commented 10 months ago

What extra feature is enabled to use config loader?

Someone could e.g. implement custom parsing of CLI args - like the JSON example.

noklam commented 8 months ago

A quick attempt as I come across some question in Slack today.

Originally I want to take regex approach https://stackoverflow.com/questions/26633452/how-to-split-by-commas-that-are-not-within-parentheses but without success.

https://github.com/kedro-org/kedro/pull/3327 - this is not ready to be merged. This only attempt to support [] but not {}, I found it quite difficult to parse string correctly so I limit the scope for brackets only. On the other hand, list is more CLI friendly where typing out the entire dictionary in CLI is overcomplicated.

The key is try to parse the comma correctly, that is, ignoring comma inside [] () {}, the remaining part will basically just work because OmegaConf.from_dotlist handles list and dict by default.

marrrcin commented 3 months ago

What's the status of this? 🤔

noklam commented 3 months ago

It was never put into a sprint, I made a draft PR to see what it takes to add just list as an example.

It requires building a parser basically, though in this case maybe we can just add list and forget about dict and everything.

astrojuanlu commented 3 months ago

Is there some prior art of CLI tools accepting Python primitive non-scalar types on the CLI?

datajoely commented 1 month ago

Some overlap with this recent discussion https://github.com/kedro-org/kedro/pull/3904#issuecomment-2142457119