kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Expand parameter dictionaries in node function declarations #3818

Open astrojuanlu opened 5 months ago

astrojuanlu commented 5 months ago

Description

To clarify, this is more about going from

def split_data(data: pd.DataFrame, parameters: dict[str, Any]) -> Tuple:
    X = data[parameters["features"]]
    y = data["price"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=parameters["test_size"], random_state=parameters["random_state"]
    )
    ...

            node(
                func=split_data,
                inputs=["model_input_table", "params:model_options"],

to

def split_data(data: pd.DataFrame, features: list[str], test_size: float, random_state: int) -> Tuple:
    X = data[features]
    y = data["price"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    ...

            node(
                func=split_data,
                inputs=["model_input_table", "params:model_options.features", "params:model_options.test_size", "params:model_options.random_state"],

Does it make sense?
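To make the second form concrete, here is a minimal, standard-library-only sketch of how a dotted name such as `params:model_options.test_size` can map onto a nested parameters dictionary. `resolve_param` and `describe_split` are hypothetical helpers written for illustration, and the parameter values are made up; this is not Kedro's actual resolution code.

```python
from __future__ import annotations

from typing import Any


def resolve_param(parameters: dict[str, Any], name: str) -> Any:
    """Look up a 'params:a.b.c' style name in a nested dict (hypothetical helper)."""
    prefix = "params:"
    if not name.startswith(prefix):
        raise ValueError(f"expected a name starting with {prefix!r}, got {name!r}")
    value: Any = parameters
    # Walk one level deeper for each dot-separated key.
    for key in name[len(prefix):].split("."):
        value = value[key]
    return value


def describe_split(features: list[str], test_size: float, random_state: int) -> str:
    """Stand-in for a node function that declares each parameter explicitly."""
    return f"{len(features)} features, test_size={test_size}, random_state={random_state}"


# Assumed contents of a parameters file, for illustration only.
parameters = {
    "model_options": {
        "features": ["engines", "passenger_capacity"],
        "test_size": 0.2,
        "random_state": 3,
    }
}

inputs = [
    "params:model_options.features",
    "params:model_options.test_size",
    "params:model_options.random_state",
]
# Each input name resolves to exactly one value, passed positionally.
args = [resolve_param(parameters, name) for name in inputs]
summary = describe_split(*args)
```

With explicit inputs, the function signature documents exactly which values the node depends on, and a missing key fails at lookup time rather than deep inside the function body.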

_Originally posted by @astrojuanlu in https://github.com/kedro-org/kedro/pull/3782#discussion_r1562471991_

Documentation page (if applicable)

The task is to find all occurrences of this pattern (node functions that take a whole parameters dictionary) in the documentation and starters.
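One rough way to locate candidates is a regular-expression scan for function definitions that accept a `parameters: dict` argument. The sketch below is only a heuristic: the pattern, the helper names, and the file extensions scanned are assumptions, and it misses variants such as `typing.Dict` or arguments with other names.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches `def name(... parameters: dict ...)` or `Dict`, possibly spanning
# several lines. Default values containing ")" would confuse this heuristic.
PATTERN = re.compile(r"def\s+(\w+)\s*\(([^)]*\bparameters\s*:\s*[dD]ict\b[^)]*)\)")


def find_dict_param_functions(source: str) -> list[str]:
    """Return names of functions that take a `parameters: dict` argument."""
    return [match.group(1) for match in PATTERN.finditer(source)]


def scan_tree(root: str, suffixes: tuple[str, ...] = (".py", ".md")) -> dict[str, list[str]]:
    """Scan files under `root` and map each matching file path to its hits."""
    hits: dict[str, list[str]] = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            found = find_dict_param_functions(path.read_text(encoding="utf-8", errors="ignore"))
            if found:
                hits[str(path)] = found
    return hits
```

Calling `scan_tree("docs")` from a checkout (the directory name is an assumption) would return a mapping from file paths to the functions worth reviewing by hand.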

Context

datajoely commented 5 months ago

100000000%. I would even go as far as hiding parameters from the docs - explicit is better than implicit.