0.19.9 introduced error, output is saved only once when running the pipeline

kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.

https://kedro.org

Apache License 2.0


0.19.9 introduced error, output is saved only once when running the pipeline #4235

Closed. adobrogows-ki closed this issue 1 month ago

adobrogows-ki commented 1 month ago

Description

When the same pipeline is run more than once, output is saved only on the first run. Tested with the sequential runner.

Context

This breaks tests that need to run a pipeline many times, as well as our server, which relies on re-running the pipeline many times.

Steps to Reproduce

Create a new project with kedro new, including the sample code.

Modify the test case test_data_science_pipeline so that it ends with:

a = SequentialRunner().run(pipeline, catalog)
b = SequentialRunner().run(pipeline, catalog)

assert a == b
assert successful_run_msg in caplog.text

Run the test.

Expected Result

a == b

Actual Result

b is empty

Your Environment

Kedro version used: 0.19.9

Calychas commented 1 month ago

https://kedro-org.slack.com/archives/C03RKP2LW64/p1729156896234549

astrojuanlu commented 1 month ago

Adding the public link instead, for archival purposes: https://kedro.hall.community/support-lY6wDVhxGXNY/second-runner-run-fails-to-save-output-after-pipeline-upgrade-V5X74llZ8xD1

noklam commented 1 month ago

I cannot reproduce the issue by following the description, but I can produce the same error with the following:

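from kedro.io import DataCatalog
from kedro.runner import SequentialRunner

# create_ds_pipeline and the dummy_data / dummy_parameters fixtures are assumed
# to come from the sample project's test module; caplog is a built-in pytest fixture.


def test_data_science_pipeline(caplog, dummy_data, dummy_parameters):
    pipeline = (
        create_ds_pipeline()
        .from_nodes("split_data_node")
        .to_nodes("train_model_node")
    )
    catalog = DataCatalog()
    catalog.add_feed_dict(
        {
            "model_input_table": dummy_data,
            "params:model_options": dummy_parameters["model_options"],
        }
    )

    # Run the same pipeline twice against the same catalog.
    a = SequentialRunner().run(pipeline, catalog)
    b = SequentialRunner().run(pipeline, catalog)
    # Both runs should return the same outputs.
    assert a == b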

ElenaKhaustova commented 1 month ago

Solved in https://github.com/kedro-org/kedro/pull/4236