kedro-org / kedro-viz

Visualise your Kedro data and machine-learning pipelines and track your experiments.
https://demo.kedro.org
Apache License 2.0

[Debugging] Show which datasets are outdated #1704

Open francisduval opened 9 months ago

francisduval commented 9 months ago

Description

When running kedro viz run, there is no way to tell which datasets are up to date and which are outdated. A dataset is outdated if the code upstream of it has changed since the dataset was last generated. This feature exists in the targets package for R. In addition, when you run a targets pipeline, only the outdated nodes are rerun, which saves computing time.
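To make the definition above concrete, here is a minimal sketch of how outdated datasets could be detected. It is not based on any existing Kedro or Kedro-Viz API: the node dictionaries, the `fingerprint` helper, and `outdated_datasets` are all hypothetical, and the sketch assumes that each node's source code is available as a string and that nodes are listed in topological order.

```python
import hashlib


def fingerprint(source: str) -> str:
    """Hash a node's source code (hypothetical helper)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def outdated_datasets(nodes, recorded):
    """Return the set of datasets whose upstream code has changed.

    nodes: list of dicts with "name", "source", "inputs", "outputs",
           assumed to be in topological order.
    recorded: dict mapping node name to the fingerprint stored at the last run.
    """
    stale = set()
    for node in nodes:
        # The node's own code differs from what was recorded at the last run.
        changed = recorded.get(node["name"]) != fingerprint(node["source"])
        # At least one input was already marked as outdated upstream.
        upstream_stale = any(name in stale for name in node["inputs"])
        if changed or upstream_stale:
            stale.update(node["outputs"])
    return stale


nodes = [
    {"name": "clean", "source": "def clean(raw): return raw.dropna()",
     "inputs": ["raw"], "outputs": ["cleaned"]},
    {"name": "featurize", "source": "def featurize(df): return df * 2",
     "inputs": ["cleaned"], "outputs": ["features"]},
    {"name": "train", "source": "def train(x): return fit(x)",
     "inputs": ["features"], "outputs": ["model"]},
    {"name": "summarize", "source": "def summarize(raw): return raw.describe()",
     "inputs": ["raw"], "outputs": ["summary"]},
]

# Pretend every node was run once, then the code of "featurize" was edited.
recorded = {node["name"]: fingerprint(node["source"]) for node in nodes}
nodes[1]["source"] = "def featurize(df): return df * 3"

print(sorted(outdated_datasets(nodes, recorded)))  # ['features', 'model']
```

This only catches edits to a node's own source. Changes in helper functions, parameters, or input data would need additional fingerprints.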

Context

This would be a useful feature because, without it, there is no effective way to tell which parts of the pipeline need to be rerun after the code has changed. When you are unsure whether a dataset is up to date, you have to regenerate it to be sure, which can take a long time.

Possible Implementation

Show outdated datasets in a different color. It would also be nice to have a kedro command that reruns only what is needed to refresh outdated datasets, such as kedro run --only_outdated or kedro run --pipeline pipeline_name --only_outdated.
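As a rough illustration of what a flag like --only_outdated might do internally, the sketch below keeps a small JSON state file between runs and uses it to select the nodes to rerun. Everything here is an assumption for illustration: the flag does not exist in Kedro, and `load_state`, `select_outdated`, `save_state`, and the state file name are hypothetical. Nodes are plain dictionaries in topological order, not Kedro node objects.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def fingerprint(source: str) -> str:
    """Hash a node's source code (hypothetical helper)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def load_state(path: Path) -> dict:
    """Read fingerprints saved by the previous run; empty on the first run."""
    return json.loads(path.read_text()) if path.exists() else {}


def select_outdated(nodes: list, state: dict) -> list:
    """Return the names of nodes to rerun, in execution order."""
    stale_data = set()
    to_run = []
    for node in nodes:
        changed = state.get(node["name"]) != fingerprint(node["source"])
        # Rerun if the code changed or any input is produced by a rerun node.
        if changed or stale_data & set(node["inputs"]):
            to_run.append(node["name"])
            stale_data.update(node["outputs"])
    return to_run


def save_state(nodes: list, path: Path) -> None:
    """Record the current fingerprints after a successful run."""
    state = {node["name"]: fingerprint(node["source"]) for node in nodes}
    path.write_text(json.dumps(state, indent=2))


nodes = [
    {"name": "clean", "source": "v1", "inputs": ["raw"], "outputs": ["cleaned"]},
    {"name": "featurize", "source": "v1", "inputs": ["cleaned"], "outputs": ["features"]},
    {"name": "train", "source": "v1", "inputs": ["features"], "outputs": ["model"]},
]

with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "run_state.json"  # hypothetical location

    # First run: nothing recorded yet, so every node is selected.
    print(select_outdated(nodes, load_state(state_file)))  # ['clean', 'featurize', 'train']
    save_state(nodes, state_file)

    # Nothing changed: no node needs to run.
    print(select_outdated(nodes, load_state(state_file)))  # []

    # Editing "featurize" invalidates it and everything downstream.
    nodes[1]["source"] = "v2"
    print(select_outdated(nodes, load_state(state_file)))  # ['featurize', 'train']
```

The same list of node names could drive the coloring in the visualization: the outputs of the selected nodes are the datasets to highlight.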

astrojuanlu commented 9 months ago

Somewhat related: https://github.com/kedro-org/kedro/issues/221, https://github.com/kedro-org/kedro/issues/2307

NeroOkwa commented 6 months ago

Backlog grooming notes:

This was also highlighted in #1750 and would build on the dataset preview and debugging workstream. We should consider implementing this. Next step: investigate technical feasibility.