takikadiri / kedro-boot

A kedro plugin that streamlines the integration between Kedro projects and third-party applications, making it easier for you to develop end-to-end production-ready data science applications.
Apache License 2.0

Is dataset caching persistent across runs? #32

Open · charlesbmi opened this issue 1 month ago

charlesbmi commented 1 month ago

Thanks for creating this awesome project! I am excited to use it as a plugin for Kedro pipeline-parameter sweeps (e.g. via Hydra or Optuna).

I was interested in this point in the README:

> you can run the same session multiple times with many speed optimisation (including dataset caching)

but I couldn't find any information about it in the codebase. Is it implemented? If so, is the dataset cached to disk across session runs, or is it just `kedro.io.CachedDataSet` under the hood?

takikadiri commented 1 month ago

Hi charlesbmi, I'm glad you liked the project!

Yes, the cached datasets persist across runs. kedro-boot caches/preloads some datasets as MemoryDataset in order to speed up the runs and achieve low latency. The process of preparing the catalog for multiple runs is called catalog compilation. You can dry-run the compilation with `kedro boot compile --pipeline your_pipeline`; the list of artifact datasets that would be cached is described in the compilation report.
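
For instance (the pipeline name below is just a placeholder):

```bash
# Dry-run the catalog compilation for a given pipeline and review the
# compilation report, which lists the datasets that would be cached.
kedro boot compile --pipeline training
```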

In your use case, you would have a thin application that injects some parameters into your pipelines; kedro-boot would preload all the other datasets as MemoryDataset, since they don't change between runs.
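
As a rough sketch of that pattern (the names here are assumptions rather than the exact kedro-boot API: a `session` object provided by kedro-boot, exposing a `run()` method with an `itertime_params`-style argument for per-run parameter overrides):

```python
# Hypothetical sketch of a thin application doing a parameter sweep on top
# of a compiled kedro-boot session. The session.run() signature and the
# itertime_params keyword are assumptions; check them against the
# kedro-boot version you are using.
def sweep_learning_rate(session, pipeline_name="training"):
    """Run the same compiled session several times, injecting a different
    parameter value on each iteration; all other datasets stay preloaded
    in memory thanks to catalog compilation."""
    results = {}
    for lr in (0.01, 0.05, 0.1):
        run_output = session.run(
            name=pipeline_name,                     # placeholder pipeline name
            itertime_params={"learning_rate": lr},  # per-run parameter override (assumed keyword)
        )
        results[lr] = run_output
    return results
```

An Optuna or Hydra driver would then call something like this once per trial, which is exactly the multiple-run, low-latency case the caching is meant to speed up.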

Let us know if it works for you.