This pipeline is fairly long, and some of its steps take a significant amount of time. As a developer, I want to be able to work on and test individual parts of the pipeline without waiting for all previous steps to complete first. We currently handle this with "faked" pipeline steps that keep the same names but have different functionality. While developing, I've found that even with the faked steps, moving data around can take a long time and hinder development speed.
I propose that we create a long-lived PVC on our cluster, pre-loaded with all of the interim datasets and models needed throughout the pipeline, so that a developer can skip directly to whichever step is relevant without waiting on any of the earlier data-passing steps.
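To make the idea concrete, here is a rough sketch of how a developer-mode check against the pre-loaded PVC might look. Everything named here is an assumption for illustration only: the step names, the artifact paths, and the `STEP_INPUTS` mapping are hypothetical and would need to match whatever the real pipeline reads and writes.

```python
from pathlib import Path
from typing import List

# Hypothetical mapping: step name -> interim artifacts it reads from the PVC.
# The real step names and paths would come from the pipeline definition.
STEP_INPUTS = {
    "generate": [],
    "train": ["datasets/generated.jsonl"],
    "evaluate": ["datasets/generated.jsonl", "models/trained"],
}


def missing_inputs(step: str, cache_root: Path) -> List[str]:
    """Return the artifacts `step` needs that are absent under `cache_root`."""
    return [rel for rel in STEP_INPUTS[step] if not (cache_root / rel).exists()]


def can_start_at(step: str, cache_root: Path) -> bool:
    """True if every input of `step` is already present on the mounted PVC."""
    return not missing_inputs(step, cache_root)
```

With the PVC mounted at some path (for example a hypothetical `/mnt/pipeline-cache`), a developer could call `can_start_at("evaluate", Path("/mnt/pipeline-cache"))` before launching a run, and `missing_inputs` would report which pre-loaded artifacts still need to be added to the volume.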