opendatahub-io / ai-edge

ODH integration with AI at the Edge usecases
Apache License 2.0

[BUG]: Creating the two PipelineRuns makes one or both fail #175

Open adelton opened 1 year ago

adelton commented 1 year ago

Details

Describe the bug

When the user just pastes

oc create -f tekton/build-container-image-pipeline/aws-env-real.yaml
oc apply -k tekton/build-container-image-pipeline/
oc create -f tekton/build-container-image-pipeline/build-container-image-pipelinerun-bike-rentals.yaml
oc create -f tekton/build-container-image-pipeline/build-container-image-pipelinerun-tensorflow-housing.yaml

on a fresh namespace, there is a high chance that one or both PipelineRuns fail.

It's likely caused by a race condition over the new static model_dir part of the path introduced by https://github.com/opendatahub-io/ai-edge/pull/112, which the two pods end up fighting over.
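The suspected collision can be illustrated locally without a cluster. The sketch below is a sequential simplification, not the actual Task code: a temporary directory stands in for the shared PVC, and the function names and the model.txt file are hypothetical. It only shows why a static model_dir on a shared volume lets one run overwrite the other's data, and why a per-run path avoids that.

```shell
#!/bin/sh
# Hypothetical stand-in for the single PVC shared by both PipelineRuns.
workspace=$(mktemp -d)

# Each "run" stores its model under the same static model_dir path.
download_model() {
    # $1 = run name (illustrative only)
    mkdir -p "$workspace/model_dir"
    echo "$1" > "$workspace/model_dir/model.txt"
}

download_model tensorflow-housing
download_model bike-rentals

# The first run now finds the second run's model where it expected its own.
shared_result=$(cat "$workspace/model_dir/model.txt")
echo "shared path contains: $shared_result"

# With a per-run directory the two runs no longer collide.
download_model_isolated() {
    mkdir -p "$workspace/$1/model_dir"
    echo "$1" > "$workspace/$1/model_dir/model.txt"
}

download_model_isolated tensorflow-housing
download_model_isolated bike-rentals
isolated_result=$(cat "$workspace/tensorflow-housing/model_dir/model.txt")
echo "isolated path contains: $isolated_result"

# Clean up the temporary directory.
rm -rf "$workspace"
```

In the real pipelines the two pods run concurrently, so the outcome depends on timing, which would be consistent with one or both PipelineRuns failing.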

To Reproduce

Paste all the commands at once and then check the PipelineRuns in the console.

Alternatively, do

oc create -f tekton/build-container-image-pipeline/aws-env-real.yaml
oc apply -k tekton/build-container-image-pipeline/
oc create -f tekton/build-container-image-pipeline/build-container-image-pipelinerun-tensorflow-housing.yaml

watch the first TaskRun (kserve-download-model) turn green in the console, then paste

oc create -f tekton/build-container-image-pipeline/build-container-image-pipelinerun-bike-rentals.yaml

and observe both PipelineRuns in the console.

Expected behavior

Running the two PipelineRuns in parallel should still be possible, the way it worked before, even though one is now git-based and the other S3-based.


LaVLaS commented 11 months ago

@adelton The quick fix for this right now is to migrate your PipelineRun to use volumeClaimTemplate instead of a single pre-existing PVC for all Pipelines.
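For illustration, a PipelineRun that requests its own volume per run might look roughly like the fragment below. This is a minimal sketch, not the repository's actual manifest: the generateName prefix, the Pipeline name, the workspace name, the storage size, and the API version are all assumptions and need to be matched to the existing PipelineRun files.

```yaml
apiVersion: tekton.dev/v1beta1        # assumed API version; adjust to the installed Tekton release
kind: PipelineRun
metadata:
  generateName: build-container-image-   # hypothetical name prefix
spec:
  pipelineRef:
    name: build-container-image          # assumed Pipeline name
  workspaces:
    - name: workspace                    # assumed workspace name; must match the Pipeline's declaration
      # Instead of persistentVolumeClaim.claimName pointing at one shared PVC,
      # Tekton creates a fresh PVC for each PipelineRun from this template.
      volumeClaimTemplate:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi               # assumed size
```

Because each run gets a separate volume, the two PipelineRuns would no longer share the model_dir path.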

As we combine and refactor the pipelines (#177) to improve the workflow, we can add optional support for a collection of Pipelines to share a single PVC for archiving data, with support for purging older logs.

adelton commented 11 months ago

> @adelton The quick fix for this right now is to migrate your PipelineRun to use volumeClaimTemplate instead of a single pre-existing PVC for all Pipelines.

Could you open a PR so that we fix the repo content for everyone and folks don't need to make one-off changes?