kubeflow / kfp-tekton

Kubeflow Pipelines on Tekton
https://developer.ibm.com/blogs/kubeflow-pipelines-with-tekton-and-watson/
Apache License 2.0

Tekton results with sidecar/post-processing step investigation #878

Open Tomcli opened 2 years ago

Tomcli commented 2 years ago

/kind feature

Description: We want to investigate and create a simple POC to capture results using either a sidecar or a post-processing step. The sidecar/post-processing step will do the following (a rough sketch follows the list).

  1. Look for the result file paths at the end of all the step containers, then copy them to object storage or a remote server that the core Tekton controller can refer back to.
  2. Create a new API field for artifacts. It works the same as results, except that artifacts can be larger than 1MB, which the Tekton controller cannot stringify into raw input parameters.
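
A minimal sketch of what such a post-processing step could do, assuming an mc client in the image and that RESULTS_DIR, ARTIFACT_BUCKET, PIPELINERUN, and PIPELINETASK are injected by the pipeline (all names below are illustrative, not the actual implementation):

    #!/usr/bin/env sh
    # Illustrative post-processing step: after all step containers finish,
    # upload every result/artifact file to object storage under a key the
    # Tekton controller could later resolve. RESULTS_DIR, ARTIFACT_BUCKET,
    # PIPELINERUN and PIPELINETASK are assumed to be provided by the pipeline.
    set -e
    mc config host add storage "${ARTIFACT_ENDPOINT_SCHEME}${ARTIFACT_ENDPOINT}" "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY"
    for f in "$RESULTS_DIR"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        tar -czf "/tmp/$name.tgz" -C "$RESULTS_DIR" "$name"
        mc cp "/tmp/$name.tgz" "storage/$ARTIFACT_BUCKET/artifacts/$PIPELINERUN/$PIPELINETASK/$name.tgz"
    done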

related: https://github.com/tektoncd/community/pull/521

Tomcli commented 2 years ago

/assign @ScrapCodes

Tomcli commented 2 years ago

The current kfp-tekton copy artifact container is generated with this script:

    #!/usr/bin/env sh

    # Tar the given file ($2) and upload it to object storage under the
    # artifact name ($1); skip if the file was never produced.
    push_artifact() {
        if [ -f "$2" ]; then
            tar -cvzf "$1.tgz" "$2"
            mc cp "$1.tgz" "storage/$ARTIFACT_BUCKET/artifacts/$PIPELINERUN/$PIPELINETASK/$1.tgz"
        else
            echo "$2 file does not exist. Skip artifact tracking for $1"
        fi
    }

    # Collect the main step container's log from the node and upload it.
    push_log() {
        cat /var/log/containers/"$PODNAME"*"$NAMESPACE"*step-main*.log > step-main.log
        push_artifact main-log step-main.log
    }

    # Drop blank lines and the trailing newline from a result file in place.
    strip_eof() {
        if [ -f "$2" ]; then
            awk 'NF' "$2" | head -c -1 > "$1"_temp_save && cp "$1"_temp_save "$2"
        fi
    }

    mc config host add storage "${ARTIFACT_ENDPOINT_SCHEME}${ARTIFACT_ENDPOINT}" "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY"
    push_artifact data $(results.data.path)
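
For context, Tekton substitutes $(results.data.path) before the step runs, so the last line effectively becomes something like the following (the /tekton/results layout is Tekton's default; bucket and run names depend on the pipeline):

    # After Tekton's variable substitution the call is roughly:
    push_artifact data /tekton/results/data
    # i.e. tar /tekton/results/data and upload it as
    # storage/$ARTIFACT_BUCKET/artifacts/$PIPELINERUN/$PIPELINETASK/data.tgz
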
Tomcli commented 2 years ago

The result file path should be available in the Tekton core controller as the variable $(results.resultname.path). We just need a lightweight container that pushes these results to object storage such as S3, or to a remote server over HTTP (based on config). Then the Tekton controller can have some logic to retrieve these files from the remote location.
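
A sketch of the HTTP variant, assuming the pusher receives the result names as arguments and the remote server accepts PUT requests; RESULT_SERVER_URL and the URL layout are assumptions, not an existing kfp-tekton setting:

    #!/usr/bin/env sh
    # Illustrative HTTP-based pusher: PUT each named result file to a remote
    # server so the controller can later fetch it from the same URL.
    set -e
    for name in "$@"; do
        path="/tekton/results/$name"
        if [ -f "$path" ]; then
            curl -fsS -X PUT --data-binary "@$path" \
                "$RESULT_SERVER_URL/$PIPELINERUN/$PIPELINETASK/$name"
        else
            echo "result $name not found at $path, skipping"
        fi
    done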

Tomcli commented 2 years ago

@ScrapCodes Here is the possible sidecar issue https://github.com/tektoncd/pipeline/issues/1347

Tekton's current Sidecar implementation contains a bug. Tekton uses a container image named nop to terminate Sidecars. That image is configured by passing a flag to the Tekton controller. If the configured nop image contains the exact command the Sidecar was executing before receiving a "stop" signal, the Sidecar keeps running, eventually causing the TaskRun to time out with an error.

yhwang commented 2 years ago

@ScrapCodes Prashant, any update on this issue?

ScrapCodes commented 2 years ago

There may be a race between the step container terminating and the sidecar copying results to the remote server.

For example, if there is only one step, Tekton will try to terminate the sidecar as soon as that step completes, so there is a chance the results are not copied in time. The same applies to the last step of a multi-step task.
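
One possible mitigation, purely as a sketch with illustrative file names on a shared workspace volume: make the last step and the sidecar hand-shake, so the step only exits after the sidecar confirms the upload.

    # In the last step, after it has written its results (illustrative paths):
    touch /workspace/.results-ready
    # Block until the sidecar reports the upload is done, so Tekton does not
    # consider the step finished before the copy completes.
    while [ ! -f /workspace/.results-uploaded ]; do sleep 1; done

    # In the sidecar (upload_results is a hypothetical helper, e.g. mc cp or curl):
    while [ ! -f /workspace/.results-ready ]; do sleep 1; done
    upload_results
    touch /workspace/.results-uploaded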

ScrapCodes commented 2 years ago

A preStop hook is not called if the container has already completed, i.e. ran to completion. Link

Tomcli commented 2 years ago

Thanks @ScrapCodes for your feedback on the sidecar. Do you think a post-processing step could avoid these issues?