Closed cloudbow closed 4 months ago
Hi @cloudbow, I tried running your example on KFP standalone 1.6.0 on GCP, and the steps are cached as expected on the second run.
Therefore, this issue might be specific to either your environment or MiniKF. Can you use kubectl get pod -n kubeflow, kubectl describe pod <pod-name>, kubectl logs <pod-name>, and similar techniques to check your deployment? The key servers to look at are cache-server and cache-deployer. Something might be failing with them.
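As a rough sketch of the check suggested above, the following Python snippet scans the text output of kubectl get pod -n kubeflow and reports whether pods for the two cache components are present and what status they have. The name prefixes cache-server and cache-deployer are assumptions based on the component names mentioned in this thread, and the first pod name in the sample listing is made up for illustration; adjust both to your deployment.

```python
from typing import Dict, List

# Assumed name prefixes for the two cache components; adjust to your deployment.
EXPECTED_PREFIXES = ("cache-server", "cache-deployer")


def check_cache_pods(listing: str) -> Dict[str, List[str]]:
    """Map each expected prefix to the STATUS values of the matching pods.

    `listing` is the plain-text output of `kubectl get pod -n kubeflow`.
    An empty list means that no pod with that prefix was found.
    """
    found: Dict[str, List[str]] = {prefix: [] for prefix in EXPECTED_PREFIXES}
    for line in listing.splitlines():
        columns = line.split()
        # Skip blank lines and the header row (NAME READY STATUS ...).
        if len(columns) < 3 or columns[0] == "NAME":
            continue
        name, status = columns[0], columns[2]
        for prefix in EXPECTED_PREFIXES:
            if name.startswith(prefix):
                found[prefix].append(status)
    return found


# Illustrative listing: the cache-server pod name is hypothetical.
sample = """\
NAME                          READY  STATUS   RESTARTS  AGE
cache-server-abc123-xyz       1/1    Running  0         7d
kfp-cache-7fd4488b7f-t2kcn    3/3    Running  0         7d19h
ml-pipeline-5dc8fff45b-nj76p  2/2    Running  0         7d19h
"""

result = check_cache_pods(sample)
for prefix, statuses in result.items():
    print(prefix, statuses or "MISSING")
```

A prefix reported as MISSING only means that no pod name starts with it; kubectl describe pod and kubectl logs are still needed to find out why a pod that does exist is unhealthy.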
Hi @Bobgy, I have the same issue in a different environment, so I can provide some information.
I tried KFP 1.6/1.5/1.4 standalone on GCP, and AI Platform deployed on an existing cluster, and never got caching. All deployments are fine, including cache-server and cache-deployer, but the cachedb is always empty (the other databases seem fine).
With AI Platform and a new cluster I actually do get caching, so it may come from the environment? I tried with private clusters with autoscaling (min 3 nodes, 2 vCPU and 4.5 GB RAM).
Edit: After further investigation, this seems to be related to the use of a private cluster in my case.
@Itega what do you mean by private cluster? I am running Kubeflow on MiniKF 1.3 from the marketplace. Can this also be called a private cluster? @Bobgy let me check.
ubuntu@ip-10-101-8-247:~$ kubectl get po -n kubeflow
NAME READY STATUS RESTARTS AGE
admission-webhook-deployment-8c9cdf478-q2lmt 2/2 Running 0 7d19h
centraldashboard-77cb6bbb48-nktsx 2/2 Running 0 7d19h
jupyter-web-app-deployment-75795878-ts9t2 2/2 Running 0 7d19h
katib-controller-6d6bb5495d-zc29z 2/2 Running 0 7d19h
katib-db-manager-6ff648f5cc-r5mgc 2/2 Running 0 7d19h
katib-mysql-6495dccdd5-vpffx 2/2 Running 0 7d19h
katib-ui-7ddf4965f9-j49ss 2/2 Running 0 7d19h
kfp-cache-7fd4488b7f-t2kcn 3/3 Running 0 7d19h
kfserving-controller-manager-0 3/3 Running 0 7d19h
kubeflow-reception-7895dd4d69-lxlss 2/2 Running 0 7d19h
metadata-db-6bf8b57f97-jqg29 2/2 Running 0 7d19h
metadata-envoy-deployment-549d875989-r4kk8 1/1 Running 0 7d19h
metadata-grpc-deployment-ccc8c8bd9-rw2xz 2/2 Running 4 7d19h
minio-6cfd7cb4f-25zkp 2/2 Running 0 7d19h
ml-pipeline-5dc8fff45b-nj76p 2/2 Running 0 7d19h
ml-pipeline-persistenceagent-c6b4d475f-hmnwt 2/2 Running 0 7d19h
ml-pipeline-scheduledworkflow-64dc954c6c-tzp4x 2/2 Running 0 7d19h
ml-pipeline-ui-78846f6754-tmnth 2/2 Running 1 7d19h
ml-pipeline-viewer-crd-5ffbd79f68-dx667 2/2 Running 0 7d19h
ml-pipeline-visualizationserver-5977df9c45-6xq5x 2/2 Running 0 7d19h
models-web-app-7bfdc5c585-rznv7 2/2 Running 0 7d19h
mpi-operator-754d876fd8-gppnx 1/1 Running 1 7d19h
mxnet-operator-c5f7b6798-gzmxv 1/1 Running 1 7d19h
mysql-65ff8d5dfd-wqbbd 2/2 Running 0 7d19h
notebook-controller-deployment-7c46fdd957-f957p 2/2 Running 0 7d19h
profiles-deployment-588f5fdcf8-26xmv 3/3 Running 0 7d19h
pvcviewer-controller-controller-manager-5998dc798b-jx2hf 3/3 Running 1 7d19h
pytorch-operator-77b7ff46c-hhfhj 2/2 Running 1 7d19h
spark-operatorsparkoperator-579554d99d-mnkz2 2/2 Running 0 7d19h
tensorboard-controller-controller-manager-6d99664986-n624x 3/3 Running 1 7d19h
tensorboards-web-app-deployment-6b98985bc5-xv6rv 1/1 Running 0 7d19h
tf-job-operator-5bb7675fb8-4nfhq 2/2 Running 1 7d19h
volumes-web-app-deployment-b8d6cc797-xwdmz 2/2 Running 0 7d19h
workflow-controller-5f9dbb559c-dw2tk 2/2 Running 0 7d19h
xgboost-operator-deployment-7bf56c6d4f-cf7jc 2/2 Running 0 7d19h
I see only kfp-cache. I did try turning on logs and running the pipeline; here is what I got.
First run
{"level":"info","ts":1622631681.9666345,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-pcbfb-2880583143"}
{"level":"info","ts":1622631681.9666553,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-pcbfb-2880583143"}
{"level":"info","ts":1622631681.9760761,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-pcbfb-2880583143"}
{"level":"info","ts":1622631681.9761019,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-pcbfb-2880583143"}
{"level":"info","ts":1622631682.023542,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-pcbfb-2880583143"}
{"level":"info","ts":1622631682.0235684,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-pcbfb-2880583143"}
{"level":"info","ts":1622631684.0169182,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-pcbfb-2880583143"}
{"level":"info","ts":1622631684.0169406,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-pcbfb-2880583143"}
{"level":"info","ts":1622631685.1026225,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-pcbfb-2880583143"}
{"level":"info","ts":1622631685.1026473,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-pcbfb-2880583143"}
{"level":"info","ts":1622631685.4869857,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-pcbfb-2880583143"}
{"level":"info","ts":1622631685.487011,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-pcbfb-2880583143"}
{"level":"info","ts":1622631686.1834323,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-pcbfb-2880583143"}
{"level":"info","ts":1622631686.1834562,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-pcbfb-2880583143"}
{"level":"info","ts":1622631692.0271308,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-pcbfb-2404255012"}
{"level":"info","ts":1622631692.0308237,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-pcbfb-2404255012"}
{"level":"info","ts":1622631692.0488806,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-pcbfb-2404255012"}
{"level":"info","ts":1622631692.048913,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-pcbfb-2404255012"}
{"level":"info","ts":1622631692.065883,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-pcbfb-2880583143"}
{"level":"info","ts":1622631692.0659041,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-pcbfb-2880583143"}
{"level":"info","ts":1622631692.110993,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-pcbfb-2404255012"}
{"level":"info","ts":1622631692.1110253,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-pcbfb-2404255012"}
{"level":"info","ts":1622631693.9256918,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-pcbfb-2404255012"}
{"level":"info","ts":1622631693.9257143,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-pcbfb-2404255012"}
{"level":"info","ts":1622631695.0097864,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-pcbfb-2404255012"}
{"level":"info","ts":1622631695.009808,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-pcbfb-2404255012"}
{"level":"info","ts":1622631695.2612553,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-pcbfb-2404255012"}
{"level":"info","ts":1622631695.2612772,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-pcbfb-2404255012"}
{"level":"info","ts":1622631696.0857518,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-pcbfb-2404255012"}
{"level":"info","ts":1622631696.0857756,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-pcbfb-2404255012"}
{"level":"info","ts":1622631702.0451055,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-pcbfb-2404255012"}
{"level":"info","ts":1622631702.045128,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-pcbfb-2404255012"}
Next run
{"level":"info","ts":1622631768.5726974,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-wshhv-687887970"}
{"level":"info","ts":1622631768.572721,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-wshhv-687887970"}
{"level":"info","ts":1622631768.5772638,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-wshhv-687887970"}
{"level":"info","ts":1622631768.5772843,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-wshhv-687887970"}
{"level":"info","ts":1622631768.6517446,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-wshhv-687887970"}
{"level":"info","ts":1622631768.6517673,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-wshhv-687887970"}
{"level":"info","ts":1622631769.8138227,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-wshhv-687887970"}
{"level":"info","ts":1622631769.8138506,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-wshhv-687887970"}
{"level":"info","ts":1622631770.8717377,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-wshhv-687887970"}
{"level":"info","ts":1622631770.8717608,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-wshhv-687887970"}
{"level":"info","ts":1622631771.2016962,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-wshhv-687887970"}
{"level":"info","ts":1622631771.2017179,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-wshhv-687887970"}
{"level":"info","ts":1622631771.949174,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-wshhv-687887970"}
{"level":"info","ts":1622631771.9491968,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-wshhv-687887970"}
{"level":"info","ts":1622631778.463374,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-mvkk7-3840758880"}
{"level":"info","ts":1622631778.4633968,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-mvkk7-3840758880"}
{"level":"info","ts":1622631778.4745355,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-mvkk7-3827168991"}
{"level":"info","ts":1622631778.4745579,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-mvkk7-3827168991"}
{"level":"info","ts":1622631778.4745853,"logger":"kfp-cache-controller","msg":"Pod does not exist","pod":"kubeflow-user/addition-pipeline-mvkk7-3840758880"}
{"level":"info","ts":1622631778.4823828,"logger":"kfp-cache-controller","msg":"Pod does not exist","pod":"kubeflow-user/addition-pipeline-mvkk7-3827168991"}
{"level":"info","ts":1622631778.7625508,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-wshhv-1927489725"}
{"level":"info","ts":1622631778.7625754,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-wshhv-1927489725"}
{"level":"info","ts":1622631778.7830899,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-wshhv-1927489725"}
{"level":"info","ts":1622631778.783113,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-wshhv-1927489725"}
{"level":"info","ts":1622631778.7965403,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-wshhv-687887970"}
{"level":"info","ts":1622631778.7965593,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-wshhv-687887970"}
{"level":"info","ts":1622631778.8399537,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-wshhv-1927489725"}
{"level":"info","ts":1622631778.8399792,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-wshhv-1927489725"}
{"level":"info","ts":1622631780.6956744,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-wshhv-1927489725"}
{"level":"info","ts":1622631780.695712,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-wshhv-1927489725"}
{"level":"info","ts":1622631781.8212683,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-wshhv-1927489725"}
{"level":"info","ts":1622631781.8212938,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-wshhv-1927489725"}
{"level":"info","ts":1622631782.109519,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-wshhv-1927489725"}
{"level":"info","ts":1622631782.1095474,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-wshhv-1927489725"}
{"level":"info","ts":1622631782.8937457,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-wshhv-1927489725"}
{"level":"info","ts":1622631782.893769,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-wshhv-1927489725"}
{"level":"info","ts":1622631788.7896814,"logger":"kfp-cache-controller","msg":"Successfully retrieved pod","pod":"kubeflow-user/addition-pipeline-wshhv-1927489725"}
{"level":"info","ts":1622631788.7897058,"logger":"kfp-cache-controller","msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-wshhv-1927489725"}
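Logs like the ones above are easier to read once they are grouped by message. The following is a minimal sketch, not part of any Kubeflow tooling, that pulls every JSON object out of a pasted log (even when several objects share one line or are preceded by a label such as "Next run") and counts how often each msg value occurs. The function names iter_entries and count_messages are made up for this example; the two sample entries are copied from the second run above.

```python
import json
from collections import Counter
from typing import Any, Dict, Iterator


def iter_entries(text: str) -> Iterator[Dict[str, Any]]:
    """Yield every JSON object found in `text`, ignoring anything in between."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            return
        try:
            # raw_decode returns the parsed value and the index where it ended.
            entry, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1
            continue
        if isinstance(entry, dict):
            yield entry
        pos = end


def count_messages(text: str) -> Counter:
    """Count how often each `msg` value appears in the log text."""
    return Counter(entry.get("msg", "?") for entry in iter_entries(text))


sample_log = (
    'Next run {"level":"info","ts":1622631778.4633968,"logger":"kfp-cache-controller",'
    '"msg":"Pod is not a Kale step","pod":"kubeflow-user/addition-pipeline-mvkk7-3840758880"} '
    '{"level":"info","ts":1622631778.4745853,"logger":"kfp-cache-controller",'
    '"msg":"Pod does not exist","pod":"kubeflow-user/addition-pipeline-mvkk7-3840758880"}'
)

counts = count_messages(sample_log)
for message, n in counts.most_common():
    print(n, message)
```

Run over the full logs, a summary like this makes it quick to see that every pod is reported as "Pod is not a Kale step" in both runs.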
/kind question /priority p2 /area pipelines
@jbottum: The label(s) area/pipeliines cannot be applied, because the repository doesn't have them.
It seems KFP cache deployer is missing in minikf. @yanniszark @elikatsis who is the best person to ask about minikf?
Hi all!
It's true, we don't deploy the official KFP cache in MiniKF, for a few reasons:
By the way, our caching mechanism is deployed in the kubeflow namespace as the kfp-cache deployment, and that's what the logs above are about.
cc @StefanoFioravanzo
@elikatsis shall we document this on MiniKF side? Can we close the issue now?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Closing this issue, no activity for more than a year. If this issue persists in the latest release, please open a new issue.
/close
@rimolive: Closing this issue.
Environment
kfp 1.5.0 kfp-pipeline-spec 0.1.7 kfp-server-api 1.5.0
Steps to reproduce
I have attached the notebook I used. Please try it with that. The input is already provided. It's a simple add_op pipeline which adds two numbers. But why is the step executed again and again, even if the run is cloned or a new run is created using the same pipeline?
Expected result
The steps should have been cached, as the inputs, the Docker image, and the outputs are all the same.
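The expectation can be illustrated with a small, purely conceptual sketch. It assumes, as a simplification, that a step's cache key is a hash over its container image, command, and input arguments; this is not the actual implementation of the KFP cache server or of the MiniKF kfp-cache deployment. The image name, the command, and the helper cache_key are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


def cache_key(step: Dict[str, Any]) -> str:
    """Hash a step description; identical descriptions give identical keys."""
    # Sorting keys makes the serialization independent of dict ordering.
    canonical = json.dumps(step, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical description of the add_op step.
first_run = {
    "image": "python:3.7",
    "command": ["python", "add.py"],
    "inputs": {"a": 1, "b": 2},
}
# A cloned run: same content, different key order.
cloned_run = {
    "inputs": {"b": 2, "a": 1},
    "command": ["python", "add.py"],
    "image": "python:3.7",
}
# A run where one input value changed.
changed_run = {**first_run, "inputs": {"a": 1, "b": 3}}

same_key = cache_key(first_run) == cache_key(cloned_run)
different_key = cache_key(first_run) != cache_key(changed_run)
print(same_key, different_key)
```

Under this assumption, a cloned run produces the same key as the original and could reuse its outputs, while any change to the image, command, or inputs produces a new key and a fresh execution.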
Materials and Reference
Attached sample [code](simple_function_based_component_pipeline (1).ipynb.zip)