Have you checked that both the ml-pipeline-ui deployment in the kubeflow namespace and the ml-pipeline-ui-artifact deployment in user namespaces are using ml-pipeline/frontend:1.8.1?
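(For reference, a quick way to check both images; this is a sketch that assumes you have kubectl access to both namespaces, and `<user-namespace>` is a placeholder for your profile namespace:)

```sh
# Print the container image used by each frontend deployment.
kubectl -n kubeflow get deployment ml-pipeline-ui \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
kubectl -n <user-namespace> get deployment ml-pipeline-ui-artifact \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
```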
@zijianjoy Yes, I have:
```
Name:                   ml-pipeline-ui
Namespace:              kubeflow
CreationTimestamp:      Wed, 23 Jun 2021 21:52:54 -0300
Labels:                 app=ml-pipeline-ui
                        app.kubernetes.io/component=ml-pipeline
                        app.kubernetes.io/name=kubeflow-pipelines
Annotations:            deployment.kubernetes.io/revision: 21
Selector:               app=ml-pipeline-ui,app.kubernetes.io/component=ml-pipeline,app.kubernetes.io/name=kubeflow-pipelines
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=ml-pipeline-ui
                    app.kubernetes.io/component=ml-pipeline
                    app.kubernetes.io/name=kubeflow-pipelines
  Annotations:      cluster-autoscaler.kubernetes.io/safe-to-evict: true
                    kubectl.kubernetes.io/restartedAt: 2022-08-25T18:19:01-03:00
  Service Account:  ml-pipeline-ui
  Containers:
   ml-pipeline-ui:
    Image:      gcr.io/ml-pipeline/frontend:1.8.1
    Port:       3000/TCP
    Host Port:  0/TCP
    Requests:
      cpu:     10m
      memory:  70Mi
    Liveness:   exec [wget -q -S -O - http://localhost:3000/apis/v1beta1/healthz] delay=3s timeout=2s period=5s #success=1 #failure=3
    Readiness:  exec [wget -q -S -O - http://localhost:3000/apis/v1beta1/healthz] delay=3s timeout=2s period=5s #success=1 #failure=3
    Environment:
      KUBEFLOW_USERID_HEADER:                     <set to the key 'userid-header' of config map 'kubeflow-config'>  Optional: false
      KUBEFLOW_USERID_PREFIX:                     <set to the key 'userid-prefix' of config map 'kubeflow-config'>  Optional: false
      VIEWER_TENSORBOARD_POD_TEMPLATE_SPEC_PATH:  /etc/config/viewer-pod-template.json
      DEPLOYMENT:                                 KUBEFLOW
      ARTIFACTS_SERVICE_PROXY_NAME:               ml-pipeline-ui-artifact
      ARTIFACTS_SERVICE_PROXY_PORT:               80
      ARTIFACTS_SERVICE_PROXY_ENABLED:            true
      ENABLE_AUTHZ:                               true
      MINIO_NAMESPACE:                            (v1:metadata.namespace)
      MINIO_ACCESS_KEY:                           <set to the key 'accesskey' in secret 'mlpipeline-minio-artifact'>  Optional: false
      MINIO_SECRET_KEY:                           <set to the key 'secretkey' in secret 'mlpipeline-minio-artifact'>  Optional: false
      ALLOW_CUSTOM_VISUALIZATIONS:                true
    Mounts:
      /etc/config from config-volume (ro)
  Volumes:
   config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      ml-pipeline-ui-configmap
    Optional:  false
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   ml-pipeline-ui-oneId (1/1 replicas created)
Events:          <none>
```
```
Name:                   ml-pipeline-ui-artifact
Namespace:              myNamespace
CreationTimestamp:      Mon, 13 Jun 2022 17:20:27 -0300
Labels:                 app=ml-pipeline-ui-artifact
                        controller-uid=34641e66-4d49-4025-b235-fc433a8e2049
Annotations:            deployment.kubernetes.io/revision: 4
                        metacontroller.k8s.io/last-applied-configuration:
                          {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"labels":{"app":"ml-pipeline-ui-artifact","controller-uid":"34641e66-4d49-4025-b23...
Selector:               app=ml-pipeline-ui-artifact
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=ml-pipeline-ui-artifact
  Annotations:      kubectl.kubernetes.io/restartedAt: 2022-08-23T18:23:11-03:00
  Service Account:  default-editor
  Containers:
   ml-pipeline-ui-artifact:
    Image:      gcr.io/ml-pipeline/frontend:1.8.1
    Port:       3000/TCP
    Host Port:  0/TCP
    Limits:
      cpu:     100m
      memory:  500Mi
    Requests:
      cpu:     10m
      memory:  70Mi
    Environment:
      MINIO_ACCESS_KEY:  <set to the key 'accesskey' in secret 'mlpipeline-minio-artifact'>  Optional: false
      MINIO_SECRET_KEY:  <set to the key 'secretkey' in secret 'mlpipeline-minio-artifact'>  Optional: false
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   ml-pipeline-ui-artifact-bb5bc4b57 (1/1 replicas created)
Events:          <none>
```
What else can I check?
If I go to `myCluster/ml_metadata.MetadataStoreService/GetEventsByArtifactIDs`, I get the message `upstream connect error or disconnect/reset before headers. reset reason: remote reset`. We are using asm-1143-0.
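(The "remote reset" part of that Envoy message means the upstream gRPC server dropped the connection, so its logs are a reasonable next check. A sketch, assuming the deployment names of a standard Kubeflow Pipelines install:)

```sh
# Look for crashes or schema errors on the MLMD gRPC server and its Envoy proxy.
kubectl -n kubeflow logs deploy/metadata-grpc-deployment --tail=100
kubectl -n kubeflow logs deploy/metadata-envoy-deployment --tail=100
```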
ml-metadata was upgraded from 1.0.0 to 1.5.0 when Kubeflow moved from 1.3 to 1.5: https://github.com/kubeflow/pipelines/commits/master/third_party/ml-metadata
As a result, the MLMD schema version changed, so you need to follow the instructions to upgrade the MLMD dependency: https://github.com/google/ml-metadata/blob/master/g3doc/get_started.md#upgrade-the-mlmd-library
@zijianjoy Thank you very much for your answer. If I execute

```sh
kubectl describe deployment metadata-grpc-deployment -n kubeflow
```

I get:
```
Name:                   metadata-grpc-deployment
Namespace:              kubeflow
CreationTimestamp:      Wed, 23 Jun 2021 21:52:53 -0300
Labels:                 component=metadata-grpc-server
Annotations:            deployment.kubernetes.io/revision: 27
Selector:               component=metadata-grpc-server
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           component=metadata-grpc-server
  Annotations:      kubectl.kubernetes.io/restartedAt: 2022-08-26T16:44:45-03:00
  Service Account:  metadata-grpc-server
  Containers:
   container:
    Image:      gcr.io/tfx-oss-public/ml_metadata_store_server:1.5.0
    Port:       8080/TCP
    Host Port:  0/TCP
    Command:
      /bin/metadata_store_server
    Args:
      --grpc_port=8080
      --mysql_config_database=$(MYSQL_DATABASE)
      --mysql_config_host=$(MYSQL_HOST)
      --mysql_config_port=$(MYSQL_PORT)
      --mysql_config_user=$(DBCONFIG_USER)
      --mysql_config_password=$(DBCONFIG_PASSWORD)
      --enable_database_upgrade=true
    Liveness:   tcp-socket :grpc-api delay=3s timeout=2s period=5s #success=1 #failure=3
    Readiness:  tcp-socket :grpc-api delay=3s timeout=2s period=5s #success=1 #failure=3
    Environment:
      DBCONFIG_USER:      <set to the key 'username' in secret 'mysql-secret'>  Optional: false
      DBCONFIG_PASSWORD:  <set to the key 'password' in secret 'mysql-secret'>  Optional: false
      MYSQL_DATABASE:     <set to the key 'mlmdDb' of config map 'pipeline-install-config'>  Optional: false
      MYSQL_HOST:         <set to the key 'dbHost' of config map 'pipeline-install-config'>  Optional: false
      MYSQL_PORT:         <set to the key 'dbPort' of config map 'pipeline-install-config'>  Optional: false
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   metadata-grpc-deployment-56779cf65 (1/1 replicas created)
Events:
  Type    Reason             Age    From                   Message
  ----    ------             ----   ----                   -------
  Normal  ScalingReplicaSet  50m    deployment-controller  Scaled up replica set metadata-grpc-deployment-bb6856f48 to 1
  Normal  ScalingReplicaSet  48m    deployment-controller  Scaled down replica set metadata-grpc-deployment-58c7dbcd8b to 0
  Normal  ScalingReplicaSet  39m    deployment-controller  Scaled up replica set metadata-grpc-deployment-6cc4b76c8d to 1
  Normal  ScalingReplicaSet  38m    deployment-controller  Scaled down replica set metadata-grpc-deployment-bb6856f48 to 0
  Normal  ScalingReplicaSet  36m    deployment-controller  Scaled up replica set metadata-grpc-deployment-8c74d44b5 to 1
  Normal  ScalingReplicaSet  35m    deployment-controller  Scaled down replica set metadata-grpc-deployment-6cc4b76c8d to 0
  Normal  ScalingReplicaSet  2m53s  deployment-controller  Scaled up replica set metadata-grpc-deployment-56779cf65 to 1
  Normal  ScalingReplicaSet  2m19s  deployment-controller  Scaled down replica set metadata-grpc-deployment-8c74d44b5 to 0
```
Does this mean the MLMD dependency version is correct? What am I missing?
You need to upgrade the MLMD database schema: https://github.com/google/ml-metadata/blob/master/g3doc/get_started.md#upgrade-the-database-schema
There is a tool for the MLMD upgrade: https://github.com/kubeflow/pipelines/blob/74c7773ca40decfd0d4ed40dc93a6af591bbc190/tools/metadatastore-upgrade/README.md
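(To confirm whether the schema migration actually ran, you can read the version MLMD recorded in its backing database. A sketch, assuming the default KFP MySQL deployment with a passwordless root user and the MLMD database named `metadb`; your install may differ:)

```sh
# MLMD stores its current schema version in the MLMDEnv table.
kubectl -n kubeflow exec deploy/mysql -- \
  mysql -u root -e "SELECT schema_version FROM metadb.MLMDEnv;"
```

If the reported version is older than what MLMD 1.5.0 expects (see the linked get_started guide), the `--enable_database_upgrade=true` flag shown in the Args above should migrate it on the next server start.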
Hi @zijianjoy, our cluster is a freshly installed Kubeflow 1.5.0 cluster.
We also see the same error page when accessing myClusterURL/pipeline/artifacts.
At first the artifacts page loaded successfully, but after we ran about 600 recurring runs it failed to load with that message.
Even after we removed all the content under the mlpipeline/artifacts/ path in MinIO, the artifacts page still fails to load with the error.
Is there any way to recover? Thanks!
@celiawa Currently the page lists all artifacts from the MLMD store. Even if you deleted the content in MinIO, the MLMD store doesn't delete the corresponding MLMD objects, so this is likely a timeout while trying to list all the artifacts. There is a plan to improve this page: https://github.com/kubeflow/pipelines/issues/3226
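(To see how large the list the UI is trying to fetch actually is, you can count the rows directly. Same assumptions as the sketch above: default KFP MySQL, passwordless root, `metadb` database:)

```sh
# A very large count here supports the timeout explanation.
kubectl -n kubeflow exec deploy/mysql -- \
  mysql -u root -e "SELECT COUNT(*) FROM metadb.Artifact;"
```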
Thanks @zijianjoy. I checked the MySQL database backing the MLMD store; there are many tables in it. Which tables should we delete to get our artifacts page back? We don't want to reinstall.
Hi @zijianjoy @celiawa, I am also facing the same issue and am unable to see the artifacts in Kubeflow. Please let me know how to fix it.
Upgrading KFP to the latest version should allow you to see a paginated artifact list now.
Thanks @zijianjoy, we upgraded to KFP version 2.0.1 and can see artifact list pagination now.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Closing this issue as it seems the issue is solved.
/close
@rimolive: Closing this issue.
### Environment

Using https://www.kubeflow.org/docs/distributions/gke/deploy/upgrade/

### Steps to reproduce

Upgrading from Kubeflow 1.3 to Kubeflow 1.5 reproduces the problem.

### Expected result

I expect to see a list of artifacts when I access myClusterURL/pipeline/artifacts. Instead I get this: https://user-images.githubusercontent.com/74205824/186285977-cba538c2-e496-416e-8f27-67fa4950b4cc.png

### Materials and Reference

Impacted by this bug? Give it a 👍.