kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

Error on Grouped Tab in Kubeflow Pipelines UI - Failed to Get Executions #11319

Open afrozsh19 opened 1 month ago

afrozsh19 commented 1 month ago

Environment

Steps to reproduce

Description: When I open the Executions tab in the Kubeflow Pipelines UI, the default Main tab loads fine. Switching to the Grouped tab, however, makes the UI load for a while and then fail with the error shown under Actual Result.

Steps

  1. Navigate to the Executions tab in the Kubeflow Pipelines UI.
  2. Switch from the default tab to the Grouped tab.
  3. The page attempts to load and eventually fails with the error shown under Actual Result.

Expected result

Actual Result

The page fails with the following error message:

Error: Failed getting executions: Unknown Content-type received. Code: 2


Materials and reference

Debugging Findings:

  1. Network Call Failure:

    • One of the network calls in the browser fails:
      • Resource Path: /ml_metadata.MetadataStoreService/GetExecutions
      • Response: Gateway Time-out
  2. Pod Logs (metadata-grpc-deployment):

    • Logs from the metadata-grpc-deployment pod show the following error:
      W1021 10:05:33.342247 210 metadata_store_service_impl.cc:417] PutExecution failed: mysql_query aborted: errno: Lock wait timeout exceeded; try restarting transaction, error: Lock wait timeout exceeded; try restarting transaction
  3. Executions Fetched Across All Profiles:

    • The UI appears to fetch executions from all Kubeflow profiles (i.e., namespaces), regardless of the profile currently selected. Fetching across every namespace might be contributing to the slowness.
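
To illustrate the third finding, here is a minimal Python sketch of what a bounded, namespace-scoped fetch could look like, as opposed to a single unbounded GetExecutions call. This is not how the Kubeflow Pipelines UI or ML Metadata is implemented. `fetch_page`, `make_fake_fetcher`, `iter_executions`, `executions_in_namespace`, and the `"namespace"` field on each record are all hypothetical names made up for this example, and the in-memory fetcher only stands in for a real metadata service.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page fetcher: takes (page_size, page_token) and returns
# (items, next_page_token). A real client would wrap a GetExecutions
# request here; this type is only an assumption for the sketch.
PageFetcher = Callable[[int, Optional[str]], Tuple[List[dict], Optional[str]]]


def iter_executions(fetch_page: PageFetcher, page_size: int = 100) -> Iterator[dict]:
    """Yield executions page by page instead of in one unbounded call."""
    token: Optional[str] = None
    while True:
        items, token = fetch_page(page_size, token)
        yield from items
        if not token:  # no further pages
            break


def make_fake_fetcher(executions: List[dict]) -> PageFetcher:
    """In-memory stand-in for the metadata service (illustration only)."""
    def fetch_page(page_size: int, page_token: Optional[str]):
        start = int(page_token) if page_token else 0
        end = start + page_size
        next_token = str(end) if end < len(executions) else None
        return executions[start:end], next_token
    return fetch_page


def executions_in_namespace(
    fetch_page: PageFetcher, namespace: str, page_size: int = 100
) -> List[dict]:
    """Keep only executions for one namespace.

    Assumption: each record carries a "namespace" field. Whether and how
    real executions expose their namespace is not confirmed in this issue.
    """
    return [
        e for e in iter_executions(fetch_page, page_size)
        if e.get("namespace") == namespace
    ]


if __name__ == "__main__":
    data = [
        {"id": i, "namespace": "team-a" if i % 2 == 0 else "team-b"}
        for i in range(5)
    ]
    fetch = make_fake_fetcher(data)
    print([e["id"] for e in executions_in_namespace(fetch, "team-a", page_size=2)])
```

Filtering on the client, as done here, still reads every page. Pushing the namespace filter into the request itself would presumably reduce load on the database further, but that depends on what the metadata service supports.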

Additional Context:

Impacted by this bug? Give it a 👍.

saijalgupta2 commented 2 weeks ago

Could you please let us know when this issue will be resolved? Thank you.