kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

Failed to store run - Error 1062: Duplicate entry. ml-pipelines service becomes unresponsive #1826

Closed strangemonad-faire closed 4 years ago

strangemonad-faire commented 5 years ago

What happened:

Pipelines version 0.1.25, GCP Kubeflow namespaced install.

Updating a pipeline run's row in run_details fails with the following error:

Error while creating or updating run for workflow: 'kubeflow/relationship-graph-updategsdjb-2062-1367216059'. Create error: 'InternalServerError: Failed to store run RUN_NAME to table: Error 1062: Duplicate entry '45474da3-af6e-11e9-86a0-42010a8000e5' for key 'PRIMARY''. Update error: 'Invalid input error: Failed to update run 45474da3-af6e-11e9-86a0-42010a8000e5. Row not found.'

See the full trace below. When several runs get into this state, both the mysql pod and the ml-pipelines pod bog down.

What did you expect to happen:

Updating run data should gracefully handle cases where the data has already been reported.
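
To make the expected behaviour concrete, here is a minimal sketch of an idempotent create-or-update, written against an in-memory stand-in for the run_details table. It is not the Kubeflow Pipelines implementation: the `Run` type, `RunStore`, `ErrDuplicate`, and `CreateOrUpdate` below are hypothetical names chosen for illustration. The point is only that a report for a UUID that is already stored, with identical data, is treated as a no-op instead of surfacing a duplicate-entry error followed by "Row not found".

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Run is a hypothetical, trimmed-down stand-in for one run_details row.
type Run struct {
	UUID   string
	Status string
}

// ErrDuplicate mimics a primary-key collision such as MySQL error 1062.
var ErrDuplicate = errors.New("duplicate entry for key 'PRIMARY'")

// RunStore is an in-memory stand-in for the run_details table (assumption:
// the real store is backed by MySQL; this sketch only models the semantics).
type RunStore struct {
	mu   sync.Mutex
	rows map[string]Run
}

// NewRunStore returns an empty store.
func NewRunStore() *RunStore {
	return &RunStore{rows: make(map[string]Run)}
}

// create inserts a row and fails if the UUID is already present.
// The caller must hold s.mu.
func (s *RunStore) create(r Run) error {
	if _, ok := s.rows[r.UUID]; ok {
		return ErrDuplicate
	}
	s.rows[r.UUID] = r
	return nil
}

// CreateOrUpdate stores r and reports whether the stored data changed.
// A duplicate key is not an error: if the existing row already equals r,
// the call is a no-op; otherwise the row is overwritten.
func (s *RunStore) CreateOrUpdate(r Run) (changed bool, err error) {
	s.mu.Lock()
	defer s.mu.Unlock()

	err = s.create(r)
	if err == nil {
		return true, nil // fresh insert
	}
	if !errors.Is(err, ErrDuplicate) {
		return false, err // unrelated failure, surface it
	}
	if s.rows[r.UUID] == r {
		return false, nil // already reported with the same data
	}
	s.rows[r.UUID] = r
	return true, nil
}

func main() {
	store := NewRunStore()
	run := Run{UUID: "45474da3-af6e-11e9-86a0-42010a8000e5", Status: "Running"}

	// Reporting the same run twice must not fail the second time.
	for i := 0; i < 2; i++ {
		changed, err := store.CreateOrUpdate(run)
		fmt.Println(changed, err)
	}
}
```

With a real SQL backend the same idea would hinge on distinguishing "no row matched" from "row matched but nothing changed" when interpreting the result of the update, which this sketch models with an explicit equality check.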

What steps did you take:

I manually deleted the run_details rows whose uuid matched the offending entries in the logs, so that mysql could become responsive again.

Anything else you would like to add:

/api.ReportService/ReportWorkflow call failed
github.com/kubeflow/pipelines/backend/src/common/util.(*UserError).wrapf
    backend/src/common/util/error.go:206
github.com/kubeflow/pipelines/backend/src/common/util.Wrapf
    backend/src/common/util/error.go:231
main.apiServerInterceptor
    backend/src/apiserver/interceptor.go:32
github.com/kubeflow/pipelines/backend/api/go_client._ReportService_ReportWorkflow_Handler
    bazel-out/k8-opt/bin/backend/api/linux_amd64_stripped/go_client_go_proto%/github.com/kubeflow/pipelines/backend/api/go_client/report.pb.go:176
google.golang.org/grpc.(*Server).processUnaryRPC
    external/org_golang_google_grpc/server.go:966
google.golang.org/grpc.(*Server).handleStream
    external/org_golang_google_grpc/server.go:1245
google.golang.org/grpc.(*Server).serveStreams.func1.1
    external/org_golang_google_grpc/server.go:685
runtime.goexit
    GOROOT/src/runtime/asm_amd64.s:1333
I0813 18:13:26.186834       6 error.go:218] Invalid input error: Failed to update run 45474da3-af6e-11e9-86a0-42010a8000e5. Row not found.
github.com/kubeflow/pipelines/backend/src/common/util.NewInvalidInputError
    backend/src/common/util/error.go:165
github.com/kubeflow/pipelines/backend/src/apiserver/storage.(*RunStore).UpdateRun
    backend/src/apiserver/storage/run_store.go:415
github.com/kubeflow/pipelines/backend/src/apiserver/storage.(*RunStore).CreateOrUpdateRun
    backend/src/apiserver/storage/run_store.go:429
github.com/kubeflow/pipelines/backend/src/apiserver/resource.(*ResourceManager).ReportWorkflowResource
    backend/src/apiserver/resource/resource_manager.go:469
github.com/kubeflow/pipelines/backend/src/apiserver/server.(*ReportServer).ReportWorkflow
    backend/src/apiserver/server/report_server.go:39
github.com/kubeflow/pipelines/backend/api/go_client._ReportService_ReportWorkflow_Handler.func1
    bazel-out/k8-opt/bin/backend/api/linux_amd64_stripped/go_client_go_proto%/github.com/kubeflow/pipelines/backend/api/go_client/report.pb.go:174
main.apiServerInterceptor
    backend/src/apiserver/interceptor.go:30
github.com/kubeflow/pipelines/backend/api/go_client._ReportService_ReportWorkflow_Handler
    bazel-out/k8-opt/bin/backend/api/linux_amd64_stripped/go_client_go_proto%/github.com/kubeflow/pipelines/backend/api/go_client/report.pb.go:176
google.golang.org/grpc.(*Server).processUnaryRPC
    external/org_golang_google_grpc/server.go:966
google.golang.org/grpc.(*Server).handleStream
    external/org_golang_google_grpc/server.go:1245
google.golang.org/grpc.(*Server).serveStreams.func1.1
    external/org_golang_google_grpc/server.go:685
runtime.goexit
    GOROOT/src/runtime/asm_amd64.s:1333
Error while creating or updating run for workflow: 'kubeflow/RUN_NAME'. Create error: 'InternalServerError: Failed to store run RUN_NAME to table: Error 1062: Duplicate entry '45474da3-af6e-11e9-86a0-42010a8000e5' for key 'PRIMARY''. Update error: 'Invalid input error: Failed to update run 45474da3-af6e-11e9-86a0-42010a8000e5. Row not found.'
github.com/kubeflow/pipelines/backend/src/common/util.(*UserError).wrap
    backend/src/common/util/error.go:211
github.com/kubeflow/pipelines/backend/src/common/util.Wrap
    backend/src/common/util/error.go:244
github.com/kubeflow/pipelines/backend/src/apiserver/storage.(*RunStore).CreateOrUpdateRun
    backend/src/apiserver/storage/run_store.go:431
github.com/kubeflow/pipelines/backend/src/apiserver/resource.(*ResourceManager).ReportWorkflowResource
    backend/src/apiserver/resource/resource_manager.go:469
github.com/kubeflow/pipelines/backend/src/apiserver/server.(*ReportServer).ReportWorkflow
    backend/src/apiserver/server/report_server.go:39
github.com/kubeflow/pipelines/backend/api/go_client._ReportService_ReportWorkflow_Handler.func1
    bazel-out/k8-opt/bin/backend/api/linux_amd64_stripped/go_client_go_proto%/github.com/kubeflow/pipelines/backend/api/go_client/report.pb.go:174
main.apiServerInterceptor
    backend/src/apiserver/interceptor.go:30
github.com/kubeflow/pipelines/backend/api/go_client._ReportService_ReportWorkflow_Handler
    bazel-out/k8-opt/bin/backend/api/linux_amd64_stripped/go_client_go_proto%/github.com/kubeflow/pipelines/backend/api/go_client/report.pb.go:176
google.golang.org/grpc.(*Server).processUnaryRPC
    external/org_golang_google_grpc/server.go:966
google.golang.org/grpc.(*Server).handleStream
    external/org_golang_google_grpc/server.go:1245
google.golang.org/grpc.(*Server).serveStreams.func1.1
    external/org_golang_google_grpc/server.go:685
runtime.goexit
    GOROOT/src/runtime/asm_amd64.s:1333
Report workflow failed.
github.com/kubeflow/pipelines/backend/src/common/util.(*UserError).wrap
    backend/src/common/util/error.go:211
github.com/kubeflow/pipelines/backend/src/common/util.Wrap
    backend/src/common/util/error.go:244
github.com/kubeflow/pipelines/backend/src/apiserver/server.(*ReportServer).ReportWorkflow
    backend/src/apiserver/server/report_server.go:41
github.com/kubeflow/pipelines/backend/api/go_client._ReportService_ReportWorkflow_Handler.func1
    bazel-out/k8-opt/bin/backend/api/linux_amd64_stripped/go_client_go_proto%/github.com/kubeflow/pipelines/backend/api/go_client/report.pb.go:174
main.apiServerInterceptor
    backend/src/apiserver/interceptor.go:30
github.com/kubeflow/pipelines/backend/api/go_client._ReportService_ReportWorkflow_Handler
    bazel-out/k8-opt/bin/backend/api/linux_amd64_stripped/go_client_go_proto%/github.com/kubeflow/pipelines/backend/api/go_client/report.pb.go:176
google.golang.org/grpc.(*Server).processUnaryRPC
    external/org_golang_google_grpc/server.go:966
google.golang.org/grpc.(*Server).handleStream
    external/org_golang_google_grpc/server.go:1245
google.golang.org/grpc.(*Server).serveStreams.func1.1
    external/org_golang_google_grpc/server.go:685
runtime.goexit
    GOROOT/src/runtime/asm_amd64.s:1333
rmgogogo commented 4 years ago

Hi Shawn, do you have any more info that would help reproduce the problem?

rmgogogo commented 4 years ago

Since the key is a UUID, I think this is not easy to reproduce. Can you still reproduce the issue on your side?

rmgogogo commented 4 years ago

Hi Shawn, any info? If not, I'll close this for now. Feel free to reopen.