Closed andrewm4894 closed 4 years ago
I'm having the exact same issue. Reproduced by executing the metadata demo. I was able to "fix" it by just restarting metadata-grpc-deployment
. So, it seems there's no relation to the database itself. Looks like some session is getting stale.
@rafaelbarreto87 if you get a moment can you share the kubectl commands you used to do this? New to kubernetes so not exactly sure yet how to do stuff like this.
Ps here is a response I got on the kubeflow slack, seems like it's actually an expected error. Not exactly sure on the details yet, need to read up a bit on the links below.
https://github.com/kubeflow/pipelines/issues/2329#issuecomment-549590635 Support for metadata db wasn't included in Kubeflow 0.7, the errors you see is expected.
The cause is that the connection betweenmetadata-grpc-deployment
and underlying mysql is reset after certain amount of time.
The temporary fix is as @rafaelbarreto87 suggested to restart grpc-deployment to reconnect.
The long term fix is to add probe endpoint to test the liveness of grpc-deployment so that it is restarted automatically.
@zhenghuiwang would you be able to point me towards and instructions on how to restart the grpc-deployment?
Apologies as i am new to kubernetes.
@andrewm4894 if you are on kubectl 1.15+ i believe you can use restart command
kubectl rollout restart deployment metadata-grpc-deployment -n kubeflow
we are on 1.14 and have just deleted the pod and the deploy recreates the pod which also seems to fix the issue
kubectl delete po -l=component=grpc-server -n kubeflow
Why do I need to restart the metadata-grpc-deployment service to work? If there is no other way to set the liveness probe?
Ran into this as well. It looks like it successfully connected but the readiness probe fails. Since it's not the health check nothing is retrying this operation either. The easiest fix would be using the same readiness probe for the health check but I'd like to understand why it gets into this 'gone away' state in the first place..
Thanks @discordianfish for adding liveness probe for HTTP server.
For gRPC deployment, the issue is fixed upstream in MLMD v0.21 (commit), which is included in the Kubeflow v1.0RCs.
So this is an issue for KF v0.7 but not for v1.0RCs
Ah great, thanks for explaining.
Just FYI, MLMD has been updated in master, not in v1.0-branch. So, if you're using kdefs
from v1.0-branch, this is still a problem for now.
@rafaelbarreto87 Good catch! The above cherry pick should fix it for KF v1.0
Reopening because of kubeflow/kubeflow#4797
/kind bug
What steps did you take and what happened: I created a kubeflow via https://deploy.kubeflow.cloud/. I was trying to run through this demo but ran into what looks like a mysql error when i tried to create a workspace (cell 4 of the demo notebook).
What did you expect to happen:
A workspace would be created that i could store artifacts into.
Anything else you would like to add:
I also see what looks like the same error on the UI:
Environment:
kubectl version
): 1.14.9-gke.2