kubeflow / metadata

Repository for assets related to Metadata.
Apache License 2.0

metadata deployment and grpc pods cannot connect to MySQL DB #174

Closed lomori closed 4 years ago

lomori commented 4 years ago

/kind bug

What steps did you take and what happened: Fresh install of version 0.7 on an EKS cluster following instructions for how to deploy on EKS.

We are seeing errors like:

E1121 18:56:38.635354 1 main.go:98] Failed to create ML Metadata Store: mysql_real_connect failed: errno: 2005, error: Unknown MySQL server host 'metadata-db.kubeflow' (-3).

Interestingly, name resolution also fails when I run commands directly in the pod:

root@metadata-deployment-65466fd7cb-97w9g:/go/src/github.com/kubeflow/metadata# ping metadata-db.kubeflow
ping: metadata-db.kubeflow: Temporary failure in name resolution
root@metadata-deployment-65466fd7cb-97w9g:/go/src/github.com/kubeflow/metadata# ping www.google.com
ping: www.google.com: Temporary failure in name resolution

It looks like hostname resolution is not working on those pods.

From another pod, unrelated to metadata, name resolution worked just fine, including for the hostnames that failed above.
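
The manual ping comparison above can be repeated from any pod that has a Python interpreter. This is a minimal sketch using only the standard library; `check_hosts` is a hypothetical helper name, and whether Python is available in the metadata image is an assumption.

```python
import socket


def check_hosts(hosts, resolve=socket.getaddrinfo):
    """Try to resolve each hostname; map it to None on success or to the error text.

    `resolve` is injectable so the helper can be exercised without a network.
    """
    results = {}
    for host in hosts:
        try:
            resolve(host, None)
            results[host] = None  # name resolved
        except socket.gaierror as exc:
            # e.g. "[Errno -3] Temporary failure in name resolution"
            results[host] = str(exc)
    return results
```

Calling `check_hosts(["metadata-db.kubeflow", "www.google.com"])` in both the failing pod and a working pod should show whether the difference is in DNS itself rather than in the MySQL client.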

What did you expect to happen: Metadata components up and running.

Anything else you would like to add: All other components were deployed properly.

Environment:

jtfogarty commented 4 years ago

/area engprod /priority p2

WillBeebe commented 4 years ago

I'm working through this myself at the moment! I'll let you know if I figure anything out.

zhenghuiwang commented 4 years ago

Feel free to reopen it if it happens on Kubeflow v1.0

rapuckett commented 4 years ago

I'm getting the same with version 0.7. I haven't tried 1.0 yet, but the source for KF Pipelines 0.3.0 shows the same internal Google URL (below) for proxy-agent, so I suspect it's still broken.

metadata-grpc logs:

2020-03-31 02:16:44.713587: F ml_metadata/metadata_store/metadata_store_server_main.cc:219] Non-OK-status: status status: Internal: mysql_real_connect failed: errno: 2002, error: Can't connect to MySQL server on 'metadata-db' (115)
MetadataStore cannot be created with the given connection config.

proxy-agent logs:

++ curl http://metadata.google.internal/computeMetadata/v1/instance/zone -H 'Metadata-Flavor: Google'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (6) Could not resolve host: metadata.google.internal
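
The metadata-grpc error above says the client could not connect to 'metadata-db'. To separate a network problem from a MySQL client problem, a plain TCP check can help. This is a minimal sketch: `can_connect` is a hypothetical helper, and port 3306 is assumed as the default MySQL port since the thread does not state the port.

```python
import socket


def can_connect(host, port=3306, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

If `can_connect("metadata-db")` returns False from the metadata-grpc pod but True from another pod in the same namespace, the problem is likely in pod networking or DNS rather than in the database.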

rsassPwC commented 4 years ago

I'm using the latest proxy-agent image gcr.io/ml-pipeline/inverse-proxy-agent:1.0.0-rc.5 in my standalone deployment on microk8s and this is the only pod that has a CrashLoopBackOff.

Is this pod needed if I just want to use Kubeflow Pipelines on a Kubernetes cluster? Thanks in advance.

logs from the container:

+++ dirname /opt/proxy/attempt-register-vm-on-proxy.sh
++ cd /opt/proxy
++ pwd

ssdst commented 3 years ago

The problem is not solved in KF 1.2.0 installed with kfctl_k8s_istio.v1.2.0.yaml.

ssdst commented 3 years ago

The problem is not solved in KF 1.2.0 installed with kfctl_k8s_istio.v1.2.0.yaml. Below are the metadata-grpc-deployment pod logs:

F ml_metadata/metadata_store/metadata_store_server_main.cc:219] Non-OK-status: status status: Internal: mysql_real_connect failed: errno: 1130, erro │ stream closed
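
The mysql_real_connect errors quoted in this thread carry different errno values (2005, 2002, 1130), which point to different causes. This is a minimal sketch that extracts the errno from a log line and maps it to a likely cause. `explain` is a hypothetical helper, and the hint texts are my reading of the standard MySQL error codes, not something confirmed by the maintainers.

```python
import re

# Likely causes for the errno values seen in this thread (assumed mapping
# based on the standard MySQL client/server error codes).
HINTS = {
    2002: "connection could not be established (CR_CONNECTION_ERROR)",
    2005: "server hostname did not resolve (CR_UNKNOWN_HOST)",
    1130: "client host is not allowed to connect (ER_HOST_NOT_PRIVILEGED)",
}

ERRNO_RE = re.compile(r"mysql_real_connect failed: errno: (\d+)")


def explain(log_line):
    """Return (errno, hint) for a mysql_real_connect failure line, or None."""
    match = ERRNO_RE.search(log_line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, HINTS.get(code, "no hint recorded for this errno")
```

Under this reading, errno 1130 in KF 1.2.0 means the pod reached the database and was rejected by host-based access rules, which is a different failure from the DNS errors reported for 0.7.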