Closed lomori closed 4 years ago
/area engprod /priority p2
I'm working through this myself at the moment! I'll let you know if I figure anything out.
Feel free to reopen it if it happens on Kubeflow v1.0
Getting the same with version 0.7. Haven't tried 1.0 yet, but the source for KF Pipelines 0.3.0 shows the same internal Google URL (below) for proxy-agent, so I suspect it's still broken?
metadata-grpc logs:
2020-03-31 02:16:44.713587: F ml_metadata/metadata_store/metadata_store_server_main.cc:219] Non-OK-status: status status: Internal: mysql_real_connect failed: errno: 2002, error: Can't connect to MySQL server on 'metadata-db' (115)MetadataStore cannot be created with the given connection config.
proxy-agent logs:
++ curl http://metadata.google.internal/computeMetadata/v1/instance/zone -H 'Metadata-Flavor: Google'
curl: (6) Could not resolve host: metadata.google.internal
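For context, MySQL errno 2002 means the TCP connection to the host never succeeded. A minimal sketch for checking the service from inside the cluster (the service name metadata-db comes from the log above; the kubeflow namespace and port 3306 are assumptions, so verify them against your Service spec):

# Confirm the Service exists and has endpoints
kubectl -n kubeflow get svc metadata-db
kubectl -n kubeflow get endpoints metadata-db
# Probe the port from a throwaway pod (if your busybox build's nc lacks -z, try telnet instead)
kubectl -n kubeflow run mysql-probe --rm -it --image=busybox --restart=Never -- nc -zv metadata-db 3306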
I'm using the latest proxy-agent image gcr.io/ml-pipeline/inverse-proxy-agent:1.0.0-rc.5 in my standalone deployment on microk8s, and this is the only pod in CrashLoopBackOff.
Is this pod needed when I just want to use Kubeflow Pipelines on a Kubernetes cluster? Thanks in advance.
logs from the container:
+++ dirname /opt/proxy/attempt-register-vm-on-proxy.sh
++ cd /opt/proxy
++ pwd
[[ ! -z '' ]]
++ curl http://metadata.google.internal/computeMetadata/v1/instance/zone -H 'Metadata-Flavor: Google'
curl: (6) Could not resolve host: metadata.google.internal
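From the script name in the trace, the proxy-agent appears to exist only to register the cluster with Google's inverse proxy (which gives the UI a public URL), and it queries the GCE metadata server to do so, which is why it can only work on GCP. A minimal sketch for silencing it on a non-GCP cluster (assumes the standalone manifests created a Deployment named proxy-agent in the kubeflow namespace; check the actual names in your deployment first):

# Find the deployment, then scale it to zero so it stops crash-looping
kubectl -n kubeflow get deployments | grep proxy
kubectl -n kubeflow scale deployment proxy-agent --replicas=0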
The problem is not solved in kf 1.2.0 installed by kfctl_k8s_istio.v1.2.0.yaml. Below is from the metadata-grpc-deployment pod logs:
F ml_metadata/metadata_store/metadata_store_server_main.cc:219] Non-OK-status: status status: Internal: mysql_real_connect failed: errno: 1130, erro
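Note that errno 1130 ("Host ... is not allowed to connect to this MySQL server") is a different failure from the DNS errors earlier in this thread: the connection reached MySQL but was rejected by its host-based grants. A hedged way to inspect them (the pod name is a placeholder; the root account and a mysql client inside the DB pod are assumptions about the bundled image):

# List which user/host combinations are allowed to connect
kubectl -n kubeflow exec -it <metadata-db-pod> -- mysql -u root -p -e 'SELECT user, host FROM mysql.user;'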
/kind bug
What steps did you take and what happened: Fresh install of version 0.7 on an EKS cluster, following the instructions for deploying on EKS.
We are seeing errors like:
E1121 18:56:38.635354 1 main.go:98] Failed to create ML Metadata Store: mysql_real_connect failed: errno: 2005, error: Unknown MySQL server host 'metadata-db.kubeflow' (-3).
It is interesting that if I go directly to the pod:
root@metadata-deployment-65466fd7cb-97w9g:/go/src/github.com/kubeflow/metadata# ping metadata-db.kubeflow
ping: metadata-db.kubeflow: Temporary failure in name resolution
root@metadata-deployment-65466fd7cb-97w9g:/go/src/github.com/kubeflow/metadata# ping www.google.com
ping: www.google.com: Temporary failure in name resolution
It looks like hostname resolution is not working in those pods.
I went to another pod, unrelated to metadata, and name resolution worked just fine there, including for the names that failed above.
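A minimal sketch for narrowing this down, assuming the failing pod name from above and that the cluster DNS service is kube-dns in kube-system (adjust for other CoreDNS setups; nslookup must be present in the image):

# Compare the DNS config of the failing pod with a working one
kubectl -n kubeflow exec metadata-deployment-65466fd7cb-97w9g -- cat /etc/resolv.conf
# Check that the cluster DNS service is up and has endpoints
kubectl -n kube-system get svc,endpoints kube-dns
# Try resolving the fully qualified service name from the failing pod
kubectl -n kubeflow exec metadata-deployment-65466fd7cb-97w9g -- nslookup metadata-db.kubeflow.svc.cluster.local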
What did you expect to happen: Metadata components up and running.
Anything else you would like to add: All other components were deployed properly.
Environment:
Kubernetes version (use kubectl version): 1.13
OS (e.g. from /etc/os-release): debian:stretch