kubeflow / metadata

Repository for assets related to Metadata.
Apache License 2.0

metadata-deployment pods cannot connect to MySQL DB #123

Closed: rummens closed 5 years ago

rummens commented 5 years ago

/kind bug

What steps did you take and what happened: Deployed Kubeflow 0.6 and the following pods crash loop: metadata-deployment-6cf77db994-dtbhd, metadata-deployment-6cf77db994-kzv52, metadata-deployment-6cf77db994-mr2gh

The error message is the following:

F0903 16:10:16.539092       1 main.go:90] Failed to create ML Metadata Store with config mysql:<host:"metadata-db.kubeflow" port:3306 database:"metadb" user:"root" password:"test" > : mysql_real_connect failed: errno: 2005, error: Unknown MySQL server host 'metadata-db.kubeflow' (-2).
goroutine 1 [running]:
github.com/golang/glog.stacks(0xc00018b100, 0xc00020e000, 0x11e, 0x174)
        external/com_github_golang_glog/glog.go:769 +0xb1
github.com/golang/glog.(loggingT).output(0x1633360, 0xc000000003, 0xc0002032d0, 0x14eadd3, 0x7, 0x5a, 0x0)
        external/com_github_golang_glog/glog.go:720 +0x2f6
github.com/golang/glog.(loggingT).printf(0x1633360, 0x3, 0xf6fee1, 0x37, 0xc0001efe30, 0x2, 0x2)
        external/com_github_golang_glog/glog.go:655 +0x14e
github.com/golang/glog.Fatalf(...)
        external/com_github_golang_glog/glog.go:1148
main.mlmdStoreOrDie(0x0)
        server/main.go:90 +0x1c3
main.main()
        server/main.go:101 +0xe0

What did you expect to happen: Deploy without crashing.

Anything else you would like to add: We suspect that the internal DNS might be failing. Any idea how to narrow down the problem?
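One way to narrow this down is to check whether the host name from the error resolves at all from inside the cluster. The following is a minimal sketch, not part of the metadata server: `can_resolve` is a hypothetical helper, and the `resolver` parameter exists only so the lookup can be swapped out for testing.

```python
import socket


def can_resolve(host, port=3306, resolver=socket.getaddrinfo):
    """Return True if `host` resolves, False if the DNS lookup fails.

    `resolver` defaults to socket.getaddrinfo; it is injectable so the
    check can be exercised without a real DNS server.
    """
    try:
        resolver(host, port)
        return True
    except socket.gaierror:
        # Raised when the name cannot be resolved, which is the same class
        # of failure as "Unknown MySQL server host" in the log above.
        return False
```

Run from a pod in the same cluster, `can_resolve("metadata-db.kubeflow")` returning False would point at a wrong service name or namespace, or at cluster DNS itself. Comparing it with a name that is known to exist in the cluster helps tell these cases apart.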

Environment:

rummens commented 5 years ago

Turns out we had a mistake in our automation script. The metadb was looking for mysql in the wrong namespace.

sahilprasad commented 5 years ago

@rummens I'm also seeing this. Any tips on figuring out where the problem is?

rummens commented 5 years ago

Sure, for us it was a minor and stupid mistake. We forgot to change the namespace of the metadata db component. When we ran our tests, we changed the namespace to something other than the default, and this caused the issue. If you look closely at the error, it even says so:

Failed to create ML Metadata Store with config mysql:<host:"metadata-db.kubeflow"

Our correct configuration would be something like metadata-db.kubeflowCustomNamespace
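To make that check less manual, here is a rough sketch that pulls the host out of the error line and compares its namespace part with the namespace you actually deployed to. It assumes the host has the form service.namespace as in the log above; `split_service_host` and `namespace_mismatch` are hypothetical helper names, not part of Kubeflow.

```python
import re

# Matches the host field as it appears in the log line quoted in this thread.
HOST_PATTERN = re.compile(r'host:"([^"]+)"')


def split_service_host(host):
    """Split 'service.namespace[...]' into (service, namespace).

    The namespace is None when the host has no dot-separated suffix.
    """
    service, _, rest = host.partition(".")
    namespace = rest.split(".")[0] if rest else None
    return service, namespace


def namespace_mismatch(log_line, expected_namespace):
    """Return True if the host in `log_line` targets a different namespace."""
    match = HOST_PATTERN.search(log_line)
    if match is None:
        raise ValueError("no host field found in log line")
    _, namespace = split_service_host(match.group(1))
    return namespace != expected_namespace
```

With the error line from this issue and an expected namespace of kubeflowCustomNamespace, `namespace_mismatch` reports a mismatch, because the configured host still points at kubeflow.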

If you want I can ask the developer who actually did it to comment?

sahilprasad commented 5 years ago

All good @rummens ! Thanks for your help.

Yeah, turns out it was something dumb on our end too. We were using kustomize to add the app: our-app label to all of the specs. Removed that and now everything works as expected.

rummens commented 5 years ago

Glad to help ;-)

whycircle commented 5 years ago

@sahilprasad Could you tell me the exact steps to fix this problem?