mesosphere / kubernetes-mesos

A Kubernetes Framework for Apache Mesos

mesos/docker cluster does not support vmware fusion (statfs bug) #630

Open cmluciano opened 8 years ago

cmluciano commented 8 years ago

I cannot get the example from the getting started guide to run. It seems that there are some errors with the default docker containers.

Error response from daemon: Cannot start container 34430d160db814fba6d3c226eb0efd4e58a976495e1c5b7ffc3ed3c2461fa800: Cannot link to a non running container: /docker_apiserver_1 AS /fervent_yalow/apiserver

docker_apiserver_1.log

/bin/bash: -c: line 0: syntax error near unexpected token `('

karlkfi commented 8 years ago

I assume you're using the mesos/docker guide. A few context questions:

  1. Which version (or git sha) of the source code do you have checked out?
  2. Which versions of docker (client/engine/machine) are you using?
cmluciano commented 8 years ago

Yep https://github.com/kubernetes/kubernetes/blob/master/docs/getting-started-guides/mesos-docker.md.

  1. Master
  2. 1.7.1, docker-machine version 0.3.0 (HEAD)

I'm guessing the problem might be that I am using master.

karlkfi commented 8 years ago

Master usually works. We run CI against it. However, our CI is using docker (client/engine) 1.8.2 and docker-compose 1.5.0 on a linux machine.

Your error looks familiar, but my setup isn't similar enough for me to repro it atm. If I had to guess, I would blame the docker-compose version. Some recent compose changes broke backward compatibility for environment variables. We upgraded master for compose 1.5.0.

I have not tested master with docker 1.7.1 recently, nor ever with docker-machine 0.3.0.

I do know, however, that the mesos/docker cluster does not currently work on docker 1.9.0 with docker-machine 0.5.1, and I'm currently working on trying to remedy that.

Unfortunately, installing a specific old version of docker on a Mac seems to be a bit complicated. It might be possible with an old Docker Toolbox, however. You might start by just updating docker-compose, though. Maybe it doesn't also require updating docker.

cmluciano commented 8 years ago

OK, so this gets a little further:

$ docker version
Client version: 1.7.1
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): 786b29d
OS/Arch (client): darwin/amd64
Server version: 1.7.0
Server API version: 1.19
Go version (server): go1.4.2
Git commit (server): 0baf609
OS/Arch (server): linux/amd64
docker-machine --version
docker-machine version 0.5.0 (HEAD)
docker-compose version: 1.5.1
docker-py version: 1.5.0
CPython version: 2.7.10
OpenSSL version: OpenSSL 0.9.8zg 14 July 2015

Error log https://gist.github.com/cmluciano/57f286e0427fecc6a2bf.

Not sure if it matters, but I am on OS X.

karlkfi commented 8 years ago

The "Recreating" lines there indicate that you ran kube-up twice back to back without a kube-down in between. I haven't actually confirmed that that works. Try a kube-down first to clean up the environment, then kube-up. Do you still have the same problem?
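A minimal sketch of the clean restart suggested above, assuming it is run from the root of a kubernetes source checkout and that the mesos/docker guide's scripts are ./cluster/kube-down.sh and ./cluster/kube-up.sh, selected with KUBERNETES_PROVIDER=mesos/docker (check the guide for your checkout). The function name and the DRY_RUN switch are hypothetical helpers for illustration, not part of the repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tear the mesos/docker cluster down before bringing it
# back up, so docker-compose creates fresh containers instead of
# "Recreating" the ones left over from a previous kube-up.
# Assumes the current directory is the root of a kubernetes checkout.
# Set DRY_RUN=1 to print the commands instead of running them.
restart_mesos_docker_cluster() {
  local run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    run="echo"
  fi
  export KUBERNETES_PROVIDER=mesos/docker
  # Clean up whatever a previous kube-up left behind; stop if that fails.
  $run ./cluster/kube-down.sh || return 1
  # Start a fresh cluster.
  $run ./cluster/kube-up.sh
}
```

Stopping when kube-down fails is a deliberate choice: running kube-up on top of a half-removed cluster is exactly the back-to-back situation this is meant to avoid.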

cmluciano commented 8 years ago

Yes. Here is a snippet.

e is 65537 (0x10001)
Creating certificate sign request
Signing new certificate with private key
Signature ok
subject=/C=GB/ST=London/L=London/O=example/OU=IT/CN=example.com
Getting Private key
Key: /var/run/kubernetes/auth/root-ca.key
Cert: /var/run/kubernetes/auth/root-ca.crt
Creating Service-Account RSA Key
Key: /Users/cmlucian/tmp/kubernetes/auth/service-accounts.key
Creating User Accounts
Token Users: /Users/cmlucian/tmp/kubernetes/auth/token-users
Basic-Auth Users: /Users/cmlucian/tmp/kubernetes/auth/basic-users
Starting mesos/docker cluster
Creating docker_ambassador_1
Creating docker_etcd_1
Creating docker_mesosmaster1_1
Creating docker_apiserver_1
Creating docker_keygen_1
Creating docker_controller_1
Creating docker_mesosslave_1
Creating docker_scheduler_1
Scaling mesos/docker cluster to 2 slaves
Creating and starting 2 ... done
Waiting (up to 180s) for http://apiserver:8888/healthz to be healthy
Waiting (up to 180s) for http://apiserver:8888/healthz to be healthy
Health check of http://apiserver:8888/healthz succeeded!
KUBE_MASTER_IP: 172.17.0.7:6443
KUBE_MINION_IP_ADDRESSES: [172.17.0.12 172.17.0.10]
cluster "mesos/docker" set.
context "mesos/docker" set.
user "cluster-admin" set.
switched to context "mesos/docker".
Wrote config for mesos/docker to /Users/cmlucian/.kube/config
Deploying Addons
Unable to connect to the server: dial tcp 172.17.0.7:6443: i/o timeout
Dumping logs to '/Users/cmlucian/tmp/kubernetes/log'
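The log shows kube-up polling http://apiserver:8888/healthz for up to 180s before continuing. The loop below is a rough sketch of that kind of wait, not the actual kube-up implementation; wait_for_healthy and its 2-second polling interval are hypothetical choices, and it assumes curl is installed. Note that the apiserver hostname resolves inside the docker network; from the Mac host it only helps with a URL the host can actually reach, and the i/o timeout at the end of the log suggests the host could not reach the container IPs at this point.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a health endpoint until it answers with a
# non-error HTTP status, or give up after a timeout. Requires curl.
# Usage: wait_for_healthy <url> [timeout-seconds]
wait_for_healthy() {
  local url="$1"
  local timeout="${2:-180}"   # seconds; 180 mirrors the value in the log
  local interval=2            # seconds between probes (arbitrary choice)
  local waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # -f fails on HTTP errors, -sS stays quiet but reports real errors,
    # --max-time bounds each individual probe.
    if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
      echo "Health check of $url succeeded!"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "Timed out after ${timeout}s waiting for $url" >&2
  return 1
}
```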

cmluciano commented 8 years ago

Everything else seems to be coming up.

[screenshot, 2015-11-19 11:41 AM]

cmluciano commented 8 years ago

Network info, if that's helpful:

[screenshot, 2015-11-19 11:42 AM]

cmluciano commented 8 years ago

I think the problem might be that I skipped the step that routes the traffic with sudo route -n add -net 172.17.0.0 $(boot2docker ip). Trying that now.

karlkfi commented 8 years ago

Yeah, that's required for your host to talk directly to the docker IPs. You'll want docker-machine ip <id> instead of boot2docker ip.
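A small sketch of the routing step from this exchange, using the same macOS route command as the thread with docker-machine ip in place of boot2docker ip. The add_docker_route function, the DRY_RUN switch, and the machine name default are hypothetical placeholders; substitute the name of your own VMware Fusion machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper: route the docker bridge subnet (172.17.0.0) through
# the docker-machine VM so the Mac host can reach container IPs directly.
# Uses macOS route syntax, as in the thread. Set DRY_RUN=1 to print the
# command instead of running it.
add_docker_route() {
  local vm_ip="$1"
  if [ -z "$vm_ip" ]; then
    # Guards against an empty gateway, e.g. when docker-machine ip failed.
    echo "usage: add_docker_route <docker-machine-ip>" >&2
    return 1
  fi
  local run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    run="echo"
  fi
  $run sudo route -n add -net 172.17.0.0 "$vm_ip"
}

# Example (machine name "default" is a placeholder):
#   add_docker_route "$(docker-machine ip default)"
```

Taking the IP as an argument, rather than calling docker-machine inside the function, keeps the route step separate from how the IP is looked up, and the empty-argument check stops route from being run with a missing gateway.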

cmluciano commented 8 years ago

OK, we're definitely making progress (I hope to send a PR later to update the docs for VMware Fusion). I'm now hitting a problem that it looks like you reported a while ago: https://github.com/kubernetes/kubernetes/issues/10697.

http://192.168.200.131:8888/ui http://192.168.200.131:8888/api/v1/proxy/namespaces/kube-system/services/kube-ui/#/dashboard/

[screenshot, 2015-11-19 12:23 PM]

jdef commented 8 years ago

Might want to check for a flapping pod. An unhealthy pod will result in the above error.

cmluciano commented 8 years ago

Ah, yes! The benefit of this being on Mesos is that I can check those logs:

https://gist.github.com/cmluciano/7b9e9024066272ff1b6d

[screenshot, 2015-11-19 12:43 PM]

karlkfi commented 8 years ago

Looks like the Mesos fetcher is failing to install nsenter and socat. @sttts do you know why that would be?

cmluciano commented 8 years ago

It is available, and seems to have the right permissions:

[screenshot, 2015-11-19 12:48 PM]

karlkfi commented 8 years ago

xref: https://github.com/kubernetes/kubernetes/pull/17514

karlkfi commented 8 years ago

Various online searches for "Cannot change ownership to uid 501, gid 50: Operation not permitted" lead me to wonder whether the VMware Fusion driver is using the same boot2docker ISO that VirtualBox is using. One thing I had to do recently was re-create my vbox VM in order to correctly pull the latest boot2docker ISO.

It's also possible that Linux user IDs are handled differently on VMware vs. vbox.

ref: https://www.krenger.ch/blog/linux-tar-cannot-change-ownership-to-permission-denied/

karlkfi commented 8 years ago

Ah... it looks like not working on VMware is a known issue that I didn't know about: https://github.com/kubernetes/kubernetes/pull/15849

@sttts tells me: "If you try to use Fusion with our docker-cluster you need either https://github.com/kubernetes/kubernetes/pull/15849 or put /var/lib/mesos onto a volume which is not Fusion mounted".

karlkfi commented 8 years ago

Renamed the issue. I'm going to leave this in the "soon" backlog, but I can't spend any more time on it right now.

We do need to bring the issue to the attention of VMware support to see if there's a better workaround. I'll leave that to @sttts, who has a Fusion license locally and has run into these issues before.

If you have time to chase this down more, @cmluciano, please let us know what you find. Otherwise I'd recommend using vbox for the time being.

cmluciano commented 8 years ago

Sounds good. Thanks a lot for helping out!