rancher / convoy

A Docker volume plugin, managing persistent container volumes.
Apache License 2.0

Can't get convoy FS to work #55

Open boedy opened 8 years ago

boedy commented 8 years ago

I wanted to try Convoy with GlusterFS, but I can't get it to work. I retried several times, following the tutorial in the documentation.

This is what the logs tell me:

12/17/2015 2:45:40 PM time="2015-12-17T13:45:40Z" level=error msg="Get http:///host/var/run/conoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/conoy-convoy-gluster.sock: no such file or directory"

Overview of my hosts:

[Screenshot, 2015-12-17 14:55:32: host overview]

Overview of my containers:

[Screenshot, 2015-12-17 14:50:24: container overview]

boedy commented 8 years ago

I retried with Ubuntu as the host OS this time, and it worked. Before that, I had used RancherOS.

deniseschannon commented 8 years ago

I just tried glusterfs and convoy-gluster on my RancherOS hosts and had no issues with it coming up.

What version of RancherOS are you running?

boedy commented 8 years ago

I'm not sure about the version anymore, but it was the latest version available in eu-west-1 when I submitted this issue. I was running it on three EC2 t2.small instances.

deniseschannon commented 8 years ago

Did you launch the AWS instances through the UI (using docker-machine), or did you launch them independently and then run the custom "Add Host" command?

boedy commented 8 years ago

I launched them via the Rancher UI.

ApolloDS commented 8 years ago

Same problem here with RancherOS v0.4.2 and Rancher v0.51.0. I have set up GlusterFS properly, and it works. But convoy-gluster keeps switching between "Degraded" and "Active". The log file is filling up with:

30/12/2015 10:45:41 time="2015-12-30T09:45:41Z" level=error msg="Get http:///host/var/run/conoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/conoy-convoy-gluster.sock: no such file or directory"

Then it exits:

30/12/2015 10:48:34 time="2015-12-30T09:48:34Z" level=debug msg="Cleaning up environment..." pkg=daemon
30/12/2015 10:48:34 time="2015-12-30T09:48:34Z" level=error msg="Failed to execute: mount [-t glusterfs glusterfs:/my_vol /var/lib/rancher/convoy/convoy-gluster-7e8bd75d-88c9-4b21-b0af-af5a348fbafe/glusterfs/mounts/my_vol], output Mount failed. Please check the log file for more details.\n, error exit status 1"
30/12/2015 10:48:34 {
30/12/2015 10:48:34  "Error": "Failed to execute: mount [-t glusterfs glusterfs:/my_vol /var/lib/rancher/convoy/convoy-gluster-7e8bd75d-88c9-4b21-b0af-af5a348fbafe/glusterfs/mounts/my_vol], output Mount failed. Please check the log file for more details.\n, error exit status 1"
30/12/2015 10:48:34 }
30/12/2015 10:48:34 time="2015-12-30T09:48:34Z" level=info msg="convoy exited with error: exit status 1"
30/12/2015 10:48:34 time="2015-12-30T09:48:34Z" level=info msg=Exiting.

Is this a bug?

brutus333 commented 8 years ago

I'm seeing the same bug. Some details on my environment: [two screenshots] I see the same errors in the logs.

yobasystems commented 8 years ago

Same issue here. Running Ubuntu 14.04 LTS, same version as @brutus333.

brutus333 commented 8 years ago

In my case, the issue was related to lack of inter-container communication.

deniseschannon commented 8 years ago

@ApolloDS - Your issue is related to #2903. If you had launched an early version of convoy-gluster, you are unable to launch a new version.

If you deployed a very early version of the template, there may be some manual cleanup required on your hosts. Specifically, with convoy not installed, you'd need to delete the /etc/docker/plugins/convoy-gluster.spec file and then restart docker.

In our latest templates, we have added a fix that allows you to deploy another version.

@boedy Are you still having issues with RancherOS, or is Ubuntu okay for you? If you try again with RancherOS, please make sure cross-host communication is working: exec into one network agent container and ping the IPs of the other network agents.

@yobasystems What logs do you see? Can you also check cross-host communication?

boedy commented 8 years ago

I've found a different solution to my problem, so I won't be needing Convoy for now. However, I do run into more problems running Rancher on RancherOS than on Ubuntu. With RancherOS I had cross-host connectivity problems, which might be related to this.

bibby commented 8 years ago

This describes my experience yesterday using the 0.58.0-rc1 image. It had previously come up with 0.56.1, but with the latest version convoy-gluster never came up, reporting the same error messages.

cderwin commented 8 years ago

I'm running into this issue too. I run a three-machine setup, all running RancherOS 4.2.3 and Docker 1.9.1. GlusterFS runs fine, and Convoy was running fine until I had to restart one of the hosts. After the restart, the convoy-gluster container on the restarted host fails to start (it appears the socket at /host/var/run/convoy-convoy-gluster.sock does not exist). Here are the startup logs:

Waiting for metadata
3/29/2016 11:36:22 PM time="2016-03-30T04:36:22Z" level=info msg="Execing [/usr/bin/nsenter --mount=/proc/736/ns/mnt -F -- /var/lib/docker/overlay/3f8e41f2328bdd2155ab244797bc1f69904a2edf14602b590a91d7402ec4ae36/merged/var/lib/rancher/convoy-agent/share-mnt --stage2 /var/lib/rancher/convoy/convoy-gluster-b0da2c78-8b51-43d0-878d-1f9a15e22c85 -- /launch volume-agent-glusterfs-internal]"
3/29/2016 11:36:22 PM Waiting for metadata
3/29/2016 11:36:22 PM Registering convoy socket at /var/run/convoy-convoy-gluster.sock
3/29/2016 11:36:22 PM time="2016-03-30T04:36:22Z" level=info msg="Listening for health checks on 0.0.0.0:10241/healthcheck"
3/29/2016 11:36:22 PM time="2016-03-30T04:36:22Z" level=info msg="Got: root /var/lib/rancher/convoy/convoy-gluster-b0da2c78-8b51-43d0-878d-1f9a15e22c85"
3/29/2016 11:36:22 PM time="2016-03-30T04:36:22Z" level=info msg="Got: drivers [glusterfs]"
3/29/2016 11:36:22 PM time="2016-03-30T04:36:22Z" level=info msg="Got: driver-opts [glusterfs.defaultvolumepool=rancher_vol glusterfs.servers=glusterfs]"
3/29/2016 11:36:22 PM time="2016-03-30T04:36:22Z" level=info msg="Launching convoy with args: [--socket=/host/var/run/convoy-convoy-gluster.sock daemon --root=/var/lib/rancher/convoy/convoy-gluster-b0da2c78-8b51-43d0-878d-1f9a15e22c85 --drivers=glusterfs --driver-opts=glusterfs.defaultvolumepool=rancher_vol --driver-opts=glusterfs.servers=glusterfs]"
3/29/2016 11:36:22 PM time="2016-03-30T04:36:22Z" level=debug msg="Found existing config. Ignoring command line opts, loading config from /var/lib/rancher/convoy/convoy-gluster-b0da2c78-8b51-43d0-878d-1f9a15e22c85" pkg=daemon
3/29/2016 11:36:22 PM time="2016-03-30T04:36:22Z" level=debug msg= driver=glusterfs driver_opts=map[glusterfs.defaultvolumepool:rancher_vol glusterfs.servers:glusterfs] event=init pkg=daemon reason=prepare root="/var/lib/rancher/convoy/convoy-gluster-b0da2c78-8b51-43d0-878d-1f9a15e22c85"
3/29/2016 11:36:22 PM time="2016-03-30T04:36:22Z" level=debug msg="Umount existing mountpoint /var/lib/rancher/convoy/convoy-gluster-b0da2c78-8b51-43d0-878d-1f9a15e22c85/glusterfs/mounts/rancher_vol" pkg=util
3/29/2016 11:36:22 PM time="2016-03-30T04:36:22Z" level=debug msg="Volume rancher_vol is being mounted it to /var/lib/rancher/convoy/convoy-gluster-b0da2c78-8b51-43d0-878d-1f9a15e22c85/glusterfs/mounts/rancher_vol, with option [-t glusterfs]" pkg=util
3/29/2016 11:36:23 PM time="2016-03-30T04:36:23Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: no such file or directory"
3/29/2016 11:36:24 PM time="2016-03-30T04:36:24Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: no such file or directory"
3/29/2016 11:36:25 PM time="2016-03-30T04:36:25Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: no such file or directory"
3/29/2016 11:36:26 PM time="2016-03-30T04:36:26Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: no such file or directory"
3/29/2016 11:36:27 PM time="2016-03-30T04:36:27Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: no such file or directory"
3/29/2016 11:36:28 PM time="2016-03-30T04:36:28Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: no such file or directory"
3/29/2016 11:36:29 PM time="2016-03-30T04:36:29Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: no such file or directory"
3/29/2016 11:36:30 PM time="2016-03-30T04:36:30Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: no such file or directory"
3/29/2016 11:36:31 PM time="2016-03-30T04:36:31Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: no such file or directory"
3/29/2016 11:36:32 PM time="2016-03-30T04:36:32Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: no such file or directory"

Is this a networking issue? I've seen a few bug reports floating around about restarting Rancher instances, but if that is the cause, these logs certainly don't make it clear.

nathan-osman commented 8 years ago

I'm seeing this while running Rancher on three Ubuntu machines.

02/04/2016 15:18:58 time="2016-04-02T22:18:58Z" level=info msg="Launching convoy with args: [--socket=/host/var/run/convoy-convoy-gluster.sock daemon --root=/var/lib/rancher/convoy/convoy-gluster-c6945625-6847-4499-885d-0b74755adca2 --drivers=glusterfs --driver-opts=glusterfs.defaultvolumepool=volume --driver-opts=glusterfs.servers=glusterfs]"
02/04/2016 15:18:58 time="2016-04-02T22:18:58Z" level=debug msg="Creating config at /var/lib/rancher/convoy/convoy-gluster-c6945625-6847-4499-885d-0b74755adca2" pkg=daemon
02/04/2016 15:18:58 time="2016-04-02T22:18:58Z" level=debug msg= driver=glusterfs driver_opts=map[glusterfs.defaultvolumepool:volume glusterfs.servers:glusterfs] event=init pkg=daemon reason=prepare root="/var/lib/rancher/convoy/convoy-gluster-c6945625-6847-4499-885d-0b74755adca2"
02/04/2016 15:18:58 time="2016-04-02T22:18:58Z" level=debug msg="Volume volume is being mounted it to /var/lib/rancher/convoy/convoy-gluster-c6945625-6847-4499-885d-0b74755adca2/glusterfs/mounts/volume, with option [-t glusterfs]" pkg=util
02/04/2016 15:18:59 time="2016-04-02T22:18:59Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: no such file or directory"

Versions:

Rancher v1.0.0, Cattle v0.159.2, Rancher UI v0.100.3, Rancher Compose v0.7.3

Host machines: Ubuntu 14.04.4 LTS

kaotika commented 8 years ago

The problem still exists (three Ubuntu 14.04 hosts, Rancher v1.1.0-dev1, Docker v1.10.3 from the official Docker repo, Cattle environment). Any updates or workarounds?

Typositoire commented 8 years ago

I'm also having this issue. Has anyone been able to fix it?

Docker v1.10.3, Rancher v1.1.0-dev5, Ubuntu 14.04 hosts

brutus333 commented 8 years ago

Forget about convoy-gluster unless you want to support it yourself: https://forums.rancher.com/t/why-the-gluster-and-convoy-gluster-catalog-items-gone/3618/2