mantl / mesos-consul

Mesos to Consul bridge for service discovery
Apache License 2.0

Question about consul service registration with regards to ports and docker #44

Closed ghost closed 8 years ago

ghost commented 8 years ago

I currently have an app running in a Docker container on Mesos, scheduled with Marathon, along with the mesos-consul bridge.

The current Marathon app configuration uses bridge networking and lets Mesos/Marathon pick whatever host port is available, while the application inside the Docker container listens on 8080:

{
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "sarlindo/wildfly-app",
      "network": "BRIDGE",
      "portMappings": [
        { "containerPort": 8080, "hostPort": 0, "servicePort": 0, "protocol": "tcp" }
      ]
    }
  },
  "id": "wildfly",
  "cmd": "/opt/jboss/wildfly/bin/standalone.sh -b 0.0.0.0 -bmanagement=0.0.0.0",
  "instances": 1,
  "cpus": 0.3,
  "mem": 256
}

Now, when this service gets registered with consul by the mesos-consul bridge, I see it being registered to the following ip/port.

172.17.0.4:31657

The IP here is the internal Docker IP, not the host IP, while the port is the host port that Mesos/Marathon assigned.

The problem is that I can't reach the service at this address: on the Docker IP the app listens on 8080, and the assigned host port is only mapped on the host's own IP.

Is this the way it is supposed to work, or am I doing something wrong here?

ChrisAubuchon commented 8 years ago

Are you using the default mesos-ip-order? Is docker included in the list? I haven't seen a use for having docker in the search order since it returns the docker IP address which isn't particularly useful as far as I can tell. If it is in the search list, try removing it.

The port is probably correct. Mesos will assign a random port to the docker container and map from 31657->8080.

ghost commented 8 years ago

Yes, the port is correct. The problem is just that the IP chosen and registered with Consul was the Docker IP address. I am running mesos-consul with the defaults. The following is the Marathon JSON I am using to run the mesos-consul bridge.

{
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "ciscocloud/mesos-consul",
      "network": "BRIDGE",
      "parameters": [
        { "key": "rm", "value": "true" }
      ]
    }
  },
  "id": "mesos-consul",
  "args": ["--zk=zk://192.168.33.10:2181/mesos"],
  "instances": 1,
  "cpus": 0.1,
  "mem": 256,
  "constraints": [["hostname", "CLUSTER", "node1"]]
}

ChrisAubuchon commented 8 years ago

Hmm...I can't reproduce...Can you post the task section from the Mesos master? /master/state.json from the Mesos leader
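
For anyone following along, pulling just the relevant task out of a large state.json can be sketched roughly as below. This is a minimal sketch, assuming the usual layout where tasks sit under each entry of the top-level "frameworks" list; the helper name tasks_named and the trimmed sample_state are made up for illustration and are not part of Mesos or mesos-consul.

```python
import json


def tasks_named(state, name):
    """Return every task called `name` from a parsed state.json document.

    Assumes tasks are listed under state["frameworks"][i]["tasks"].
    """
    return [
        task
        for framework in state.get("frameworks", [])
        for task in framework.get("tasks", [])
        if task.get("name") == name
    ]


# Hypothetical, heavily trimmed stand-in for a real /master/state.json response.
sample_state = {
    "frameworks": [
        {
            "name": "marathon",
            "tasks": [
                {"name": "wildfly", "state": "TASK_RUNNING"},
                {"name": "mesos-consul", "state": "TASK_RUNNING"},
            ],
        }
    ]
}

print(json.dumps(tasks_named(sample_state, "wildfly"), indent=2))
```

Against a live cluster, the same helper could be fed the parsed body of /master/state.json from the leading master (commonly served on port 5050, though that depends on the setup).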

ghost commented 8 years ago

Here it is, below. I think I may be running into this issue: https://github.com/mesosphere/mesos-dns/issues/334 (I know it says mesos-dns, but further down the thread someone points to Mesos itself as the potential cause; I will have to dig some more):

            {
                "executor_id": "",
                "framework_id": "13742ebd-7985-4898-b01e-6587d19b885d-0001",
                "id": "wildfly.88156cb6-925c-11e5-b212-02429beb943f",
                "name": "wildfly",
                "resources": {
                    "cpus": 0.3,
                    "disk": 0,
                    "mem": 256.0,
                    "ports": "[31268-31268]"
                },
                "slave_id": "a8f46f83-034d-459b-ac0e-e2effd094e4f-S1",
                "state": "TASK_RUNNING",
                "statuses": [
                    {
                        "container_status": {
                            "network_infos": [
                                {
                                    "ip_address": "172.17.0.3"
                                }
                            ]
                        },
                        "labels": [
                            {
                                "key": "Docker.NetworkSettings.IPAddress",
                                "value": "172.17.0.3"
                            }
                        ],
                        "state": "TASK_RUNNING",
                        "timestamp": 1448336183.15899
                    }
                ]
            },

ChrisAubuchon commented 8 years ago

That is exactly what you're running into. Ugh. The default search order is netinfo,mesos,host so it's using the ip address in the network_infos block. A workaround is to add "--mesos-ip-order=mesos,host" to your marathon job for mesos-consul.
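
To illustrate why the order matters, here is a rough sketch of an ordered lookup over the task posted above. It is not mesos-consul's actual code: the function first_ip, the label name used for the "mesos" source, and the host address 10.0.0.5 are all assumptions made for the example.

```python
def first_ip(task, host_ip, order):
    """Return the first non-empty IP found by walking `order` left to right.

    Hypothetical sketch: the source names mirror the --mesos-ip-order values,
    but the lookup rules below are assumptions, not mesos-consul internals.
    """
    status = task["statuses"][-1]
    labels = {l["key"]: l["value"] for l in status.get("labels", [])}
    net_infos = status.get("container_status", {}).get("network_infos", [])

    sources = {
        # IP reported in the network_infos block of the task status.
        "netinfo": lambda: next(
            (n["ip_address"] for n in net_infos if n.get("ip_address")), None
        ),
        # Assumed label name for the Mesos containerizer; absent in this task.
        "mesos": lambda: labels.get("MesosContainerizer.NetworkSettings.IPAddress"),
        # IP of the agent (slave) the task runs on.
        "host": lambda: host_ip,
    }
    for name in order.split(","):
        ip = sources[name]()
        if ip:
            return ip
    return None


# Trimmed copy of the task status from the state.json excerpt above.
task = {
    "statuses": [
        {
            "container_status": {"network_infos": [{"ip_address": "172.17.0.3"}]},
            "labels": [
                {"key": "Docker.NetworkSettings.IPAddress", "value": "172.17.0.3"}
            ],
        }
    ]
}

HOST_IP = "10.0.0.5"  # made-up agent address, for illustration only

print(first_ip(task, HOST_IP, "netinfo,mesos,host"))
print(first_ip(task, HOST_IP, "mesos,host"))
```

With the default order the network_infos address wins because it is checked first; dropping netinfo lets the lookup fall through to the host address.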

ghost commented 8 years ago

@ChrisAubuchon I have actually been trying this, but now the mesos-consul bridge doesn't seem to register any new services with Consul at all. I created a new service in Marathon, and nothing shows up for it in the Consul UI.

This is my new Marathon JSON for mesos-consul:

{
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "ciscocloud/mesos-consul",
      "network": "BRIDGE",
      "parameters": [
        { "key": "rm", "value": "true" }
      ]
    }
  },
  "id": "mesos-consul",
  "args": ["--zk=zk://192.168.33.10:2181/mesos --mesos-ip-order=mesos,host"],
  "instances": 1,
  "cpus": 0.1,
  "mem": 256,
  "constraints": [["hostname", "CLUSTER", "node1"]]
}

These are the logs that I see for the mesos-consul bridge:

vagrant@node1:~/projects/consul$ sudo docker logs 5cd4ec4464d7
2015/11/24 16:26:45 Connected to 192.168.33.10:2181
2015/11/24 16:26:45 Authenticated: id=94921046598942733, timeout=40000

Any clue as to why adding this new flag would cause issues?

ChrisAubuchon commented 8 years ago

The command line arguments in the args list need to be separated:

"args": [
  "--zk=zk://192.168.33.10:2181/mesos",
  "--mesos-ip-order=mesos,host"
  ],
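
As a rough illustration of why the single string misbehaves: each element of the args list reaches the program as one argument, so everything after the first "=" ends up as the value of --zk. The sketch below uses Python's argparse purely as a stand-in; mesos-consul's real flag parsing is a separate implementation and may differ in detail.

```python
import argparse

# Stand-in parser with the two flags from this thread; the default order
# is the one quoted earlier (netinfo,mesos,host).
parser = argparse.ArgumentParser()
parser.add_argument("--zk")
parser.add_argument("--mesos-ip-order", default="netinfo,mesos,host")

# One list element: the second flag is swallowed into the --zk value.
one_string = parser.parse_args(
    ["--zk=zk://192.168.33.10:2181/mesos --mesos-ip-order=mesos,host"]
)

# Two list elements: each flag is parsed on its own.
separated = parser.parse_args(
    ["--zk=zk://192.168.33.10:2181/mesos", "--mesos-ip-order=mesos,host"]
)

print(repr(one_string.zk), repr(one_string.mesos_ip_order))
print(repr(separated.zk), repr(separated.mesos_ip_order))
```

In the single-string case the ZooKeeper host is still correct, but the path carries the extra text and the IP order stays at its default. That would be consistent with the log showing a successful connection while nothing gets registered.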

ghost commented 8 years ago

Oops! It's now working. Thanks, Chris.

Out of curiosity, do you work for Cisco? What does Cisco have to do with these projects?

ChrisAubuchon commented 8 years ago

Mesos-consul was developed as part of Cisco's Mantl project

ghost commented 8 years ago

thanks