rancher / rancher

Complete container management platform
http://rancher.com
Apache License 2.0

RTNETLINK answers: Operation not permitted #3142

Closed dmzaytsev closed 8 years ago

dmzaytsev commented 8 years ago

There are two hosts in my environment. I set up a load balancer with the option "Always run one instance of this container on every host". The load balancer started successfully on the first host, but it failed on the second host with the message "RTNETLINK answers: Operation not permitted". The hosts are absolutely identical.

Rancher v0.50.2
Cattle v0.127.0
User Interface v0.75.0
Rancher Compose v0.6.2

docker logs:

INFO: Running /var/lib/cattle/download/agent-instance-startup/agent-instance-startup-1-40090a0051cbcfe987de329f01bd52c08158b1990564b419ee71ee9204cf5d8f/apply.sh
INFO: Assigning 10.42.129.178/16 to eth0
RTNETLINK answers: Operation not permitted
[[scripts.sh:184] script_env
[[scripts.sh:109] '[' -n https://**.com/v1 ']'
[[scripts.sh:110] return
[[scripts.sh:186] export CATTLE_HOME=/var/lib/cattle
[[scripts.sh:186] CATTLE_HOME=/var/lib/cattle
[[scripts.sh:187] export CATTLE_CONFIG_URL=https://__.__.com/v1
[[scripts.sh:187] CATTLE_CONFIG_URL=https://__.__.com/v1
[[scripts.sh:188] export CATTLE_STORAGE_URL=https://__.__.com/v1
[[scripts.sh:188] CATTLE_STORAGE_URL=https://__._.com/v1
[apply.sh:5] read DEV MAC IP
[[apply.sh:6] ip link show dev eth0
[[apply.sh:6] awk '{print $2}'
[[apply.sh:6] grep link/ether
[apply.sh:6] '[' 02:f6:fd:0f:6b:bc '!=' 02:f6:fd:0f:6b:bc ']'
[apply.sh:11] ip addr show dev eth0
[apply.sh:11] grep -iq 10.42.129.178/16
[apply.sh:12] info Assigning 10.42.129.178/16 to eth0
[scripts.sh:19] echo INFO: Assigning 10.42.129.178/16 to eth0
INFO: Assigning 10.42.129.178/16 to eth0
[apply.sh:13] ip addr add dev eth0 10.42.129.178/16
RTNETLINK answers: Operation not permitted
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to all processes
Requesting system reboot

yutian1224 commented 8 years ago

I have the same problem.

dmzaytsev commented 8 years ago

I just recreated the load balancer. Now I get the same error on both nodes. How can it be fixed?

deniseschannon commented 8 years ago

I have been unable to reproduce.

Can you provide more details on:

  1. What setup is this? Fresh install or upgraded setup? What OS are your hosts running?
  2. Can you provide more details on your load balancers? Are both of you selecting (Always run one instance of this container on every host)?

Providing either a snapshot of what you created or a copy of the docker-compose.yml of the load balancer portion would be great.

yutian1224 commented 8 years ago

STEP 1: Added a new stack and started it. [image]

STEP 2: Added a load balancer with the configuration below and started it. [image] [image]

STEP 3: Checked the two load balancer containers on the hosts. One succeeded but the other failed. [image] [image]

I think it's because the container started by the agent has no privileges to change the MAC address.

ibuildthecloud commented 8 years ago

I'll take a look at this tomorrow. You're right that the container doesn't have the privileges to change the MAC, but I think the real question is why the MAC is different. We set the MAC on container create, so on boot we shouldn't try to adjust it. What version of Docker are you running? Can you share the output of docker version and docker info?
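
For anyone debugging this: on Linux, changing a MAC or adding an address with `ip` requires CAP_NET_ADMIN in the container's network namespace, and without it RTNETLINK answers "Operation not permitted". The following is a minimal sketch, not part of Rancher, for checking whether a process holds that capability by reading the CapEff mask from /proc. The helper names `has_capability` and `effective_caps` are made up for illustration.

```python
# Capability number of CAP_NET_ADMIN as defined in <linux/capability.h>.
CAP_NET_ADMIN = 12


def has_capability(cap_eff_hex, cap_number):
    """Return True if bit `cap_number` is set in a CapEff hex mask."""
    return bool(int(cap_eff_hex, 16) & (1 << cap_number))


def effective_caps(status_path="/proc/self/status"):
    """Read the CapEff mask of the current process (Linux only)."""
    with open(status_path) as status:
        for line in status:
            if line.startswith("CapEff:"):
                return line.split()[1]
    raise RuntimeError("CapEff not found in " + status_path)
```

Running `has_capability(effective_caps(), CAP_NET_ADMIN)` inside the failing container would show whether the capability is missing there.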

yutian1224 commented 8 years ago

docker version: 1.9.1

image

dmzaytsev commented 8 years ago

@deniseschannon

Can you provide more details on:

  1. What setup is this? Fresh install or upgraded setup? What OS are your hosts running?

This is a fresh install on a GCE virtual machine: 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u5 (2015-10-09) x86_64 GNU/Linux

  2. Can you provide more details on your load balancers? Are both of you selecting (Always run one instance of this container on every host)? Providing either a snapshot of what you created or a copy of the docker-compose.yml of the load balancer portion would be great.

I set up Jenkins from the catalog, plus a load balancer.

docker-compose.yml:

jenkins-plugins:
  labels:
    io.rancher.service.hash: 904a8db3f0345c66251d49fc71e00445cec1dc11
  image: rancher/jenkins-plugins:v0.1.1
jenkins-datavolume:
  labels:
    io.rancher.container.start_once: 'true'
    io.rancher.service.hash: 23b552bbadc507473ac27387b83c68151a5eb500
  entrypoint:
  - /bin/true
  image: jenkins:1.625.2
lb:
  ports:
  - 80:8080
  labels:
    io.rancher.scheduler.global: 'true'
  tty: true
  image: rancher/load-balancer-service
  links:
  - jenkins-primary:jenkins-primary
  stdin_open: true
jenkins-primary:
  ports:
  - 8080:8080/tcp
  labels:
    io.rancher.sidekicks: jenkins-plugins,jenkins-datavolume
    io.rancher.service.hash: 7e4bb2e64078e5bb189752994fd105870948af7a
    io.rancher.container.hostname_override: container_name
  entrypoint:
  - /usr/share/jenkins/rancher/jenkins.sh
  image: jenkins:1.625.2
  volumes_from:
  - jenkins-plugins
  - jenkins-datavolume

rancher-compose.yml:

jenkins-plugins:
  scale: 1
  metadata: &id001
    io.rancher.service.hash: 353441f698f31a149b827cb76e733a7ebb1c39b6
    plugins: |+
      credentials
      greenballs
      git
      junit
      git-client
      github-api
      github-oauth
      github
      plain-credentials
      scm-api
      ssh-credentials
      ssh-slaves
      swarm

jenkins-datavolume:
  scale: 1
  metadata: *id001
lb:
  load_balancer_config:
    haproxy_config: {}
  health_check:
    port: 42
    interval: 2000
    unhealthy_threshold: 3
    healthy_threshold: 2
    response_timeout: 2000
jenkins-primary:
  scale: 1
  metadata: *id001

$ sudo docker ps
CONTAINER ID        IMAGE                            COMMAND                  CREATED             STATUS              PORTS                                          NAMES
f849a985094a        rancher/agent-instance:v0.6.0    "/etc/init.d/agent-in"   12 minutes ago      Up 12 minutes                                                      90678e79-5d8c-4785-a832-3d4368de6a96
01adbf0507d0        jenkins:1.625.2                  "/usr/share/jenkins/r"   23 minutes ago      Up 23 minutes       8080/tcp, 50000/tcp                            d289a50e-9853-4297-a5dd-ea3f495e059b
7937cf9bf1d8        rancher/jenkins-plugins:v0.1.1   "/confd --backend ran"   24 minutes ago      Up 24 minutes                                                      4cd6fae4-0585-4168-82fa-b4d88027db60
5b9e748645a7        rancher/agent-instance:v0.6.0    "/etc/init.d/agent-in"   34 hours ago        Up 34 hours         0.0.0.0:500->500/udp, 0.0.0.0:4500->4500/udp   bc6a849e-692c-4aa6-9d71-3266b3c483fe
b081d3183e99        rancher/agent:v0.8.2             "/run.sh run"            5 days ago          Up 5 days                                                          rancher-agent

I expected to see a few listening TCP ports (80, 8080, 50000), but I don't see them:

$ netstat -na | grep LISTEN
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN     
tcp        0      0 127.0.0.1:9344          0.0.0.0:*               LISTEN     
tcp6       0      0 :::22                   :::*                    LISTEN  
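
The comparison between expected and actual listeners can be automated. This is a rough sketch only: it parses `netstat -na` output like the lines above and reports which expected ports have no listener. The function name `listening_ports` is hypothetical, and the column layout is assumed to match the output shown here.

```python
def listening_ports(netstat_output):
    """Collect local TCP ports in LISTEN state from `netstat -na` output."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local address, foreign address, state.
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "LISTEN":
            # The port follows the last colon, for IPv4 and IPv6 addresses alike.
            ports.add(int(fields[3].rsplit(":", 1)[1]))
    return ports


SAMPLE = """\
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:9344          0.0.0.0:*               LISTEN
tcp6       0      0 :::22                   :::*                    LISTEN
"""

expected = {80, 8080, 50000}
missing = expected - listening_ports(SAMPLE)
print(sorted(missing))
```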

$ docker --version
Docker version 1.9.1, build a34a1d5

$ sudo docker info
Containers: 8
Images: 93
Server Version: 1.9.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 109
 Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
CPUs: 2
Total Memory: 7.324 GiB
Name: paas-host-1
ID: D6J3:QIHR:557Q:DGOF:L4CA:TR2T:5IXZ:RMGN:JUHZ:J7OZ:GZQA:CTRG
WARNING: No memory limit support
WARNING: No swap limit support

ibuildthecloud commented 8 years ago

@dmzaytsev Whether you see the ports in docker ps depends on the version of Rancher you are running. In the latest version, v0.51.0, the ports should be registered. Are you running v0.51.0?

dmzaytsev commented 8 years ago

@deniseschannon log from jenkins

$ sudo docker logs 01adbf0507d0
Downloading credentials:latest
Downloading greenballs:latest
Downloading git:latest
Downloading junit:latest
Downloading git-client:latest
Downloading github-api:latest
Downloading github-oauth:latest
Downloading github:latest
Downloading plain-credentials:latest
Downloading scm-api:latest
Downloading ssh-credentials:latest
Downloading ssh-slaves:latest
Downloading swarm:latest
Running from: /usr/share/jenkins/jenkins.war
webroot: EnvVars.masterEnvVars.get("JENKINS_HOME")
Dec 29, 2015 4:04:24 PM winstone.Logger logInternal
INFO: Beginning extraction from war file
Dec 29, 2015 4:04:25 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: jetty-winstone-2.8
Dec 29, 2015 4:04:26 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: NO JSP Support for , did not find org.apache.jasper.servlet.JspServlet
Jenkins home directory: /var/jenkins_home found at: EnvVars.masterEnvVars.get("JENKINS_HOME")
Dec 29, 2015 4:04:27 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: Started SelectChannelConnector@0.0.0.0:8080
Dec 29, 2015 4:04:27 PM winstone.Logger logInternal
INFO: Winstone Servlet Engine v2.0 running: controlPort=disabled
Dec 29, 2015 4:04:27 PM jenkins.InitReactorRunner$1 onAttained
INFO: Started initialization
Dec 29, 2015 4:04:36 PM jenkins.InitReactorRunner$1 onAttained
INFO: Listed all plugins
Dec 29, 2015 4:04:36 PM jenkins.InitReactorRunner$1 onTaskFailed
SEVERE: Failed Loading plugin github
java.io.IOException: Dependency token-macro (1.11) doesn't exist
        at hudson.PluginWrapper.resolvePluginDependencies(PluginWrapper.java:480)
        at hudson.PluginManager$2$1$1.run(PluginManager.java:370)
        at org.jvnet.hudson.reactor.TaskGraphBuilder$TaskImpl.run(TaskGraphBuilder.java:169)
        at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:282)
        at jenkins.model.Jenkins$7.runTask(Jenkins.java:905)
        at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:210)
        at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:117)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Dec 29, 2015 4:04:36 PM jenkins.InitReactorRunner$1 onAttained
INFO: Prepared all plugins
Dec 29, 2015 4:04:36 PM jenkins.InitReactorRunner$1 onAttained
INFO: Started all plugins
Dec 29, 2015 4:04:36 PM jenkins.InitReactorRunner$1 onAttained
INFO: Augmented all extensions
Dec 29, 2015 4:04:41 PM jenkins.InitReactorRunner$1 onAttained
INFO: Loaded all jobs
Dec 29, 2015 4:04:41 PM hudson.model.AsyncPeriodicWork$1 run
INFO: Started Download metadata
Dec 29, 2015 4:04:41 PM jenkins.util.groovy.GroovyHookScript execute
INFO: Executing /var/jenkins_home/init.groovy.d/tcp-slave-agent-port.groovy
Dec 29, 2015 4:04:41 PM org.jenkinsci.main.modules.sshd.SSHD start
INFO: Started SSHD at port 59404
Dec 29, 2015 4:04:41 PM jenkins.InitReactorRunner$1 onAttained
INFO: Completed initialization
Dec 29, 2015 4:04:42 PM hudson.WebAppMain$3 run
INFO: Jenkins is fully up and running
Dec 29, 2015 4:04:43 PM hudson.model.UpdateSite updateData
INFO: Obtained the latest update center data file for UpdateSource default
Dec 29, 2015 4:04:43 PM hudson.model.DownloadService$Downloadable load
INFO: Obtained the updated data file for hudson.tasks.Maven.MavenInstaller
Dec 29, 2015 4:04:43 PM hudson.model.DownloadService$Downloadable load
INFO: Obtained the updated data file for hudson.tasks.Ant.AntInstaller
Dec 29, 2015 4:04:44 PM hudson.model.DownloadService$Downloadable load
INFO: Obtained the updated data file for hudson.tools.JDKInstaller
Dec 29, 2015 4:04:44 PM hudson.model.AsyncPeriodicWork$1 run
INFO: Finished Download metadata. 2,962 ms
--> setting agent port for jnlp
--> setting agent port for jnlp... done

log from load balancer

$ sudo docker logs f849a985094a
INFO: Downloading agent https://**/v1/configcontent/configscripts
INFO: Updating configscripts
INFO: Downloading https://**/v1//configcontent//configscripts current=
INFO: Running /var/lib/cattle/download/configscripts/configscripts-1-f0f3fb2e1110b5ada7c441705981f93a480313a324294321cff467f0c3e12319/apply.sh
INFO: Sending configscripts applied 1-f0f3fb2e1110b5ada7c441705981f93a480313a324294321cff467f0c3e12319
INFO: Updating agent-instance-startup
INFO: Downloading https://**/v1//configcontent//agent-instance-startup current=
INFO: Running /var/lib/cattle/download/agent-instance-startup/agent-instance-startup-1-40090a0051cbcfe987de329f01bd52c08158b1990564b419ee71ee9204cf5d8f/apply.sh
INFO: Assigning 10.42.200.58/16 to eth0
RTNETLINK answers: Operation not permitted
[[scripts.sh:184] script_env
[[scripts.sh:109] '[' -n https://**/v1 ']'
[[scripts.sh:110] return
[[scripts.sh:186] export CATTLE_HOME=/var/lib/cattle
[[scripts.sh:186] CATTLE_HOME=/var/lib/cattle
[[scripts.sh:187] export CATTLE_CONFIG_URL=https://**/v1
[[scripts.sh:187] CATTLE_CONFIG_URL=https://**/v1
[[scripts.sh:188] export CATTLE_STORAGE_URL=https://**/v1
[[scripts.sh:188] CATTLE_STORAGE_URL=https://**/v1
[apply.sh:5] read DEV MAC IP
[[apply.sh:6] awk '{print $2}'
[[apply.sh:6] grep link/ether
[[apply.sh:6] ip link show dev eth0
[apply.sh:6] '[' 02:f6:fd:d8:8e:81 '!=' 02:f6:fd:d8:8e:81 ']'
[apply.sh:11] grep -iq 10.42.200.58/16
[apply.sh:11] ip addr show dev eth0
[apply.sh:12] info Assigning 10.42.200.58/16 to eth0
[scripts.sh:19] echo INFO: Assigning 10.42.200.58/16 to eth0
INFO: Assigning 10.42.200.58/16 to eth0
[apply.sh:13] ip addr add dev eth0 10.42.200.58/16
RTNETLINK answers: Operation not permitted
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to all processes
Requesting system reboot
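
Reading the trace above: the MAC check at apply.sh:6 compares two identical values, so no MAC change is attempted. The command that fails is `ip addr add` at apply.sh:13. Below is a rough Python paraphrase of only the traced steps. It is a reconstruction from this log, not the actual apply.sh; `plan_commands` is a made-up name, and the MAC-mismatch branch is left as a placeholder because the trace never shows it.

```python
def plan_commands(dev, want_mac, want_ip, current_mac, current_addrs):
    """Return the `ip` commands the traced steps would run for one interface."""
    commands = []
    if current_mac != want_mac:
        # The trace never takes this branch, so what apply.sh does here is unknown.
        commands.append("<MAC mismatch handling, not shown in the trace>")
    # The script uses `grep -iq`; exact case-insensitive matching is a simplification.
    if want_ip.lower() not in [addr.lower() for addr in current_addrs]:
        commands.append("ip addr add dev {} {}".format(dev, want_ip))
    return commands


# Values taken from the load balancer log above.
plan = plan_commands(
    dev="eth0",
    want_mac="02:f6:fd:d8:8e:81",
    want_ip="10.42.200.58/16",
    current_mac="02:f6:fd:d8:8e:81",
    current_addrs=[],
)
print(plan)
```

With the MACs equal and the address not yet assigned, the only planned command is the `ip addr add` call, which is the one RTNETLINK rejects.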

dmzaytsev commented 8 years ago

@deniseschannon

Component          Version
Rancher            v0.50.2
Cattle             v0.127.0
User Interface     v0.75.0
Rancher Compose    v0.6.2

ibuildthecloud commented 8 years ago

@dmzaytsev Sorry, I failed to realize you already shared that. Thanks

dmzaytsev commented 8 years ago

@deniseschannon no problem

dmzaytsev commented 8 years ago

@ibuildthecloud sorry :)

@deniseschannon no problem

yutian1224 commented 8 years ago

Is there any progress? :)

id-romain commented 8 years ago

I had the same problem when my websocket-proxy was not configured and working properly. Maybe it can help...

dmzaytsev commented 8 years ago

Yep, the problem went away when I switched back to standalone Rancher.

deniseschannon commented 8 years ago

@dmzaytsev You mention it works when you have standalone Rancher. Can you describe what your previous set up was?

dmzaytsev commented 8 years ago

@deniseschannon I had previously set up a cluster following http://docs.rancher.com/rancher/installing-rancher/installing-server/multi-nodes/. The problem went away when I ran only one node and removed the env variables CATTLE_HOST_API_PROXY_MODE and CATTLE_HOST_API_PROXY_HOST. Rancher is still using external ZooKeeper, Redis and MySQL; only these two variables were removed.

deniseschannon commented 8 years ago

@yutian1224 @id-romain Did you guys also have a multi-node setup when hitting this issue?

If you use standalone Rancher, do you have this issue?

id-romain commented 8 years ago

Yes, it was a multi-node setup, but the problem was caused by a wrong configuration. I already reported it in #3163.

Wrong configuration with quotes:

CATTLE_HOST_API_PROXY_MODE="ha"
CATTLE_HOST_API_PROXY_HOST="146.148.20.18:9999"

This configuration works, and I have seen no more problems with it in a multi-node setup:

CATTLE_HOST_API_PROXY_MODE=ha
CATTLE_HOST_API_PROXY_HOST=146.148.20.18:9999

brunopiresr commented 8 years ago

Hi @deniseschannon, I'm having this issue on a single-node Rancher server.

deniseschannon commented 8 years ago

I haven't had time to set up a multi-node environment to reproduce the issue.

@brunopiresr What are your OS and Docker version?

deniseschannon commented 8 years ago

@dmzaytsev Did you see the comments above about how the configuration for those env variables was updated? That could fix your multi-node setup.

Wrong configuration with quotes:

CATTLE_HOST_API_PROXY_MODE="ha"
CATTLE_HOST_API_PROXY_HOST="146.148.20.18:9999"

This configuration works, and I have seen no problem anymore with this configuration, in a multi-node setup:

CATTLE_HOST_API_PROXY_MODE=ha
CATTLE_HOST_API_PROXY_HOST=146.148.20.18:9999
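
The thread does not say how these variables were passed to the server container. One plausible explanation, and it is only an assumption, is that they went through a path where no shell strips the quotes (an env file, for example), so the process sees the value `"ha"` including the quote characters, which no longer matches `ha`. A minimal sketch of a check that would flag such values; `find_quoted_values` is a hypothetical helper, not a Rancher function:

```python
def find_quoted_values(env):
    """Return the variables whose values are wrapped in literal quote characters."""
    quoted = {}
    for name, value in env.items():
        if len(value) >= 2 and value[0] == value[-1] and value[0] in ('"', "'"):
            quoted[name] = value
    return quoted


# Values as a process would see them if nothing stripped the quotes.
with_quotes = {
    "CATTLE_HOST_API_PROXY_MODE": '"ha"',
    "CATTLE_HOST_API_PROXY_HOST": '"146.148.20.18:9999"',
}
without_quotes = {
    "CATTLE_HOST_API_PROXY_MODE": "ha",
    "CATTLE_HOST_API_PROXY_HOST": "146.148.20.18:9999",
}

# The quoted value is four characters long, so a comparison with "ha" fails.
print(with_quotes["CATTLE_HOST_API_PROXY_MODE"] == "ha")
print(sorted(find_quoted_values(with_quotes)))
print(find_quoted_values(without_quotes))
```

A check like this at startup could report the misconfiguration instead of failing later in a less obvious place.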

dmzaytsev commented 8 years ago

@deniseschannon Thanks! I will try that. I also think Rancher should report a wrong configuration like this.