openshift / openshift-sdn

Apache License 2.0

openshift node service start failed with networkplugin calico #298

Closed: dragon9783 closed this issue 8 years ago

dragon9783 commented 8 years ago

[Description]

The OpenShift node service fails to start when the network plugin is set to "cni" for Calico.

[Steps]

  1. Install the cluster with openshift-ansible, without the SDN plugin.
  2. Download the calico binary and move it to /usr/libexec/kubernetes/kubelet-plugins/net/exec/
  3. Edit /etc/origin/node/node-config.yaml and add "networkPluginName: cni"
  4. Run: systemctl restart origin-node

[Error log]

Apr 21 22:39:11 ip-172-31-10-20.cn-north-1.compute.internal origin-node[39720]: I0421 22:39:11.272849   39720 proxier.go:445] OnEndpointsUpdate took 81.284024ms for 1 endpoints
Apr 21 22:39:11 ip-172-31-10-20.cn-north-1.compute.internal origin-node[39720]: I0421 22:39:11.297265   39720 manager.go:172] Version: {KernelVersion:3.10.0-327.el7.x86_64 ContainerOsVersion:Red Hat Enterprise Linux Server 7.2 (Maipo) DockerVersion:1.9.1 CadvisorVersion: CadvisorRevision:}
Apr 21 22:39:11 ip-172-31-10-20.cn-north-1.compute.internal origin-node[39720]: I0421 22:39:11.297904   39720 server.go:320] Using root directory: /var/lib/origin/openshift.local.volumes
Apr 21 22:39:11 ip-172-31-10-20.cn-north-1.compute.internal origin-node[39720]: I0421 22:39:11.297984   39720 server.go:579] Sending events to api server.
Apr 21 22:39:11 ip-172-31-10-20.cn-north-1.compute.internal origin-node[39720]: I0421 22:39:11.298064   39720 server.go:654] Watching apiserver
Apr 21 22:39:11 ip-172-31-10-20.cn-north-1.compute.internal origin-node[39720]: F0421 22:39:11.302337   39720 node.go:230] failed to create kubelet: Network plugin "cni" not found.
Apr 21 22:39:11 ip-172-31-10-20.cn-north-1.compute.internal systemd[1]: origin-node.service: main process exited, code=exited, status=255/n/a
Apr 21 22:39:11 ip-172-31-10-20.cn-north-1.compute.internal systemd[1]: Unit origin-node.service entered failed state.
Apr 21 22:39:11 ip-172-31-10-20.cn-north-1.compute.internal systemd[1]: origin-node.service failed.
Apr 21 22:39:11 ip-172-31-10-20.cn-north-1.compute.internal systemd[1]: origin-node.service holdoff time over, scheduling restart.
Apr 21 22:39:11 ip-172-31-10-20.cn-north-1.compute.internal systemd[1]: start request repeated too quickly for origin-node.service
Apr 21 22:39:11 ip-172-31-10-20.cn-north-1.compute.internal systemd[1]: Failed to start Origin Node.
Apr 21 22:39:11 ip-172-31-10-20.cn-north-1.compute.internal systemd[1]: Unit origin-node.service entered failed state.
Apr 21 22:39:11 ip-172-31-10-20.cn-north-1.compute.internal systemd[1]: origin-node.service failed.

Please help me; I don't know how to configure this correctly.

rajatchopra commented 8 years ago

You have to place the files in /opt/cni/bin
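
A quick way to confirm that the binary really is in that directory is sketched below. This is only a minimal Python sketch, not part of OpenShift: `plugin_is_installed` is a hypothetical helper name, and the only detail taken from this thread is the `/opt/cni/bin` path.

```python
import os


def plugin_is_installed(name, bin_dir="/opt/cni/bin"):
    """Return True if bin_dir holds an executable regular file called name."""
    path = os.path.join(bin_dir, name)
    # isfile() rejects directories; access(X_OK) checks the execute permission
    # for the user running this script.
    return os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == "__main__":
    print(plugin_is_installed("calico"))
```

Run it as the same user that runs the node service, since the execute-permission check depends on who is asking.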

dragon9783 commented 8 years ago

Thanks for your answer, @rajatchopra. Following your suggestion, I placed the file in both /opt/cni/bin and /usr/libexec/kubernetes/kubelet-plugins/net/exec:

[root@ip-172-31-10-20 ~]# ll /opt/cni/bin/calico
-rwxr-xr-x. 1 root root 5312586 Apr 21 05:12 /opt/cni/bin/calico
[root@ip-172-31-10-20 ~]# ll /usr/libexec/kubernetes/kubelet-plugins/net/exec/calico
-rwxr-xr-x. 1 root root 5312586 Dec  9 13:02 /usr/libexec/kubernetes/kubelet-plugins/net/exec/calico

Starting the node service still fails with the same error:

Apr 22 10:24:48 ip-172-31-10-20.cn-north-1.compute.internal origin-node[4403]: I0422 10:24:48.068421    4403 server.go:320] Using root directory: /var/lib/origin/openshift.local.vol
Apr 22 10:24:48 ip-172-31-10-20.cn-north-1.compute.internal origin-node[4403]: I0422 10:24:48.068515    4403 server.go:579] Sending events to api server.
Apr 22 10:24:48 ip-172-31-10-20.cn-north-1.compute.internal origin-node[4403]: I0422 10:24:48.068589    4403 server.go:654] Watching apiserver
Apr 22 10:24:48 ip-172-31-10-20.cn-north-1.compute.internal origin-node[4403]: I0422 10:24:48.074119    4403 config.go:278] Setting pods for source api
Apr 22 10:24:48 ip-172-31-10-20.cn-north-1.compute.internal origin-node[4403]: F0422 10:24:48.083020    4403 node.go:230] failed to create kubelet: Network plugin "cni" not found.
Apr 22 10:24:48 ip-172-31-10-20.cn-north-1.compute.internal systemd[1]: origin-node.service: main process exited, code=exited, status=255/n/a

Master config YAML:

apiLevels:
- v1
apiVersion: v1
assetConfig:
  logoutURL: ""
  masterPublicURL: https://ec2-54-222-197-107.cn-north-1.compute.amazonaws.com.cn:8443
  publicURL: https://ec2-54-222-197-107.cn-north-1.compute.amazonaws.com.cn:8443/console/
  servingInfo:
    bindAddress: 0.0.0.0:8443
    bindNetwork: tcp4
    certFile: master.server.crt
    clientCA: ""
    keyFile: master.server.key
    maxRequestsInFlight: 0
    requestTimeoutSeconds: 0
controllers: '*'
corsAllowedOrigins:
  - 127.0.0.1
  - localhost
  - 172.31.10.20
  - 54.222.197.107
  - kubernetes.default
  - kubernetes.default.svc.cluster.local
  - kubernetes
  - openshift.default
  - openshift.default.svc
  - ec2-54-222-197-107.cn-north-1.compute.amazonaws.com.cn
  - 172.30.0.1
  - openshift.default.svc.cluster.local
  - kubernetes.default.svc
  - ip-172-31-10-20.cn-north-1.compute.internal
  - openshift
dnsConfig:
  bindAddress: 0.0.0.0:53
  bindNetwork: tcp4
etcdClientInfo:
  ca: ca.crt
  certFile: master.etcd-client.crt
  keyFile: master.etcd-client.key
  urls:
    - https://ip-172-31-10-20.cn-north-1.compute.internal:4001
etcdConfig:
  address: ip-172-31-10-20.cn-north-1.compute.internal:4001
  peerAddress: ip-172-31-10-20.cn-north-1.compute.internal:7001
  peerServingInfo:
    bindAddress: 0.0.0.0:7001
    certFile: etcd.server.crt
    clientCA: ca.crt
    keyFile: etcd.server.key
  servingInfo:
    bindAddress: 0.0.0.0:4001
    certFile: etcd.server.crt
    clientCA: ca.crt
    keyFile: etcd.server.key
  storageDirectory: /var/lib/origin/openshift.local.etcd
etcdStorageConfig:
  kubernetesStoragePrefix: kubernetes.io
  kubernetesStorageVersion: v1
  openShiftStoragePrefix: openshift.io
  openShiftStorageVersion: v1
imageConfig:
  format: openshift/origin-${component}:${version}
  latest: false
kind: MasterConfig
kubeletClientInfo:
  ca: ca.crt
  certFile: master.kubelet-client.crt
  keyFile: master.kubelet-client.key
  port: 10250
kubernetesMasterConfig:
  apiServerArguments:
    {}
  controllerArguments:
    {}
  masterCount: 1
  masterIP: 172.31.10.20
  podEvictionTimeout:
  proxyClientInfo:
    certFile: master.proxy-client.crt
    keyFile: master.proxy-client.key
  schedulerConfigFile: /etc/origin/master/scheduler.json
  servicesNodePortRange: ""
  servicesSubnet: 172.30.0.0/16
  staticNodeNames: []
masterClients:
  externalKubernetesKubeConfig: ""
  openshiftLoopbackKubeConfig: openshift-master.kubeconfig
masterPublicURL: https://ec2-54-222-197-107.cn-north-1.compute.amazonaws.com.cn:8443
networkConfig:
  clusterNetworkCIDR: 10.1.0.0/16
  hostSubnetLength: 8
# serviceNetworkCIDR must match kubernetesMasterConfig.servicesSubnet
  serviceNetworkCIDR: 172.30.0.0/16
  networkPluginName: cni
oauthConfig:
  assetPublicURL: https://ec2-54-222-197-107.cn-north-1.compute.amazonaws.com.cn:8443/console/
  grantConfig:
    method: auto
  identityProviders:
  - challenge: true
    login: true
    mappingMethod: claim
    name: htpasswd_auth
    provider:
      apiVersion: v1
      file: /etc/origin/master/htpasswd
      kind: HTPasswdPasswordIdentityProvider
  masterCA: ca.crt
  masterPublicURL: https://ec2-54-222-197-107.cn-north-1.compute.amazonaws.com.cn:8443
  masterURL: https://ip-172-31-10-20.cn-north-1.compute.internal:8443
  sessionConfig:
    sessionMaxAgeSeconds: 3600
    sessionName: ssn
    sessionSecretsFile: /etc/origin/master/session-secrets.yaml
  tokenConfig:
    accessTokenMaxAgeSeconds: 86400
    authorizeTokenMaxAgeSeconds: 500
pauseControllers: false
policyConfig:
  bootstrapPolicyFile: /etc/origin/master/policy.json
  openshiftInfrastructureNamespace: openshift-infra
  openshiftSharedResourcesNamespace: openshift
projectConfig:
  defaultNodeSelector: ""
  projectRequestMessage: ""
  projectRequestTemplate: ""
  securityAllocator:
    mcsAllocatorRange: "s0:/2"
    mcsLabelsPerProject: 5
    uidAllocatorRange: "1000000000-1999999999/10000"
routingConfig:
  subdomain:  ""
serviceAccountConfig:
  limitSecretReferences: false
  managedNames:
  - default
  - builder
  - deployer
  masterCA: ca.crt
  privateKeyFile: serviceaccounts.private.key
  publicKeyFiles:
  - serviceaccounts.public.key
servingInfo:
  bindAddress: 0.0.0.0:8443
  bindNetwork: tcp4
  certFile: master.server.crt
  clientCA: ca.crt
  keyFile: master.server.key
  maxRequestsInFlight: 500
  requestTimeoutSeconds: 3600

Node config YAML:

allowDisabledDocker: false
apiVersion: v1
dnsDomain: cluster.local
dockerConfig:
  execHandlerName: ""
iptablesSyncPeriod: "5s"
imageConfig:
  format: openshift/origin-${component}:${version}
  latest: false
kind: NodeConfig
kubeletArguments:
  {}
masterKubeConfig: system:node:ip-172-31-10-20.cn-north-1.compute.internal.kubeconfig
# networkConfig struct introduced in origin 1.0.6 and OSE 3.0.2 which
# deprecates networkPluginName above. The two should match.
networkConfig:
   mtu: 8951
   networkPluginName: cni
nodeName: ip-172-31-10-20.cn-north-1.compute.internal
podManifestConfig:
servingInfo:
  bindAddress: 0.0.0.0:10250
  certFile: server.crt
  clientCA: ca.crt
  keyFile: server.key
volumeDirectory: /var/lib/origin/openshift.local.volumes
proxyArguments:
  proxy-mode:
     - iptables
volumeConfig:
  localQuota:
    perFSGroup:

What is missing from the config YAML?

rajatchopra commented 8 years ago

You have to create the cni conf file also. You could follow what is done in this issue: https://github.com/openshift/origin/issues/8321#issue-145014089

Check step 3.
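
To see whether a conf file and its plugin binary line up, something like the following may help. It is a rough Python sketch under stated assumptions: the helper names are hypothetical, `/etc/cni/net.d` is assumed to be the conf directory (the conventional CNI location, but the node may be configured to read a different one), and it relies only on the CNI convention that a network conf is a JSON file whose "type" field names the plugin executable.

```python
import glob
import json
import os


def find_cni_configs(conf_dir="/etc/cni/net.d"):
    """Return (path, parsed_json) for each *.conf file in conf_dir that is a JSON object."""
    configs = []
    for path in sorted(glob.glob(os.path.join(conf_dir, "*.conf"))):
        try:
            with open(path) as handle:
                conf = json.load(handle)
        except (OSError, ValueError):
            continue  # unreadable or not valid JSON: skip it
        if isinstance(conf, dict):
            configs.append((path, conf))
    return configs


def missing_plugin_binaries(conf_dir="/etc/cni/net.d", bin_dir="/opt/cni/bin"):
    """Return (conf_path, type) pairs whose "type" has no matching file in bin_dir."""
    missing = []
    for path, conf in find_cni_configs(conf_dir):
        plugin_type = conf.get("type")
        if not plugin_type or not os.path.isfile(os.path.join(bin_dir, plugin_type)):
            missing.append((path, plugin_type))
    return missing


if __name__ == "__main__":
    if not find_cni_configs():
        print("no CNI conf file found")
    for conf_path, plugin_type in missing_plugin_binaries():
        print(f"{conf_path}: no binary for type {plugin_type!r}")
```

An empty result from `find_cni_configs` corresponds to the situation in this thread: the binary is installed but no conf file exists yet. The Calico-specific fields that belong in the conf file are not covered here; see the linked issue for those.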

dragon9783 commented 8 years ago

Thank you so much.