networkop / k8s-topo

Topology builder for network simulations inside K8S
BSD 3-Clause "New" or "Revised" License

issue with 4.21f ceos #10

Open hyson007 opened 4 years ago

hyson007 commented 4 years ago

I'm getting the error below when starting the pod:

OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "exec: \"Cli\": executable file not found in $PATH": unknown

According to Arista's recent cEOS-lab README, some systemd.setenv args need to be passed along with /sbin/init. Looking at class CEOS, however, only environment variables are passed. (I tried to concatenate them into self.command, but it doesn't work.)

From the README: create docker instances with needed environment variables

docker create --name=ceos1 --privileged -e INTFTYPE=eth -e ETBA=1 \
  -e SKIP_ZEROTOUCH_BARRIER_IN_SYSDBINIT=1 -e CEOS=1 -e EOS_PLATFORM=ceoslab \
  -e container=docker -i -t ceosimage:4.21.0F /sbin/init \
  systemd.setenv=INTFTYPE=eth systemd.setenv=ETBA=1 \
  systemd.setenv=SKIP_ZEROTOUCH_BARRIER_IN_SYSDBINIT=1 systemd.setenv=CEOS=1 \
  systemd.setenv=EOS_PLATFORM=ceoslab systemd.setenv=container=docker

hyson007 commented 4 years ago

Updating self.command in class CEOS to the following resolved the issue for me:

['/sbin/init', 'systemd.setenv=INTFTYPE=eth', 'systemd.setenv=ETBA=1', 'systemd.setenv=SKIP_ZEROTOUCH_BARRIER_IN_SYSDBINIT=1', 'systemd.setenv=CEOS=1', 'systemd.setenv=EOS_PLATFORM=ceoslab', 'systemd.setenv=container=docker']
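The list above repeats every environment variable as a systemd.setenv argument. You can derive it from the same dictionary that feeds the container's environment, so the two cannot drift apart. Below is a minimal sketch that assumes the variables are held in a plain dict. CEOS_ENV and build_ceos_command are hypothetical names and are not part of k8s-topo.

```python
from typing import Dict, List

# Hypothetical: the env vars cEOS expects, taken from the docker create
# example quoted in this thread. Dict order is preserved (Python 3.7+).
CEOS_ENV: Dict[str, str] = {
    "INTFTYPE": "eth",
    "ETBA": "1",
    "SKIP_ZEROTOUCH_BARRIER_IN_SYSDBINIT": "1",
    "CEOS": "1",
    "EOS_PLATFORM": "ceoslab",
    "container": "docker",
}


def build_ceos_command(env: Dict[str, str]) -> List[str]:
    """Return /sbin/init followed by one systemd.setenv=KEY=VALUE per env var.

    systemd.setenv= sets a default environment variable for the processes
    systemd spawns, so the EOS services see the same values as the container.
    """
    return ["/sbin/init"] + [f"systemd.setenv={key}={value}" for key, value in env.items()]


if __name__ == "__main__":
    print(build_ceos_command(CEOS_ENV))
```

With this approach, adding a variable to the dict updates both the container env and the init command line in one place.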

networkop commented 4 years ago

Sounds right. Do you want to open a pull request?

hyson007 commented 4 years ago

It seems there are still some issues with dynamic routing protocols: I'm unable to bring up OSPF. I'm working with Arista TAC on case 199976 and will update here once I have more info.

sw-1(config-router-ospf)#end
sw-1#sh ip os nei

% Internal error
% To see the details of this error, run the command 'show error 1'
sw-1#sh ip os nei
! OSPF inactive
sw-1#sh ip os nei
! OSPF inactive
sw-1#sh ip os nei

% Internal error
% To see the details of this error, run the command 'show error 2'

networkop commented 4 years ago

I think this is because you need at least one Ethernet interface in the up/up state.

hyson007 commented 4 years ago

Nope, I do have L3 interfaces up/up and they can ping each other, but OSPF can't be brought up. Show logging says rib is continuously crashing. TAC was able to reproduce the issue, and OSPF works when they use an SVI. They claim this bug is causing the issue:

BUG397410 affects all EOS versions. Kernel interfaces are the interfaces on which the VMs are installed.

Our Engineering team is working on this bug fix. As of now, the workaround would be to create an SVI and have the OSPF neighborship on the SVI instead of an Ethernet interface.

But there seems to be no such issue on the older version, 4.20.5F. I will update once I have more info.

hyson007 commented 4 years ago

(I did encounter the scenario you mentioned, where no Ethernet interface shows up in cEOS. In that case I can't even enable 'ip routing'. This time seems different: at least I can enable 'ip routing'.)

vparames86 commented 4 years ago

I updated self.command, but I'm still getting the issue with version 4.22.1F:

kubectl exec -it arista01-5f4dcbdf77-99h9x Cli
Defaulting container name to router.
Use 'kubectl describe pod/arista01-5f4dcbdf77-99h9x -n default' to see all of the containers in this pod.
OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "exec: \"Cli\": executable file not found in $PATH": unknown
command terminated with exit code 126

networkop commented 4 years ago

You can check the command over here: https://github.com/networkop/docker-topo/blob/master/bin/docker-topo#L416

vparames86 commented 4 years ago

@networkop - I'm still getting this issue when running it in a k8s cluster. I have no issues when I launch it as a separate Docker container. I tried different Arista cEOS images, and all of them have the problem when launched in the k8s cluster. I could get to bash but not to Cli. After logging in to bash, I ran "ps -ef" to check the running processes and saw nothing besides init and my own shell. In the one I launched as a separate Docker container, I could see all the processes running.

bash-4.3# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 10:38 ?        00:00:00 /sbin/init systemd.setenv=INTFTYPE=eth systemd.setenv=ETBA=1 systemd.setenv=SKIP_ZEROTOUCH_BARRIER_IN_S
root         6     0  0 10:39 pts/0    00:00:00 bash
root        14     6  0 10:40 pts/0    00:00:00 ps -ef
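The comparison described above can be scripted. In the working container, init has spawned many processes; in the broken one, only init and the inspecting shell are present. What follows is a rough heuristic sketch. parse_ps_ef and only_init_running are hypothetical helpers, and the sample is the ps -ef output from this comment with the PID 1 command line abbreviated.

```python
from typing import Iterable, List, Tuple

# ps -ef output from this comment (PID 1 command line abbreviated).
SAMPLE = """UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 10:38 ?        00:00:00 /sbin/init systemd.setenv=INTFTYPE=eth
root         6     0  0 10:39 pts/0    00:00:00 bash
root        14     6  0 10:40 pts/0    00:00:00 ps -ef
"""


def parse_ps_ef(output: str) -> List[Tuple[int, str]]:
    """Return (pid, command) pairs from `ps -ef` output, skipping the header row."""
    rows = []
    for line in output.strip().splitlines()[1:]:
        # ps -ef prints 7 columns before CMD; CMD itself may contain spaces,
        # so split at most 7 times and keep the remainder as the command.
        fields = line.split(None, 7)
        if len(fields) == 8:
            rows.append((int(fields[1]), fields[7]))
    return rows


def only_init_running(output: str, ignore: Iterable[str] = ("bash", "ps")) -> bool:
    """True if nothing runs besides PID 1 and the shell/ps used to inspect it.

    Heuristic: compares the program name (basename of the first word of CMD)
    of every non-PID-1 process against `ignore`.
    """
    ignored = set(ignore)
    for pid, cmd in parse_ps_ef(output):
        if pid == 1:
            continue
        program = cmd.split()[0].rsplit("/", 1)[-1]
        if program not in ignored:
            return False
    return True


if __name__ == "__main__":
    print(parse_ps_ef(SAMPLE))
    print("only init running:", only_init_running(SAMPLE))
```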

kubectl describe pod arista05-bb8dcbf6b-mkn7m
Name:         arista05-bb8dcbf6b-mkn7m
Namespace:    default
Priority:     0
Node:         k8s-agentpool-24376997-vmss000004/10.240.0.125
Start Time:   Thu, 02 Apr 2020 03:38:06 -0700
Labels:       app=aristatopo03
              device=arista05
              pod-template-hash=bb8dcbf6b
Annotations:  kubernetes.io/psp: privileged
Status:       Running
IP:           10.240.0.127
IPs:
  IP:           10.240.0.127
Controlled By:  ReplicaSet/arista05-bb8dcbf6b
Containers:
  router:
    Container ID:  docker://285f718f4a04add8a9ce74fce60ad2ebea26081eaede75578ce5e9dd24603b82
    Image:         ccevirtnetpperegistry.azurecr.io/ceosimage:4.21.10M
    Image ID:      docker-pullable://ccevirtnetpperegistry.azurecr.io/ceosimage@sha256:9c1867f3e5f2e539f2a521f4ba443906ec2e6b5972cb6dc1b1e6faa902efe977
    Port:
    Host Port:
    Command:
      /sbin/init
      systemd.setenv=INTFTYPE=eth
      systemd.setenv=ETBA=1
      systemd.setenv=SKIP_ZEROTOUCH_BARRIER_IN_SYSDBINIT=1
      systemd.setenv=CEOS=1
      systemd.setenv=container=docker
      systemd.setenv=EOS_PLATFORM=ceoslab
    State:          Running
      Started:      Thu, 02 Apr 2020 03:38:09 -0700
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:  2
    Requests:
      cpu:     1
      memory:  2Gi
    Environment:
      CEOS:                                 1
      EOS_PLATFORM:                         ceoslab
      container:                            docker
      ETBA:                                 1
      SKIP_ZEROTOUCH_BARRIER_IN_SYSDBINIT:  1
      INTFTYPE:                             eth
    Mounts:
      /mnt/azure from startup-config-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-wrqkj (ro)
  nse-sidecar:
    Container ID:   docker://8bda0802eed6ee652aaf48cd2181f4d60a31c5972760181ba2b0f3204fac494b
    Image:          networkservicemesh/topology-sidecar-nse:master
    Image ID:       docker-pullable://networkservicemesh/topology-sidecar-nse@sha256:e7a949655cf3759e10fd777c7e9e367640276b871f465bc80c9aabdfd95bf1f7
    Port:
    Host Port:
    State:          Running
      Started:      Thu, 02 Apr 2020 03:38:10 -0700
    Ready:          True
    Restart Count:  0
    Limits:
      networkservicemesh.io/socket:  1
    Requests:
      networkservicemesh.io/socket:  1
    Environment:
      ENDPOINT_NETWORK_SERVICE:  aristatopo03
      ENDPOINT_LABELS:           device=arista05
      IP_ADDRESS:                10.60.17.48/28
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-wrqkj (ro)
  nsc-sidecar:
    Container ID:   docker://574988ff764754c5917a45b4f07a8f1de6fd0d3cbea4d6e97414de758083c67c
    Image:          networkservicemesh/topology-sidecar-nsc:master
    Image ID:       docker-pullable://networkservicemesh/topology-sidecar-nsc@sha256:2b953a76bb548da60313ba5d51973772b61e7096be62722067c019f2aae62934
    Port:
    Host Port:
    State:          Running
      Started:      Thu, 02 Apr 2020 03:38:11 -0700
    Ready:          True
    Restart Count:  0
    Limits:
      networkservicemesh.io/socket:  1
    Requests:
      networkservicemesh.io/socket:  1
    Environment:
      NS_NETWORKSERVICEMESH_IO:  aristatopo03/eth1?link=net-84&peerif=eth2
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-wrqkj (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  startup-config-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  arista05-pvc
    ReadOnly:   false
  default-token-wrqkj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-wrqkj
    Optional:    false
QoS Class:       Burstable
Node-Selectors:
Tolerations:     networkservicemesh.io/socket:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age    From                                        Message
  Warning  FailedScheduling         default-scheduler                           running "VolumeBinding" filter plugin for pod "arista05-bb8dcbf6b-mkn7m": pod has unbound immediate PersistentVolumeClaims
  Normal   Scheduled                default-scheduler                           Successfully assigned default/arista05-bb8dcbf6b-mkn7m to k8s-agentpool-24376997-vmss000004
  Normal   Pulling           3m53s  kubelet, k8s-agentpool-24376997-vmss000004  Pulling image "ccevirtnetpperegistry.azurecr.io/ceosimage:4.21.10M"
  Normal   Pulled            3m52s  kubelet, k8s-agentpool-24376997-vmss000004  Successfully pulled image "ccevirtnetpperegistry.azurecr.io/ceosimage:4.21.10M"
  Normal   Created           3m51s  kubelet, k8s-agentpool-24376997-vmss000004  Created container router
  Normal   Started           3m51s  kubelet, k8s-agentpool-24376997-vmss000004  Started container router
  Normal   Pulling           3m51s  kubelet, k8s-agentpool-24376997-vmss000004  Pulling image "networkservicemesh/topology-sidecar-nse:master"
  Normal   Pulled            3m50s  kubelet, k8s-agentpool-24376997-vmss000004  Successfully pulled image "networkservicemesh/topology-sidecar-nse:master"
  Normal   Created           3m50s  kubelet, k8s-agentpool-24376997-vmss000004  Created container nse-sidecar
  Normal   Started           3m50s  kubelet, k8s-agentpool-24376997-vmss000004  Started container nse-sidecar
  Normal   Pulling           3m50s  kubelet, k8s-agentpool-24376997-vmss000004  Pulling image "networkservicemesh/topology-sidecar-nsc:master"
  Normal   Pulled            3m49s  kubelet, k8s-agentpool-24376997-vmss000004  Successfully pulled image "networkservicemesh/topology-sidecar-nsc:master"
  Normal   Created           3m49s  kubelet, k8s-agentpool-24376997-vmss000004  Created container nsc-sidecar
  Normal   Started           3m49s  kubelet, k8s-agentpool-24376997-vmss000004  Started container nsc-sidecar

networkop commented 4 years ago

I can't see where the error is. @vparames86 can you try launching it as a standalone pod, i.e. outside of k8s-topo?

vparames86 commented 4 years ago

@networkop - Even the standalone pod doesn't work for me. This is the YAML I used. I put all the vars in command, and I also tried putting everything other than /sbin/init under args, but neither works.

apiVersion: v1
kind: Pod
metadata:
  name: arista101
  namespace: default
spec:
  containers:

Could you please share a pod.yaml that works for you?

networkop commented 4 years ago

This one worked for me:

apiVersion: v1
kind: Pod
metadata:
  name: ceos
spec:
  containers:
  - image: ceos:4.23.2F
    name: ceos
    securityContext:
        privileged: true
        capabilities:
            add:
            - NET_ADMIN
    command: 
    - "/sbin/init"
    args:
    - "systemd.setenv=INTFTYPE=eth"
    - "systemd.setenv=ETBA=1" 
    - "systemd.setenv=SKIP_ZEROTOUCH_BARRIER_IN_SYSDBINIT=1"
    - "systemd.setenv=CEOS=1"
    - "systemd.setenv=container=docker"
    - "systemd.setenv=EOS_PLATFORM=ceoslab"
    env: 
    - name: CEOS
      value: "1"
    - name: EOS_PLATFORM
      value: "ceoslab"
    - name: container
      value: docker
    - name: SKIP_ZEROTOUCH_BARRIER_IN_SYSDBINIT
      value: "1"
    - name: INTFTYPE
      value: eth
vparames86 commented 4 years ago

Ah, sec_context = client.V1SecurityContext(privileged=True) is missing from the create_nsm function. That is most likely the issue.
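The fix is to mark the router container as privileged, as the working pod YAML above does. In the kubernetes Python client, this means passing security_context=client.V1SecurityContext(privileged=True) to client.V1Container. The sketch below uses plain dicts shaped like the pod YAML instead, so it runs without the kubernetes package. make_router_container and unprivileged_containers are hypothetical helpers, not k8s-topo's actual create_nsm code.

```python
import json
from typing import Dict, List


def make_router_container(name: str, image: str, env: Dict[str, str]) -> dict:
    """Build a container spec (plain-dict form of a Kubernetes container).

    Mirrors the working pod YAML: privileged securityContext, /sbin/init as
    the command, one systemd.setenv arg per env var, and the same env vars.
    """
    return {
        "name": name,
        "image": image,
        # privileged=True was the missing piece in this thread.
        "securityContext": {"privileged": True, "capabilities": {"add": ["NET_ADMIN"]}},
        "command": ["/sbin/init"],
        "args": [f"systemd.setenv={k}={v}" for k, v in env.items()],
        "env": [{"name": k, "value": v} for k, v in env.items()],
    }


def unprivileged_containers(pod: dict) -> List[str]:
    """Return the names of containers in a pod manifest dict that are not privileged."""
    return [
        c["name"]
        for c in pod.get("spec", {}).get("containers", [])
        if not (c.get("securityContext") or {}).get("privileged", False)
    ]


if __name__ == "__main__":
    env = {
        "INTFTYPE": "eth",
        "ETBA": "1",
        "SKIP_ZEROTOUCH_BARRIER_IN_SYSDBINIT": "1",
        "CEOS": "1",
        "container": "docker",
        "EOS_PLATFORM": "ceoslab",
    }
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "ceos"},
        "spec": {"containers": [make_router_container("ceos", "ceos:4.23.2F", env)]},
    }
    assert unprivileged_containers(pod) == []
    print(json.dumps(pod, indent=2))
```

A check like unprivileged_containers could run before submitting a generated pod. It would catch the case where a code path, such as create_nsm here, builds the router container without the security context.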

vparames86 commented 4 years ago

@networkop - This worked, thanks for your help!