skydive-project / skydive

An open source real-time network topology and protocols analyzer
https://skydive.network
Apache License 2.0

Support for Cri-o #1391

Closed · SchSeba closed this 5 years ago

SchSeba commented 5 years ago

Hello @safchain,

I would like to know if the project intends to add support for the cri-o engine and not only Docker.

safchain commented 5 years ago

@SchSeba I'm currently investigating the effort needed to support the cri-o engine. I'll post an update soon.

safchain commented 5 years ago

I submitted a WIP patch that adds support for runc. Tested on a minikube cri-o environment.

SchSeba commented 5 years ago

Thanks for the update. If there is anything I can help you with, just tell me.

safchain commented 5 years ago

@SchSeba we just released a new version, Skydive 0.21, with support for podman/runc, which should work with cri-o too:

https://hub.docker.com/r/skydive/skydive/tags/
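
For anyone who wants to test it, pulling the released image should be as simple as the following (assuming the 0.21.0 tag listed there):

docker pull skydive/skydive:0.21.0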

Could you have a look?

SchSeba commented 5 years ago

@safchain Sure I will, thanks!

SchSeba commented 5 years ago

Hi @safchain, I want to let you know that this is not working for me.

I needed to update the OpenShift deployment to change the config on the agents:

agent:
  topology:
    probes:
      - ovsdb
      - docker
      - runc
analyzer:
  listen: 0.0.0.0:8082
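
For reference, a minimal sketch of how such a config change can be rolled out (the "skydive" project name is an assumption here; adjust to yours):

# update the agent ConfigMap with the new probes list
oc -n skydive edit configmap skydive-agent-config
# delete the agent pods so the DaemonSet recreates them with the new config
oc -n skydive delete pods -l app=skydive,tier=agent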

But I still don't see the pods or containers in the UI.

Deployment file:

apiVersion: v1
kind: Template
metadata:
  name: skydive
objects:
  - apiVersion: v1
    kind: ConfigMap
    metadata:
      labels:
        app: skydive-analyzer
      name: skydive-analyzer-config
    data:
      SKYDIVE_ANALYZER_FLOW_BACKEND: elasticsearch
      SKYDIVE_ANALYZER_TOPOLOGY_BACKEND: elasticsearch
      SKYDIVE_ANALYZER_TOPOLOGY_PROBES: ""
      SKYDIVE_ETCD_LISTEN: 0.0.0.0:12379
  - apiVersion: v1
    data:
      skydive.yml: |
        agent:
          topology:
            probes:
              - ovsdb
              - docker
              - runc
        analyzer:
          listen: 0.0.0.0:8082
    kind: ConfigMap
    metadata:
      labels:
        app: skydive-agent
      name: skydive-agent-config
  - apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: skydive-analyzer
      name: skydive-analyzer
    spec:
      ports:
        - name: api
          port: 8082
          protocol: TCP
          targetPort: 8082
        - name: protobuf
          port: 8082
          protocol: UDP
          targetPort: 8082
        - name: etcd
          port: 12379
          protocol: TCP
          targetPort: 12379
        - name: etcd-cluster
          port: 12380
          protocol: TCP
          targetPort: 12380
        - name: es
          port: 9200
          protocol: TCP
          targetPort: 9200
      selector:
        app: skydive
        tier: analyzer
      sessionAffinity: None
      type: NodePort
  - apiVersion: v1
    kind: DeploymentConfig
    metadata:
      name: skydive-analyzer
    spec:
      replicas: 1
      selector:
        app: skydive
        tier: analyzer
      strategy:
        rollingParams:
          intervalSeconds: 1
          maxSurge: 25%
          maxUnavailable: 25%
          timeoutSeconds: 600
          updatePeriodSeconds: 1
        type: Rolling
      template:
        metadata:
          labels:
            app: skydive
            tier: analyzer
        spec:
          containers:
            - args:
                - analyzer
                - --listen=0.0.0.0:8082
              envFrom:
                - configMapRef:
                    name: skydive-analyzer-config
              image: skydive/skydive
              imagePullPolicy: Always
              livenessProbe:
                failureThreshold: 3
                tcpSocket:
                  port: 8082
                initialDelaySeconds: 30
                periodSeconds: 10
                successThreshold: 1
                timeoutSeconds: 5
              name: skydive-analyzer
              ports:
                - containerPort: 8082
                  protocol: TCP
                - containerPort: 8082
                  protocol: UDP
                - containerPort: 12379
                  protocol: TCP
                - containerPort: 12380
                  protocol: TCP
              readinessProbe:
                failureThreshold: 1
                tcpSocket:
                  port: 8082
                initialDelaySeconds: 30
                periodSeconds: 10
                successThreshold: 1
                timeoutSeconds: 5
            - image: elasticsearch:5
              imagePullPolicy: IfNotPresent
              livenessProbe:
                failureThreshold: 3
                tcpSocket:
                  port: 9200
                initialDelaySeconds: 30
                periodSeconds: 10
                successThreshold: 1
                timeoutSeconds: 5
              name: skydive-elasticsearch
              ports:
                - containerPort: 9200
                  protocol: TCP
              readinessProbe:
                failureThreshold: 1
                tcpSocket:
                  port: 9200
                initialDelaySeconds: 30
                periodSeconds: 10
                successThreshold: 1
                timeoutSeconds: 5
              securityContext:
                privileged: true
          dnsPolicy: ClusterFirst
          restartPolicy: Always
          terminationGracePeriodSeconds: 30
      test: false
      triggers:
        - type: ConfigChange
  - apiVersion: extensions/v1beta1
    kind: DaemonSet
    metadata:
      labels:
        app: skydive
        tier: agent
      name: skydive-agent
    spec:
      selector:
        matchLabels:
          app: skydive
          tier: agent
      template:
        metadata:
          labels:
            app: skydive
            tier: agent
        spec:
          containers:
            - args:
                - agent
              env:
                - name: SKYDIVE_ANALYZERS
                  value: $(SKYDIVE_ANALYZER_SERVICE_HOST):$(SKYDIVE_ANALYZER_SERVICE_PORT_API)
              envFrom:
                - configMapRef:
                    name: skydive-agent-config
              image: skydive/skydive
              imagePullPolicy: Always
              name: skydive-agent
              ports:
                - containerPort: 8081
                  hostPort: 8081
                  protocol: TCP
              securityContext:
                privileged: true
              volumeMounts:
                - mountPath: /var/run/docker.sock
                  name: docker
                - mountPath: /host/run
                  name: run
                - mountPath: /var/run/openvswitch/db.sock
                  name: ovsdb
                - mountPath: /var/run/runc
                  name: crio
                - name: agent-config
                  mountPath: /etc/skydive.yml
                  subPath: skydive.yml
          dnsPolicy: ClusterFirst
          hostNetwork: true
          hostPID: true
          restartPolicy: Always
          terminationGracePeriodSeconds: 30
          volumes:
            - name: agent-config
              configMap:
                name: skydive-agent-config
            - hostPath:
                path: /var/run/docker.sock
              name: docker
            - hostPath:
                path: /var/run/netns
              name: run
            - hostPath:
                path: /var/run/openvswitch/db.sock
              name: ovsdb
            - hostPath:
                path: /var/run/crio/
              name: crio
  - apiVersion: v1
    kind: Route
    metadata:
      labels:
        app: skydive-analyzer
      name: skydive-analyzer
    spec:
      port:
        targetPort: api
      to:
        kind: Service
        name: skydive-analyzer
        weight: 100
      wildcardPolicy: None

Agent log:

2018-12-10T12:35:19.269Z    INFO    agent/agent.go:46 glob..func1   cnv-executor-myakove-node1.example.com: Skydive Agent 0.21.0-b107adc75b5c starting...
2018-12-10T12:35:19.269Z    INFO    http/server.go:109 (*Server).Listen cnv-executor-myakove-node1.example.com: Listening on socket 127.0.0.1:8081
2018-12-10T12:35:19.280Z    INFO    agent/probes.go:49 NewTopologyProbeBundleFromConfig cnv-executor-myakove-node1.example.com: Topology probes: [ovsdb docker runc]
2018-12-10T12:35:19.281Z    INFO    probes/probes.go:67 NewFlowProbeBundle  cnv-executor-myakove-node1.example.com: Flow probes: [pcapsocket ovssflow sflow gopacket dpdk ebpf ovsmirror]
2018-12-10T12:35:19.281Z    INFO    probes/probes.go:117 NewFlowProbeBundle cnv-executor-myakove-node1.example.com: Not compiled with dpdk support, skipping it
2018-12-10T12:35:19.413Z    ERROR   probes/probes.go:115 NewFlowProbeBundle cnv-executor-myakove-node1.example.com: Failed to create ebpf probe: Unable to load eBPF elf binary (host amd64) from bindata: error while loading "socket_flow_table" (invalid argument)
2018-12-10T12:35:19.420Z    INFO    probes/ovsmirror.go:427 (*OvsMirrorProbesHandler).cleanupOvsMirrors cnv-executor-myakove-node1.example.com: OvsMirror cleanup previous mirrors
2018-12-10T12:35:19.429Z    INFO    ovs/ovsdb.go:311 (*OvsMonitor).portAdded    cnv-executor-myakove-node1.example.com: New port "veth9a6b1551(1532eb5d-d3bb-4614-8ff5-34757322fa04)" added
2018-12-10T12:35:19.429Z    INFO    ovs/ovsdb.go:311 (*OvsMonitor).portAdded    cnv-executor-myakove-node1.example.com: New port "veth0909f89e(300b5284-12f2-4f68-9f6a-3993e047d009)" added
...........

Please tell me if you need anything else.

safchain commented 5 years ago

Can you please point me to documentation for setting up a similar environment?

SchSeba commented 5 years ago

Hi @safchain,

This is what I did to deploy the environment to check the PR:

git clone https://github.com/openshift/openshift-ansible.git -b v3.11.0 --depth 1

Then create an inventory file like the following, replacing <master_ip> with your master's IP:

all:
  vars:
    openshift_use_crio: 'true'
    olm_operator_image: quay.io/coreos/olm:master-08ea39b7
    olm_catalog_operator_image: quay.io/coreos/catalog:master-57dd618d
  children:
    OSEv3:
      hosts:
        node01:
          openshift_ip: <master_ip>
          openshift_node_group_name: node-config-master-infra-kubevirt
          openshift_schedulable: true
      children:
        masters:
          hosts:
            <master_ip>:
        nodes:
          hosts:
            <master_ip>:
        nfs:
          hosts:
            <master_ip>:
        etcd:
          hosts:
            <master_ip>:
      vars:
        ansible_service_broker_registry_whitelist:
        - .*-apb$
        ansible_service_broker_image: docker.io/ansibleplaybookbundle/origin-ansible-service-broker:ansible-service-broker-1.2.17-1
        ansible_ssh_pass: vagrant
        ansible_ssh_user: root
        deployment_type: origin
        openshift_clock_enabled: true
        openshift_deployment_type: origin
        openshift_disable_check: memory_availability,disk_availability,docker_storage,package_availability,docker_image_availability
        openshift_hosted_etcd_storage_access_modes:
        - ReadWriteOnce
        openshift_hosted_etcd_storage_kind: nfs
        openshift_hosted_etcd_storage_labels:
          storage: etcd
        openshift_hosted_etcd_storage_nfs_directory: /opt/etcd-vol
        openshift_hosted_etcd_storage_nfs_options: '*(rw,root_squash,sync,no_wdelay)'
        openshift_hosted_etcd_storage_volume_name: etcd-vol
        openshift_hosted_etcd_storage_volume_size: 1G
        openshift_image_tag: v3.11.0
        openshift_master_admission_plugin_config:
          MutatingAdmissionWebhook:
            configuration:
              apiVersion: v1
              disable: false
              kind: DefaultAdmissionConfig
          ValidatingAdmissionWebhook:
            configuration:
              apiVersion: v1
              disable: false
              kind: DefaultAdmissionConfig
        openshift_master_identity_providers:
        - challenge: 'true'
          kind: AllowAllPasswordIdentityProvider
          login: 'true'
          name: allow_all_auth
        osm_api_server_args:
          feature-gates:
          - BlockVolume=true
        osm_controller_args:
          feature-gates:
          - BlockVolume=true
        openshift_node_groups:
        - name: node-config-master-infra-kubevirt
          labels:
          - node-role.kubernetes.io/master=true
          - node-role.kubernetes.io/infra=true
          - node-role.kubernetes.io/compute=true
          edits:
          - key: kubeletArguments.feature-gates
            value:
            - RotateKubeletClientCertificate=true,RotateKubeletServerCertificate=true,BlockVolume=true
          - key: kubeletArguments.max-pods
            value:
            - '40'
          - key: kubeletArguments.pods-per-core
            value:
            - '40'
        - name: node-config-compute-kubevirt
          labels:
          - node-role.kubernetes.io/compute=true
          edits:
          - key: kubeletArguments.feature-gates
            value:
            - RotateKubeletClientCertificate=true,RotateKubeletServerCertificate=true,BlockVolume=true,CPUManager=true
          - key: kubeletArguments.cpu-manager-policy
            value:
            - static
          - key: kubeletArguments.system-reserved
            value:
            - cpu=500m
          - key: kubeletArguments.kube-reserved
            value:
            - cpu=500m
          - key: kubeletArguments.max-pods
            value:
            - '40'
          - key: kubeletArguments.pods-per-core
            value:
            - '40'

Then run:

ansible-playbook -e "ansible_user=root ansible_ssh_pass=vagrant" -i inventory playbooks/prerequisites.yml
ansible-playbook -i inventory playbooks/deploy_cluster.yml

safchain commented 5 years ago

I tried it and got it working, maybe not as expected :) Can you check that you don't have any node with Manager: runc in its metadata?

SchSeba commented 5 years ago

Hi @safchain, with oc get node -o yaml | grep Manager I don't get anything.

What do you mean by "maybe not as expected"? I think it should look the same as with Docker from the UI perspective. Right now I just get all the veths in the host namespace connected to the OVS bridge, but no veths in the container namespaces.

safchain commented 5 years ago

Hi @SchSeba, could you share a screenshot? I mean checking that there is no Skydive node (in the WebUI) with Manager: runc metadata.
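
As a side note, a quick way to check is to type a Gremlin query into the WebUI search bar; something like the following should list any runc-managed Skydive nodes (query sketch, untested on this exact version):

G.V().Has('Manager', 'runc')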

SchSeba commented 5 years ago

@safchain sorry, maybe I don't understand you.

Here is the metadata from the WebUI:

CPU :
Hostname : node1.example.com
KernelVersion : 3.10.0-957.el7.x86_64
Name : node1.example.com
OS : linux
Platform : ubuntu
PlatformFamily : debian
PlatformVersion : 18.04
TID : 420f69f0-573b-5bc1-7435-6d180be6a48b
Type : host
VirtualizationRole : host
VirtualizationSystem : kvm

One thing: this server is not Ubuntu, despite what the Platform fields say:

NAME="Red Hat Enterprise Linux Server"
VERSION="7.6 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.6"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.6 (Maipo)"

Please tell me if you need something else.

safchain commented 5 years ago

@SchSeba Here is what I got with the inventory you gave me. I'm able to see containers/namespaces + veths. Can you check the version of Skydive (header bar in the WebUI)?

[screenshot: runc]

safchain commented 5 years ago

@SchSeba With the ovsdb probe, that is better :)

[screenshot: runc]

SchSeba commented 5 years ago

@safchain Amazing!

But this is what I see:

[screenshot from 2018-12-13 21-13-53]

Could you maybe share your YAMLs and deployment instructions for Skydive? I also noticed that the versions are not exactly the same; could that be the problem?

safchain commented 5 years ago

@SchSeba I didn't use a YAML file, I just deployed Skydive by hand. I tried with the YAML file you provided but got permission issues (privileged container). I think the runc probe is not properly set in the config file/YAML. I'll try to fix my permission issues. There is no version issue; you just need version >= 0.21.
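
For reference, deploying by hand looks roughly like this (invocation from memory, so verify the flags against your version):

sudo skydive analyzer --conf skydive.yml
sudo skydive agent --conf skydive.yml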

SchSeba commented 5 years ago

@safchain can you try to deploy it with https://github.com/skydive-project/skydive/tree/master/contrib/openshift ? This is what I used.

safchain commented 5 years ago

@SchSeba Yes, I tried, but I'm facing the "Privileged containers are not allowed" issue.
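
For the record, on OpenShift 3.x this usually means the service account running the pods needs the privileged SCC. A sketch of the usual fix, assuming the default service account in a project named "skydive":

oc adm policy add-scc-to-user privileged -z default -n skydive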

safchain commented 5 years ago

@SchSeba I managed to get it working. I used the following template:

apiVersion: v1
kind: Template
metadata:
  name: skydive
objects:
- apiVersion: v1
  kind: ConfigMap
  metadata:
    labels:
      app: skydive-analyzer
    name: skydive-analyzer-config
  data:
    SKYDIVE_ANALYZER_FLOW_BACKEND: elasticsearch
    SKYDIVE_ANALYZER_TOPOLOGY_BACKEND: elasticsearch
    SKYDIVE_ANALYZER_TOPOLOGY_PROBES: ""
    SKYDIVE_ETCD_LISTEN: 0.0.0.0:12379
- apiVersion: v1
  kind: ConfigMap
  metadata:
    labels:
      app: skydive-agent
    name: skydive-agent-config
- apiVersion: v1
  kind: Service
  metadata:
    labels:
      app: skydive-analyzer
    name: skydive-analyzer
  spec:
    ports:
    - name: api
      port: 8082
      protocol: TCP
      targetPort: 8082
    - name: protobuf
      port: 8082
      protocol: UDP
      targetPort: 8082
    - name: etcd
      port: 12379
      protocol: TCP
      targetPort: 12379
    - name: etcd-cluster
      port: 12380
      protocol: TCP
      targetPort: 12380
    - name: es
      port: 9200
      protocol: TCP
      targetPort: 9200
    selector:
      app: skydive
      tier: analyzer
    sessionAffinity: None
    type: NodePort
- apiVersion: v1
  kind: DeploymentConfig
  metadata:
    name: skydive-analyzer
  spec:
    replicas: 1
    selector:
      app: skydive
      tier: analyzer
    strategy:
      rollingParams:
        intervalSeconds: 1
        maxSurge: 25%
        maxUnavailable: 25%
        timeoutSeconds: 600
        updatePeriodSeconds: 1
      type: Rolling
    template:
      metadata:
        labels:
          app: skydive
          tier: analyzer
      spec:
        containers:
        - args:
          - analyzer
          - --listen=0.0.0.0:8082
          envFrom:
          - configMapRef:
              name: skydive-analyzer-config
          image: skydive/skydive
          imagePullPolicy: Always
          livenessProbe:
            failureThreshold: 3
            tcpSocket:
              port: 8082
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          name: skydive-analyzer
          ports:
          - containerPort: 8082
            protocol: TCP
          - containerPort: 8082
            protocol: UDP
          - containerPort: 12379
            protocol: TCP
          - containerPort: 12380
            protocol: TCP
          readinessProbe:
            failureThreshold: 1
            tcpSocket:
              port: 8082
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
        - image: elasticsearch:5
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 3
            tcpSocket:
              port: 9200
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          name: skydive-elasticsearch
          ports:
          - containerPort: 9200
            protocol: TCP
          readinessProbe:
            failureThreshold: 1
            tcpSocket:
              port: 9200
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          securityContext:
            privileged: true
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        terminationGracePeriodSeconds: 30
    test: false
    triggers:
    - type: ConfigChange
- apiVersion: extensions/v1beta1
  kind: DaemonSet
  metadata:
    labels:
      app: skydive
      tier: agent
    name: skydive-agent
  spec:
    selector:
      matchLabels:
        app: skydive
        tier: agent
    template:
      metadata:
        labels:
          app: skydive
          tier: agent
      spec:
        containers:
        - args:
          - agent
          env:
          - name: SKYDIVE_ANALYZERS
            value: $(SKYDIVE_ANALYZER_SERVICE_HOST):$(SKYDIVE_ANALYZER_SERVICE_PORT_API)
          - name: SKYDIVE_AGENT_TOPOLOGY_PROBES
            value: "ovsdb runc"
          envFrom:
          - configMapRef:
              name: skydive-agent-config
          image: skydive/skydive
          imagePullPolicy: Always
          name: skydive-agent
          ports:
          - containerPort: 8081
            hostPort: 8081
            protocol: TCP
          securityContext:
            privileged: true
          volumeMounts:
          - mountPath: /var/run/docker.sock
            name: docker
          - mountPath: /host/run
            name: run
          - mountPath: /run/runc
            name: runc
          - mountPath: /var/run/openvswitch/db.sock
            name: ovsdb
        dnsPolicy: ClusterFirst
        hostNetwork: true
        hostPID: true
        restartPolicy: Always
        terminationGracePeriodSeconds: 30
        volumes:
        - hostPath:
            path: /var/run/docker.sock
          name: docker
        - hostPath:
            path: /var/run/netns
          name: run
        - hostPath:
            path: /run/runc
          name: runc
        - hostPath:
            path: /var/run/openvswitch/db.sock
          name: ovsdb
- apiVersion: v1
  kind: Route
  metadata:
    labels:
      app: skydive-analyzer
    name: skydive-analyzer
  spec:
    port:
      targetPort: api
    to:
      kind: Service
      name: skydive-analyzer
      weight: 100
    wildcardPolicy: None

Basically, the changes are adding the runc probe and adding the runc folder mount.
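
For completeness, the template can be deployed roughly as follows (assuming it is saved as skydive-template.yaml; the file name is illustrative):

oc process -f skydive-template.yaml | oc apply -f -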

I got the following topology:

[screenshot: skydive-analyzer topology, 2018-12-21]

SchSeba commented 5 years ago

Hi @safchain, thanks for the YAML. I will try to deploy it in my environment and update the issue.

If this works, should I create a PR that enables both the runc and docker probes for the k8s and OpenShift deployments? Or do you prefer to leave docker as the default? Could it be a problem if we enable both the docker and runc probes and only one of them exists?

safchain commented 5 years ago

@SchSeba Yes, you can create a PR for the k8s and OpenShift deployments. I don't think there is a problem activating both probes.

SchSeba commented 5 years ago

Hi @safchain, it still didn't work for me. I dug into the code a bit and saw the folder you try to read from, /run/runc; this folder is empty in my deployment.

This is the version of cri-o I used:

cri-o.x86_64                        1.11.10-1.rhaos3.11.git42c86f0.el7
cri-tools.x86_64                    1.11.1-1.rhaos3.11.gitedabfb5.el7
criu.x86_64                         3.9-5.el7                   @rhel-7.6-base  

I can see data in this folder instead:

ll /run/runc-ctrs/
total 0
drwx--x--x. 2 root root 60 Dec 23 22:49 011410a535564591290dc09900f112627fcdbca32b6255a9c198cd1aea29e197
drwx--x--x. 2 root root 60 Dec 23 22:37 02fc7b80b6f699820c8aa35cbae6a604e42a66ed4be6c4c7af7b5245512cfe31

and inside each of those folders:

ll /run/runc-ctrs/011410a535564591290dc09900f112627fcdbca32b6255a9c198cd1aea29e197
total 24
-rw-r--r--. 1 root root 22108 Dec 23 22:49 state.json

If I change the volume to /run/runc-ctrs and restart the agents, I get an error:

2018-12-24T11:12:10.833Z    ERROR   runc/runc.go:171 getMetadata    cnv-executor-yadu-node2.example.com: Unable to read create config /run/containers/storage/overlay-containers/f1ce8d5743d2fec95166eaae75b8deac0409d37fe58562b29fecd4e1e882fdb5/userdata/artifacts/create-config: open /run/containers/storage/overlay-containers/f1ce8d5743d2fec95166eaae75b8deac0409d37fe58562b29fecd4e1e882fdb5/userdata/artifacts/create-config: no such file or directory

But I am able to see the runc pods now! Is the error fine?

[screenshot from 2018-12-24 13-17-05]

Please tell me if you need any other information.
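
As a side note, one way to sanity-check such a directory is to read the container PID that runc records in state.json; a rough sketch, assuming runc's libcontainer state format (the field name may differ across versions) and with <container-id> as a placeholder:

jq '.init_process_pid' /run/runc-ctrs/<container-id>/state.json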

safchain commented 5 years ago

@SchSeba great that you got it working. You can ignore this error; there is already a patch under review to ignore it: https://github.com/skydive-project/skydive/pull/1526/files#diff-e3e94608aea11e3eb72d370bfdc97bd1R131

SchSeba commented 5 years ago

Thanks for the comment @safchain!

I just have a question: what engine do you use? How can it be that you mount /run/runc while I needed to mount /run/runc-ctrs?

Now I am not sure which volume to configure in the OpenShift deployment PR I am going to open for this repo.

safchain commented 5 years ago

We don't use any engine. We use the code that was already there in Skydive; we are just parsing the runc files according to the specs. I think the issue is due to the OpenShift/cri-o distribution. I used what you suggested to deploy it, but on Fedora 29.

I would suggest mounting both folders and specifying them in the config file here: https://github.com/skydive-project/skydive/blob/master/etc/skydive.yml.default#L229
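
A rough sketch of the corresponding DaemonSet fragment, based on the YAML earlier in this thread (the volume names are illustrative):

volumeMounts:
- mountPath: /run/runc
  name: runc
- mountPath: /run/runc-ctrs
  name: runc-ctrs
volumes:
- hostPath:
    path: /run/runc
  name: runc
- hostPath:
    path: /run/runc-ctrs
  name: runc-ctrs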

SchSeba commented 5 years ago

@safchain fine, and what about the OpenShift and k8s deployments? Is there a way to configure multiple folders for one probe? Maybe add a ConfigMap to the agent pod?

safchain commented 5 years ago

The section in the config file I pointed to is a list, so you can specify multiple folders for the runc probe. If a folder doesn't exist or isn't used, it won't be a problem:

      run_path:
        - /run/runc
        - /run/runc-crts

SchSeba commented 5 years ago

Hi @safchain, just a quick update: this configuration doesn't work for me.

run_path:
  - /var/run/runc
  - /var/run/runc-crts

The agent still looks into the /run/runc/ folder only.

safchain commented 5 years ago

I just tested it.

Here is the config I used as an example:

agent:
  topology:
    probes:
      - ovsdb
      - runc
    runc:
      run_path:
        - /tmp/toto
        - /tmp/titi

and I got the following lines in the logs (DEBUG level):

2018-12-27T17:06:19.601+0100    DEBUG   runc/runc.go:296 (*Probe).initialize.func1  pc12.home: Probe initialized for /tmp/titi
2018-12-27T17:06:19.601+0100    DEBUG   runc/runc.go:296 (*Probe).initialize.func1  pc12.home: Probe initialized for /tmp/toto

Can you check in the logs that the modification you made to the config file is correctly read?

SchSeba commented 5 years ago

Thanks for the answer @safchain. How can I enable the debug level in the logs?

safchain commented 5 years ago

In the config file:

https://github.com/skydive-project/skydive/blob/master/etc/skydive.yml.default#L347

logging:
  level: DEBUG
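
Since the analyzer ConfigMap in this thread already passes settings as SKYDIVE_* environment variables, the same convention should also work for logging; an untested sketch, added to a ConfigMap consumed via envFrom:

SKYDIVE_LOGGING_LEVEL: DEBUG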

rbo commented 5 years ago

Hi, I have the same problem with my Skydive installation on OpenShift: missing cri-o information.

[screenshot 2019-01-03 at 13 02 20]

Versions: Skydive Agent 0.21.0-a772a0989b39, OpenShift v3.11.51, Kubernetes v1.11.0+d4cacc0

Logs from the agent:

[root@master0 ~]# oc logs skydive-agent-5gr5h
2019-01-03T11:47:09.671Z    INFO    agent/agent.go:46 glob..func1   master0: Skydive Agent 0.21.0-a772a0989b39 starting...
2019-01-03T11:47:09.672Z    INFO    http/server.go:109 (*Server).Listen master0: Listening on socket 127.0.0.1:8081
2019-01-03T11:47:09.674Z    DEBUG   websocket/pool.go:101 (*Pool).AddClient master0: AddClient  for pool AnalyzerClientPool type : [*websocket.Pool]
2019-01-03T11:47:09.674Z    INFO    agent/probes.go:49 NewTopologyProbeBundleFromConfig master0: Topology probes: [ovsdb runc]
2019-01-03T11:47:09.675Z    INFO    probes/probes.go:67 NewFlowProbeBundle  master0: Flow probes: [pcapsocket ovssflow sflow gopacket dpdk ebpf ovsmirror]
2019-01-03T11:47:09.675Z    INFO    probes/probes.go:117 NewFlowProbeBundle master0: Not compiled with dpdk support, skipping it
2019-01-03T11:47:09.723Z    DEBUG   probes/ebpf.go:444 loadModule   master0: eBPF kernel stacktrace:

2019-01-03T11:47:09.726Z    ERROR   probes/probes.go:115 NewFlowProbeBundle master0: Failed to create ebpf probe: Unable to load eBPF elf binary (host amd64) from bindata: error while loading "socket_flow_table" (invalid argument)
2019-01-03T11:47:09.726Z    DEBUG   netns/netns.go:298 (*Probe).start   master0: Probe initialized
2019-01-03T11:47:09.729Z    INFO    probes/ovsmirror.go:427 (*OvsMirrorProbesHandler).cleanupOvsMirrors master0: OvsMirror cleanup previous mirrors
2019-01-03T11:47:09.732Z    INFO    ovs/ovsdb.go:311 (*OvsMonitor).portAdded    master0: New port "tun0(d907162f-1ca3-4def-8a45-79aa51bd2498)" added
2019-01-03T11:47:09.732Z    INFO    ovs/ovsdb.go:311 (*OvsMonitor).portAdded    master0: New port "vxlan0(1883603a-3386-4617-bd6e-2dc58f1cbefc)" added
2019-01-03T11:47:09.732Z    INFO    ovs/ovsdb.go:311 (*OvsMonitor).portAdded    master0: New port "veth91980545(19207d1b-1b86-4cf6-bfc4-44c21416c24d)" added
2019-01-03T11:47:09.732Z    INFO    ovs/ovsdb.go:311 (*OvsMonitor).portAdded    master0: New port "veth3b5136b6(0a73a714-f807-4f6d-9950-2b5d22947078)" added
2019-01-03T11:47:09.732Z    INFO    ovs/ovsdb.go:311 (*OvsMonitor).portAdded    master0: New port "vethf7099743(d021a224-1fe4-4c49-bea1-1b22296f45ed)" added
2019-01-03T11:47:09.732Z    INFO    ovs/ovsdb.go:311 (*OvsMonitor).portAdded    master0: New port "br0(863e324e-e14d-49c3-868d-39f0b1568276)" added
...
2019-01-03T11:47:09.753Z    DEBUG   runc/runc.go:318 (*Probe).initialize.func1  master0: Probe initialized for /var/run/runc
...

No more information about runc in the logs :-(

Agent config:

apiVersion: v1
data:
  skydive.yml: |-
    agent:
      topology:
        probes:
          - ovsdb
          - runc
        runc:
          run_path:
            - /var/run/runc
    analyzer:
      listen: 0.0.0.0:8082
    logging:
      level: DEBUG
kind: ConfigMap
metadata:
  annotations:
    openshift.io/generated-by: OpenShiftNewApp
  creationTimestamp: 2019-01-03T11:01:57Z
  labels:
    app: skydive-agent
  name: skydive-agent-config
  namespace: skydive
  resourceVersion: "22679"
  selfLink: /api/v1/namespaces/skydive/configmaps/skydive-agent-config
  uid: fd112cad-0f46-11e9-aa36-fa163e0559f6

Directories inside the agent pod:

# ls -al /var/run/runc/
total 0
drwxr-xr-x. 3 root root 700 Jan  3 11:47 .
drwxr-xr-x. 1 root root  89 Jan  3 11:47 ..
lrwxrwxrwx. 1 root root 120 Jan  3 10:44 0a1684351eaf86417b128f18c586151db29363d19bb5b3ffa066be937ed81a0e -> /var/run/containers/storage/overlay-containers/0a1684351eaf86417b128f18c586151db29363d19bb5b3ffa066be937ed81a0e/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 10:44 2c64933f5f7732803edd132e52b0b8a5af96430f1cbea2809b48b614110e5fd0 -> /var/run/containers/storage/overlay-containers/2c64933f5f7732803edd132e52b0b8a5af96430f1cbea2809b48b614110e5fd0/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 10:43 30edca31054ee15474ba2369f8aa991453eb3c3a7bc938d9c97f6fa0886f4bda -> /var/run/containers/storage/overlay-containers/30edca31054ee15474ba2369f8aa991453eb3c3a7bc938d9c97f6fa0886f4bda/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 11:47 3cadaf294601a044283938b26fcef5daf9e616025fbcd13ad6f8e9b21de5502b -> /var/run/containers/storage/overlay-containers/3cadaf294601a044283938b26fcef5daf9e616025fbcd13ad6f8e9b21de5502b/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 10:44 4aad111c54f4d6c4691b49220e5ba548f9dbea4298f2a9fd5f16af6577a92cdb -> /var/run/containers/storage/overlay-containers/4aad111c54f4d6c4691b49220e5ba548f9dbea4298f2a9fd5f16af6577a92cdb/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 10:44 4ea5a7187ed325a88e7cccb604f5fc0296041bb96a5968415985b7e51e7d89ca -> /var/run/containers/storage/overlay-containers/4ea5a7187ed325a88e7cccb604f5fc0296041bb96a5968415985b7e51e7d89ca/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 10:44 5c8a1abde4859801b3b5d758316a9438f8af94579ce0b756f1d4631ffc5d3bde -> /var/run/containers/storage/overlay-containers/5c8a1abde4859801b3b5d758316a9438f8af94579ce0b756f1d4631ffc5d3bde/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 10:44 5e17190cc24b8762314faaa05a3132b08d7b31833bedd354698a1fb54b7615d8 -> /var/run/containers/storage/overlay-containers/5e17190cc24b8762314faaa05a3132b08d7b31833bedd354698a1fb54b7615d8/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 10:44 5f725d453fa5db705bf634bac4c582e05e8357b84b407bc6f48d8bfdb77969c5 -> /var/run/containers/storage/overlay-containers/5f725d453fa5db705bf634bac4c582e05e8357b84b407bc6f48d8bfdb77969c5/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 10:44 60965cab194451bab5b30a725455e818906e619c5ae8a83de9b53aacfb1b10e2 -> /var/run/containers/storage/overlay-containers/60965cab194451bab5b30a725455e818906e619c5ae8a83de9b53aacfb1b10e2/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 10:44 756f68c52733150f37edaa292a2e828a9ca97da563118ada1a35f7f37a631f67 -> /var/run/containers/storage/overlay-containers/756f68c52733150f37edaa292a2e828a9ca97da563118ada1a35f7f37a631f67/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 10:44 7a2d96c32159267740598fbedbd68585fee948deb8440e093598ae0f35cb2752 -> /var/run/containers/storage/overlay-containers/7a2d96c32159267740598fbedbd68585fee948deb8440e093598ae0f35cb2752/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 10:44 8248250bcaa3cc02851aa5d0c9b755c059413833d6de58eb7b0f2d8608ec8a8f -> /var/run/containers/storage/overlay-containers/8248250bcaa3cc02851aa5d0c9b755c059413833d6de58eb7b0f2d8608ec8a8f/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 10:44 86657c5898fceaf8081f0ae2b379fcb34bfd2caf200d6744f856973bbed931bd -> /var/run/containers/storage/overlay-containers/86657c5898fceaf8081f0ae2b379fcb34bfd2caf200d6744f856973bbed931bd/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 10:43 887ae60fafbdd23746bb554048c92c5f33fc81b5156e647d1d797656c8062a75 -> /var/run/containers/storage/overlay-containers/887ae60fafbdd23746bb554048c92c5f33fc81b5156e647d1d797656c8062a75/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 10:44 8f3c62fd97c8a1e654083ecac3f6eaa633d2baa81ff2aa1f3c9fa86a2eb6a908 -> /var/run/containers/storage/overlay-containers/8f3c62fd97c8a1e654083ecac3f6eaa633d2baa81ff2aa1f3c9fa86a2eb6a908/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 10:45 9a2ed3586cdbfea4ed54cdc0b1b059dd9ad45898497a363ce90fee8dc157edb7 -> /var/run/containers/storage/overlay-containers/9a2ed3586cdbfea4ed54cdc0b1b059dd9ad45898497a363ce90fee8dc157edb7/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 10:44 a8cf7f04a85c1c52fdc8dc6f4dbe08911a3efcd1522d5a868d9eb374bb240f5d -> /var/run/containers/storage/overlay-containers/a8cf7f04a85c1c52fdc8dc6f4dbe08911a3efcd1522d5a868d9eb374bb240f5d/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 10:44 b1da0dd85ac28ab3c6d29618efbe7a2060d35f6e47fd5a443c38945fb367ca4b -> /var/run/containers/storage/overlay-containers/b1da0dd85ac28ab3c6d29618efbe7a2060d35f6e47fd5a443c38945fb367ca4b/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 10:43 b1e4b463bc480ab501ac71b55225f57e3164c0a15afd870dd56be4358b490de0 -> /var/run/containers/storage/overlay-containers/b1e4b463bc480ab501ac71b55225f57e3164c0a15afd870dd56be4358b490de0/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 10:44 c07a3148ee5d90cde0edfa85c27f373b8bd6f29d31a0fb5021575e5aa96a9dd7 -> /var/run/containers/storage/overlay-containers/c07a3148ee5d90cde0edfa85c27f373b8bd6f29d31a0fb5021575e5aa96a9dd7/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 10:44 c1c120cd8a6d1bcbca426e4c585653f98fcbb96cf5b9d6b93c929b56132bb404 -> /var/run/containers/storage/overlay-containers/c1c120cd8a6d1bcbca426e4c585653f98fcbb96cf5b9d6b93c929b56132bb404/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 10:43 c56ae36808ecaa9d9876b7bc34a1bbf7e0b1df246263313de20ea0cbd9582409 -> /var/run/containers/storage/overlay-containers/c56ae36808ecaa9d9876b7bc34a1bbf7e0b1df246263313de20ea0cbd9582409/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 10:44 c7a05e3b91678565e476f116e31becb98e40ea1293ab73af5540bd207d4b0b8e -> /var/run/containers/storage/overlay-containers/c7a05e3b91678565e476f116e31becb98e40ea1293ab73af5540bd207d4b0b8e/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 10:43 ca2611c057f2a3a28bab3ca3caed7b6aa66093de4252a6a65c48880f4847b0c1 -> /var/run/containers/storage/overlay-containers/ca2611c057f2a3a28bab3ca3caed7b6aa66093de4252a6a65c48880f4847b0c1/userdata
srwxr-xr-x. 1 root root   0 Jan  3 11:28 crio.sock
lrwxrwxrwx. 1 root root 120 Jan  3 10:44 d16e72d6a6698e926c1b37589014e7322677f6d2db5c21ac419250c9475d236a -> /var/run/containers/storage/overlay-containers/d16e72d6a6698e926c1b37589014e7322677f6d2db5c21ac419250c9475d236a/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 10:44 d2da9e37ce0e150165f94acd36e5ecfadac775c8ff5e61a1159aca4718e8c3d6 -> /var/run/containers/storage/overlay-containers/d2da9e37ce0e150165f94acd36e5ecfadac775c8ff5e61a1159aca4718e8c3d6/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 10:44 d359ffb602c7997f9a1dca1a373347c88a62ffa8474c41bb0c6c301637251cd4 -> /var/run/containers/storage/overlay-containers/d359ffb602c7997f9a1dca1a373347c88a62ffa8474c41bb0c6c301637251cd4/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 11:47 ece70ddf6113111b7d0b4bd3b2507902ada78b2998ddf874d5fce539a5809c00 -> /var/run/containers/storage/overlay-containers/ece70ddf6113111b7d0b4bd3b2507902ada78b2998ddf874d5fce539a5809c00/userdata
drwxr-xr-x. 2 root root 380 Jan  3 11:46 exits
lrwxrwxrwx. 1 root root 120 Jan  3 10:43 f60d10b0d2c2f9c5719135a872133439efd241ed52ce0bb4eb8813b487cb96ed -> /var/run/containers/storage/overlay-containers/f60d10b0d2c2f9c5719135a872133439efd241ed52ce0bb4eb8813b487cb96ed/userdata
lrwxrwxrwx. 1 root root 120 Jan  3 10:44 fc2d952c006b215d8cec67f80b34ef13d3bfcb231aea379262a51cd8386de85a -> /var/run/containers/storage/overlay-containers/fc2d952c006b215d8cec67f80b34ef13d3bfcb231aea379262a51cd8386de85a/userdata
# ls -la /var/run/runc/0a1684351eaf86417b128f18c586151db29363d19bb5b3ffa066be937ed81a0e/
total 28
drwx------. 3 root root   180 Jan  3 10:44 .
drwx------. 3 root root    60 Jan  3 10:44 ..
srwx------. 1 root root     0 Jan  3 10:44 attach
-rw-r--r--. 1 root root 13546 Jan  3 10:44 config.json
prw-r--r--. 1 root root     0 Jan  3 10:44 ctl
-rw-r--r--. 1 root root     8 Jan  3 10:44 hostname
-rw-r--r--. 1 root root     5 Jan  3 10:44 pidfile
-rw-r--r--. 1 root root    61 Jan  3 10:44 resolv.conf
drwxrwxrwt. 2 root root    40 Jan  3 10:44 shm
#

DaemonSet

  apiVersion: extensions/v1beta1
  kind: DaemonSet
  metadata:
    annotations:
      openshift.io/generated-by: OpenShiftNewApp
    creationTimestamp: null
    generation: 5
    labels:
      app: skydive
      tier: agent
    name: skydive-agent
  spec:
    revisionHistoryLimit: 10
    selector:
      matchLabels:
        app: skydive
        tier: agent
    template:
      metadata:
        creationTimestamp: null
        labels:
          app: skydive
          tier: agent
      spec:
        containers:
        - args:
          - agent
          env:
          - name: SKYDIVE_ANALYZERS
            value: $(SKYDIVE_ANALYZER_SERVICE_HOST):$(SKYDIVE_ANALYZER_SERVICE_PORT_API)
          envFrom:
          - configMapRef:
              name: skydive-agent-config
          image: skydive/skydive
          imagePullPolicy: Always
          name: skydive-agent
          ports:
          - containerPort: 8081
            hostPort: 8081
            protocol: TCP
          resources: {}
          securityContext:
            privileged: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /var/run/docker.sock
            name: docker
          - mountPath: /host/run
            name: run
          - mountPath: /var/run/runc
            name: crio
          - mountPath: /var/run/openvswitch/db.sock
            name: ovsdb
          - mountPath: /etc/skydive.yml
            name: agent-config
            subPath: skydive.yml
          - mountPath: /var/run/containers
            name: containers
        dnsPolicy: ClusterFirst
        hostNetwork: true
        hostPID: true
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        terminationGracePeriodSeconds: 30
        volumes:
        - hostPath:
            path: /var/run/docker.sock
            type: ""
          name: docker
        - hostPath:
            path: /var/run/containers/
            type: ""
          name: containers
        - hostPath:
            path: /var/run/crio/
            type: ""
          name: crio
        - hostPath:
            path: /var/run/netns
            type: ""
          name: run
        - hostPath:
            path: /var/run/openvswitch/db.sock
            type: ""
          name: ovsdb
        - configMap:
            defaultMode: 420
            name: skydive-agent-config
          name: agent-config
    templateGeneration: 5
    updateStrategy:
      type: OnDelete
  status:
    currentNumberScheduled: 4
    desiredNumberScheduled: 4
    numberAvailable: 4
    numberMisscheduled: 0
    numberReady: 4
    observedGeneration: 5
    updatedNumberScheduled: 4

It looks like the agent doesn't inspect the /var/run/runc/ directory in detail...

safchain commented 5 years ago

@rbo what distribution are you using?

rbo commented 5 years ago

[root@master0 ~]# cat /etc/os-release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.6 (Maipo)"
....
[root@master0 ~]# uname -a
Linux master0 3.10.0-957.1.3.el7.x86_64 #1 SMP Thu Nov 15 17:36:42 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

@safchain if it helps, from Red Hatter to Red Hatter, I can give you access to my lab.

safchain commented 5 years ago

@rbo Thanks, it would be great to have access if needed.

Can you just test something first: it looks like /var/run/runc is not the folder where runc keeps its state. Can you try the following folders: /run/runc-crts, or maybe /run/runc? We recently merged a commit that adds them by default (https://github.com/skydive-project/skydive/commit/9b9dfdbbb551159f3122e10930afd07eb74c92a2), but it is not present in 0.21.

Thanks

rbo commented 5 years ago

It works, thank you very much.

DaemonSet

[snipped]
          - mountPath: /var/run/runc
            name: crio
[snipped]
        - hostPath:
            path: /run/runc-ctrs/
            type: ""
          name: crio
[snipped]

Logs

[snipped]
2019-01-03T13:43:10.180Z    DEBUG   runc/runc.go:232 (*Probe).registerContainer master0: Register runc container 0268dc4e5ef7f324ad81d8956f3c78b03524bfefd1f9ee2ba3322b77e4cc55be and PID 114893
2019-01-03T13:43:10.181Z    DEBUG   runc/runc.go:232 (*Probe).registerContainer master0: Register runc container 0a1684351eaf86417b128f18c586151db29363d19bb5b3ffa066be937ed81a0e and PID 14703
[snipped]

[screenshot 2019-01-03 at 14 48 30]

rbo commented 5 years ago

I created a pull request with some changes to improve the installation on OpenShift: #1564

safchain commented 5 years ago

@rbo @SchSeba do you think this issue can be closed, as it was about introducing cri-o support? If we discover issues or want to improve it, we will open new issues.

rbo commented 5 years ago

From my point of view: yes. OpenShift works very well with my pull request. I don't know about plain k8s.

SchSeba commented 5 years ago

Yes, I think we can close this issue now. Thanks for the help!

safchain commented 5 years ago

Thanks @SchSeba @rbo for helping us add cri-o/OpenShift support!

rbo commented 5 years ago

You're welcome, ping me if you need further help!