microsoft / ApplicationInsights-Kubernetes

Enrich the telemetry data for .NET applications running inside containers that are managed by Kubernetes.

Container id mismatch for ContainerD engine by MiniKube (cGroupV2) - needs more investigation #290

Closed xiaomi7732 closed 1 year ago

xiaomi7732 commented 1 year ago

It looks like the implementation has already changed: the container id we get from the mountinfo no longer matches the container id reported by the K8s cluster.

Here's an example:

Info from inspecting the container:

ai-k8s-f5webapi:
    Container ID:   docker://4b6b7e83bfd41e40da8a97a117c21c03e0aa628922c39639eaa290e80afa9e8e
    Image:          account/f5webapi:latest
    Image ID:       docker-pullable://account/f5webapi@sha256:bc66851e2d800917a333c0fcb638e64368f9e431194ea334e0baeb16cf3adf03

Info from fetching the mount info:

5204 5199 0:54 /volumes/minikube/_data/lib/kubelet/pods/23ce49e7-e288-4eaf-97e6-6764e029ed31/containers/ai-k8s-f5webapi/4c08abab /dev/termination-log rw,relatime - zfs rpool/var/lib/docker rw,xattr,posixacl
5205 5196 0:54 /volumes/minikube/_data/lib/docker/containers/c4c55130c559a9718f8b86c223cf8684aaf72a8324f1d3606622769e809c7a76/resolv.conf /etc/resolv.conf rw,relatime - zfs rpool/var/lib/docker rw,xattr,posixacl
5206 5196 0:54 /volumes/minikube/_data/lib/docker/containers/c4c55130c559a9718f8b86c223cf8684aaf72a8324f1d3606622769e809c7a76/hostname /etc/hostname rw,relatime - zfs rpool/var/lib/docker rw,xattr,posixacl
5207 5196 0:54 /volumes/minikube/_data/lib/kubelet/pods/23ce49e7-e288-4eaf-97e6-6764e029ed31/etc-hosts /etc/hosts rw,relatime - zfs rpool/var/lib/docker rw,xattr,posixacl
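
To make the mismatch concrete: scanning the mountinfo lines above for 64-hex-digit ids only turns up c4c55130c559..., never the 4b6b7e83bfd4... id that the cluster reports for this container. A rough C# sketch of that check (my own illustration, not the library's detection code; the file path and regex are assumptions about this minikube setup):

using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Id reported by the cluster for this container (from the inspect output above).
string idFromK8s = "4b6b7e83bfd41e40da8a97a117c21c03e0aa628922c39639eaa290e80afa9e8e";

// Collect every 64-hex-digit token that appears in the container's mountinfo.
var idsInMountInfo = File.ReadLines("/proc/self/mountinfo")
    .SelectMany(line => Regex.Matches(line, "[0-9a-f]{64}").Cast<Match>().Select(m => m.Value))
    .Distinct()
    .ToList();

// Only c4c55130c559... shows up in this minikube/cGroupV2 setup, so this prints False.
Console.WriteLine(idsInMountInfo.Contains(idFromK8s));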
SantoshG-Derivco commented 1 year ago

I'm experiencing a similar issue

Info from the log:

[Debug] [2022-08-11T07:16:41.5652706Z] Kubernetes base address: https://172.20.0.1:443/
[Debug] [2022-08-11T07:16:41.5665005Z] Getting container id by environment variable ContainerId. Result: .
[Trace] [2022-08-11T07:16:41.5693523Z] [CGroupV1Matcher] No match for containerId. Input: 11:blkio:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod3e221707_600c_4eb1_8fe7_a9a7163a1278.slice/cri-containerd-5146b2bcd77ab4f2624bc1fbd98cf9751741344a80b043dbd77a4e847bff4f06.scope, pattern: cpu.+/([^/]*)$
[Trace] [2022-08-11T07:16:41.5699334Z] [CGroupV1Matcher] No match for containerId. Input: 10:perf_event:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod3e221707_600c_4eb1_8fe7_a9a7163a1278.slice/cri-containerd-5146b2bcd77ab4f2624bc1fbd98cf9751741344a80b043dbd77a4e847bff4f06.scope, pattern: cpu.+/([^/]*)$
[Trace] [2022-08-11T07:16:41.5699542Z] [CGroupV1Matcher] No match for containerId. Input: 9:hugetlb:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod3e221707_600c_4eb1_8fe7_a9a7163a1278.slice/cri-containerd-5146b2bcd77ab4f2624bc1fbd98cf9751741344a80b043dbd77a4e847bff4f06.scope, pattern: cpu.+/([^/]*)$
[Trace] [2022-08-11T07:16:41.5699597Z] [CGroupV1Matcher] No match for containerId. Input: 8:freezer:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod3e221707_600c_4eb1_8fe7_a9a7163a1278.slice/cri-containerd-5146b2bcd77ab4f2624bc1fbd98cf9751741344a80b043dbd77a4e847bff4f06.scope, pattern: cpu.+/([^/]*)$
[Trace] [2022-08-11T07:16:41.5699716Z] [CGroupV1Matcher] No match for containerId. Input: 7:devices:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod3e221707_600c_4eb1_8fe7_a9a7163a1278.slice/cri-containerd-5146b2bcd77ab4f2624bc1fbd98cf9751741344a80b043dbd77a4e847bff4f06.scope, pattern: cpu.+/([^/]*)$
[Trace] [2022-08-11T07:16:41.5700922Z] [CGroupV1Matcher] Matched container id.
[Debug] [2022-08-11T07:16:41.5701407Z] [CGroupContainerIdProvider] Got container id by: 6:cpuset:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod3e221707_600c_4eb1_8fe7_a9a7163a1278.slice/cri-containerd-5146b2bcd77ab4f2624bc1fbd98cf9751741344a80b043dbd77a4e847bff4f06.scope
[Information] [2022-08-11T07:16:41.5701549Z] [CGroupContainerIdProvider] Got container id: cri-containerd-5146b2bcd77ab4f2624bc1fbd98cf9751741344a80b043dbd77a4e847bff4f06.scope
[Debug] [2022-08-11T07:16:41.6319953Z] Initialize Application Insights for Kubernetes telemetry initializer with Options:
{"InitializationTimeout":"00:02:00","DisablePerformanceCounters":false}

Info from the K8s API (K8sPod.Status.ContainerStatuses):

[
      {
        "name": "****",
        "state": {
          "running": {
            "startedAt": "2022-08-11T09:21:10Z"
          }
        },
        "lastState": {
        },
        "ready": true,
        "restartCount": 0,
        "image": "****",
        "imageID": "****",
        "containerID": "containerd://5146b2bcd77ab4f2624bc1fbd98cf9751741344a80b043dbd77a4e847bff4f06",
        "started": true
      }
    ]

The container id detected by CGroupContainerIdProvider, cri-containerd-5146b2bcd77ab4f2624bc1fbd98cf9751741344a80b043dbd77a4e847bff4f06.scope, does not match the value from the K8s API, containerd://5146b2bcd77ab4f2624bc1fbd98cf9751741344a80b043dbd77a4e847bff4f06.

Version: Microsoft.ApplicationInsights.Kubernetes Version="2.0.4"

xiaomi7732 commented 1 year ago

@SantoshG-Derivco, thanks for the input. BTW, are you using the CRI-O engine? And is this a blocking issue for you? In other words, is the other info, the pod name for example, still present on your telemetries?

SantoshG-Derivco commented 1 year ago

@xiaomi7732 I'm using a containerd runtime engine.

It is blocking, as the only alternative I have is to find some way to set the ContainerId environment variable to the correct id so that EnvironmentVariableContainerIdProvider would use that instead of the value from the cgroup.
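
For context, the debug log above ("Getting container id by environment variable ContainerId. Result: .") shows that the env-var lookup runs before the cgroup providers, so a value set there should take precedence. A minimal sketch of what that lookup amounts to (my own illustration, not the library's source):

using System;

// If the ContainerId environment variable is set, use it; otherwise the SDK
// falls through to the cgroup-based providers, which is where the wrong value
// comes from in this case.
string containerId = Environment.GetEnvironmentVariable("ContainerId");
Console.WriteLine(string.IsNullOrEmpty(containerId)
    ? "ContainerId not set; cgroup-based detection will run."
    : $"Container id from environment variable: {containerId}");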

I do get the pod name from the HOSTNAME environment variable, which is correct.

NB. I'm using a Linux image.

xiaomi7732 commented 1 year ago

@SantoshG-Derivco, great! According to this line of code, there is still a chance for the proper container id to be used, considering that the yaml you pasted has only one container status. Did that happen?

Expecting to see a log like this:

Use the only container inside the pod for container id: correct-container-id.
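
For reference, a minimal sketch of the fallback I mean (assumed shape only, not the library's exact source; the method and parameter names are made up): when no container id has been detected yet and the pod reports exactly one container status, that status's id is used.

using System.Collections.Generic;
using System.Linq;

static class ContainerIdFallback
{
    // detectedId: whatever the cgroup/mountinfo providers found (possibly empty).
    // statusIds: container ids from K8sPod.Status.ContainerStatuses,
    //            e.g. "containerd://5146b2bc...".
    public static string Resolve(string detectedId, IReadOnlyList<string> statusIds)
    {
        if (string.IsNullOrEmpty(detectedId) && statusIds.Count == 1)
        {
            // Logged as: "Use the only container inside the pod for container id: ...".
            return statusIds.Single();
        }

        return detectedId;
    }
}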
SantoshG-Derivco commented 1 year ago

@xiaomi7732 Unfortunately, while the K8s response had one container in the status, I did not get that log message.

I believe this is because CGroupContainerIdProvider did produce a container id; however, the value it produced was prefixed with cri-containerd- and suffixed with .scope, and the regex discarded neither. The value between the prefix and the suffix was indeed the correct container id. Because CGroupContainerIdProvider had already set ContainerId, the if (string.IsNullOrEmpty(containerId)) check that wraps the code you referenced was never executed.

Here are my suggestions for some possible solutions:

  1. Update the regex MatchPattern = @"cpu.+/([^/]*)$" in CGroupV1Matcher to discard the prefix and suffix (see the sketch after this list).
  2. Update CGroupV1Matcher to reject this value so that a subsequent ContainerIdProvider can be executed.
  3. Update the logic on this line so that when the K8s response contains only a single container status, its containerID is used and the value determined by the container id providers is discarded, perhaps logging a warning when the discarded provider value did not match the single container status.
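
A sketch of what suggestion 1 could look like (my own proposal, not the library's code): capture only the bare 64-hex-digit id so that a runtime prefix such as cri-containerd- and the trailing .scope are both dropped.

using System;
using System.Text.RegularExpressions;

string cgroupLine = "6:cpuset:/kubepods.slice/kubepods-burstable.slice/"
    + "kubepods-burstable-pod3e221707_600c_4eb1_8fe7_a9a7163a1278.slice/"
    + "cri-containerd-5146b2bcd77ab4f2624bc1fbd98cf9751741344a80b043dbd77a4e847bff4f06.scope";

// Match a 64-hex-digit id at the end of the line, tolerating an optional ".scope" suffix.
Match match = Regex.Match(cgroupLine, @"([0-9a-f]{64})(?:\.scope)?$");
Console.WriteLine(match.Success ? match.Groups[1].Value : "<no match>");
// Prints: 5146b2bcd77ab4f2624bc1fbd98cf9751741344a80b043dbd77a4e847bff4f06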

NB. These are only my suggestions and I am no expert in this area. Appreciate your assistance @xiaomi7732

xiaomi7732 commented 1 year ago

Oh~ I am sorry, I misread the important information - the container id, in your case, is actually there, mixed in with prefix/suffix. That makes the case easier to deal with. I'll see if I can issue a quick fix to it. Thank you for pointing it out.

blushingpenguin commented 1 year ago

Not having a container id causes an exception to be thrown:

Unhandled exception. System.InvalidOperationException: Failed fetching container id.
   at Microsoft.ApplicationInsights.Kubernetes.KubeHttpClientSettingsBase.GetContainerIdOrThrow()
   at Microsoft.ApplicationInsights.Kubernetes.KubeHttpClientSettingsBase..ctor(String kubernetesServiceHost, String kubernetesServicePort, 
IEnumerable`1 containerIdProviders)
   at Microsoft.ApplicationInsights.Kubernetes.KubeHttpClientSettingsProvider..ctor(IEnumerable`1 containerIdProviders, String pathToToken, 
String pathToCert, String pathToNamespace, String kubernetesServiceHost, String kubernetesServicePort)
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Span`1& arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
   at System.Reflection.RuntimeConstructorInfo.Invoke(BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)

This didn't use to be an error: the code that handles the case of a missing container id (e.g. where there is only one container) doesn't get a chance to run now. None of the detection methods for the container id seems applicable in my case:

root@pod:/# cat /proc/self/cgroup
0::/
cat /proc/self/mountinfo |grep kubepods # nothing

This is running the microk8s 1.24.3 snap on Ubuntu 22.04.

xiaomi7732 commented 1 year ago

Hi @blushingpenguin, thanks for the feedback with the callstack. I'll check what is going on.

Which version are you using BTW?

Also, may I ask for a favor? Do you mind running cat /proc/self/mountinfo and sending back the full result?

Thank you!

Edit: It is a bug. I created a new issue for tracking. See #297.

blushingpenguin commented 1 year ago

That was with 2.0.4; 2.0.3 works.

5484 5485 0:534 / / rw,relatime master:1707 - overlay overlay rw,lowerdir=/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/36234/fs:/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/29678/fs:/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/29677/fs:/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/29676/fs:/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/29675/fs:/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/29674/fs:/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/29673/fs:/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/29672/fs:/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/29671/fs:/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/29670/fs:/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/29669/fs:/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/29668/fs,upperdir=/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/36287/fs,workdir=/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/36287/work
7954 5484 0:536 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
7955 5484 0:537 / /dev rw,nosuid - tmpfs tmpfs rw,size=65536k,mode=755,inode64
7956 7955 0:538 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=666
7957 7955 0:347 / /dev/mqueue rw,nosuid,nodev,noexec,relatime - mqueue mqueue rw
7958 5484 0:425 / /sys ro,nosuid,nodev,noexec,relatime - sysfs sysfs ro
7965 7958 0:28 / /sys/fs/cgroup ro,nosuid,nodev,noexec,relatime - cgroup2 cgroup rw
7966 5484 0:334 / /cacerts ro,relatime - tmpfs tmpfs rw,size=65638616k,inode64
7967 5484 0:336 / /certs ro,relatime - tmpfs tmpfs rw,size=65638616k,inode64
7968 5484 0:337 / /rabbitmq-client-cert ro,relatime - tmpfs tmpfs rw,size=65638616k,inode64
7969 5484 253:1 /var/snap/microk8s/common/var/lib/kubelet/pods/8dd3c024-509a-4a1d-8e5f-f2a97c7adce9/volumes/kubernetes.io~empty-dir/locale /app/locale ro,relatime - ext4 /dev/mapper/vg0-lv--root rw
7970 5484 0:335 / /app/secrets ro,relatime - tmpfs tmpfs rw,size=65638616k,inode64
7971 5484 253:1 /var/snap/microk8s/common/var/lib/kubelet/pods/8dd3c024-509a-4a1d-8e5f-f2a97c7adce9/etc-hosts /etc/hosts rw,relatime - ext4 /dev/mapper/vg0-lv--root rw
7972 7955 253:1 /var/snap/microk8s/common/var/lib/kubelet/pods/8dd3c024-509a-4a1d-8e5f-f2a97c7adce9/containers/apps-api/d85ef747 /dev/termination-log rw,relatime - ext4 /dev/mapper/vg0-lv--root rw
7973 5484 253:1 /var/snap/microk8s/common/var/lib/containerd/io.containerd.grpc.v1.cri/sandboxes/76c7df1199c6ddd57816c8f13d88497811f5dd17cbe3041e17df73521b43bd28/hostname /etc/hostname rw,relatime - ext4 /dev/mapper/vg0-lv--root rw
7974 5484 253:1 /var/snap/microk8s/common/var/lib/containerd/io.containerd.grpc.v1.cri/sandboxes/76c7df1199c6ddd57816c8f13d88497811f5dd17cbe3041e17df73521b43bd28/resolv.conf /etc/resolv.conf rw,relatime - ext4 /dev/mapper/vg0-lv--root rw
7975 7955 0:344 / /dev/shm rw,nosuid,nodev,noexec,relatime - tmpfs shm rw,size=65536k,inode64
7976 5484 0:338 / /run/secrets/kubernetes.io/serviceaccount ro,relatime - tmpfs tmpfs rw,size=65638616k,inode64
5486 7954 0:536 /bus /proc/bus ro,nosuid,nodev,noexec,relatime - proc proc rw
5487 7954 0:536 /fs /proc/fs ro,nosuid,nodev,noexec,relatime - proc proc rw
5488 7954 0:536 /irq /proc/irq ro,nosuid,nodev,noexec,relatime - proc proc rw
5489 7954 0:536 /sys /proc/sys ro,nosuid,nodev,noexec,relatime - proc proc rw
5490 7954 0:536 /sysrq-trigger /proc/sysrq-trigger ro,nosuid,nodev,noexec,relatime - proc proc rw
5491 7954 0:539 / /proc/acpi ro,relatime - tmpfs tmpfs ro,inode64
5492 7954 0:537 /null /proc/kcore rw,nosuid - tmpfs tmpfs rw,size=65536k,mode=755,inode64
5493 7954 0:537 /null /proc/keys rw,nosuid - tmpfs tmpfs rw,size=65536k,mode=755,inode64
5494 7954 0:537 /null /proc/timer_list rw,nosuid - tmpfs tmpfs rw,size=65536k,mode=755,inode64
5495 7954 0:540 / /proc/scsi ro,relatime - tmpfs tmpfs ro,inode64
5496 7958 0:541 / /sys/firmware ro,relatime - tmpfs tmpfs ro,inode64

Also, from kubectl get pod -o yaml:

 containerStatuses:
  - containerID: containerd://85155e139d9ad0e221604b24e849897e378f4eb00bbd539d6c5e3905741f33f2

I had a poke around in /proc and /var but I couldn't find that container id anywhere

xiaomi7732 commented 1 year ago

With the newer versions, unfortunately, it is unreliable to fetch the container id automatically.

We have been looking into other directions to get the same info, and this will be the alternative:

You will have to use the latest package to make it work, though: https://www.nuget.org/packages/Microsoft.ApplicationInsights.Kubernetes/6.1.1-beta3
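
For reference, the options hook where such a setting would be wired in is the same AppInsightsForKubernetesOptions whose defaults appear in the debug log earlier in this thread. A minimal registration sketch for an ASP.NET Core app follows; only the two options confirmed by that log are set, and the exact property for supplying the container name is left as a comment because it depends on the 6.1.1-beta3 surface:

using System;
using Microsoft.Extensions.DependencyInjection;

public static class TelemetrySetup
{
    public static void ConfigureKubernetesEnricher(IServiceCollection services)
    {
        services.AddApplicationInsightsKubernetesEnricher(options =>
        {
            // Values matching the defaults logged earlier in this thread.
            options.InitializationTimeout = TimeSpan.FromMinutes(2);
            options.DisablePerformanceCounters = false;

            // Per the follow-up comment below, 6.1.1-beta3 is expected to accept the
            // container name explicitly; set that option here on the new package.
        });
    }
}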

xiaomi7732 commented 1 year ago

Please try providing the container name and see if that reaches the goal. I'll close this one. Feel free to open a new issue if there are any other suggestions. Thank you for the reports!