microsoft / ApplicationInsights-Kubernetes

Enrich the telemetry data for .NET applications running inside containers that are managed by Kubernetes.
Other
138 stars 54 forks source link

No container id found by container id providers after AKS upgrade 1.23->1.25 #344

Closed TTmaister closed 1 year ago

TTmaister commented 1 year ago

After upgrade AKS 1.23.x to 1.25.x Application Insight Kubernetes addon can't get container Id if multiple container are used.

RBAC roles and binding has been same before upgrade.

ContainerIdProviders not

[Warning] [2023-03-23T14:50:26.2712637Z] [DockerEngineMountInfoContainerIdProvider] Can't figure out container id. [Warning] [2023-03-23T14:50:26.2712637Z] [DockerEngineMountInfoContainerIdProvider] Can't figure out container id. [Information] [2023-03-23T14:50:26.2712968Z] No container id found by container id providers. [Information] [2023-03-23T14:50:26.2712968Z] No container id found by container id providers.

cat /proc/self/mountinfo

5504 4910 0:571 / / rw,relatime master:1369 - overlay overlay rw,lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/870/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/727/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/722/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/721/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/720/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/719/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/642/fs,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/871/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/871/work 5505 5504 0:573 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw 5506 5504 0:574 / /dev rw,nosuid - tmpfs tmpfs rw,size=65536k,mode=755,inode64 5507 5506 0:575 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=666 5508 5506 0:431 / /dev/mqueue rw,nosuid,nodev,noexec,relatime - mqueue mqueue rw 5509 5504 0:436 / /sys ro,nosuid,nodev,noexec,relatime - sysfs sysfs ro 5510 5509 0:28 / /sys/fs/cgroup ro,nosuid,nodev,noexec,relatime - cgroup2 cgroup rw 5511 5506 0:428 / /dev/shm rw,nosuid,nodev,noexec,relatime - tmpfs shm rw,size=65536k,inode64 5512 5504 8:1 /var/lib/kubelet/pods/39f4203d-ace9-4f64-954f-c30dfb6f473a/etc-hosts /etc/hosts rw,relatime - ext4 /dev/root rw,discard,errors=remount-ro 5513 5506 8:1 /var/lib/kubelet/pods/39f4203d-ace9-4f64-954f-c30dfb6f473a/containers/managementgateway/acde57d1 /dev/termination-log rw,relatime - ext4 /dev/root rw,discard,errors=remount-ro 5514 5504 8:1 /var/lib/containerd/io.containerd.grpc.v1.cri/sandboxes/6f3e199feb9dbf9b7d1b05199f9977876d4159b05761e6533324c2abcb1a67cf/hostname /etc/hostname rw,relatime - ext4 /dev/root rw,discard,errors=remount-ro 5515 5504 8:1 /var/lib/containerd/io.containerd.grpc.v1.cri/sandboxes/6f3e199feb9dbf9b7d1b05199f9977876d4159b05761e6533324c2abcb1a67cf/resolv.conf /etc/resolv.conf rw,relatime - ext4 /dev/root rw,discard,errors=remount-ro 5516 5504 0:395 / /run/secrets/azure/tokens ro,relatime - tmpfs tmpfs rw,size=5486876k,inode64 5517 5504 0:427 / /run/secrets/kubernetes.io/serviceaccount ro,relatime - tmpfs tmpfs rw,size=5486876k,inode64 4911 5505 0:573 /bus /proc/bus ro,nosuid,nodev,noexec,relatime - proc proc rw 4912 5505 0:573 /fs /proc/fs ro,nosuid,nodev,noexec,relatime - proc proc rw 4913 5505 0:573 /irq /proc/irq ro,nosuid,nodev,noexec,relatime - proc proc rw 4914 5505 0:573 /sys /proc/sys ro,nosuid,nodev,noexec,relatime - proc proc rw 4915 5505 0:573 /sysrq-trigger /proc/sysrq-trigger ro,nosuid,nodev,noexec,relatime - proc proc rw 4916 5505 0:576 / /proc/acpi ro,relatime - tmpfs tmpfs ro,inode64 4917 5505 0:574 /null /proc/kcore rw,nosuid - tmpfs tmpfs rw,size=65536k,mode=755,inode64 4918 5505 0:574 /null /proc/keys rw,nosuid - tmpfs tmpfs rw,size=65536k,mode=755,inode64 4919 5505 0:574 /null /proc/timer_list rw,nosuid - tmpfs tmpfs rw,size=65536k,mode=755,inode64 4920 5505 0:577 / /proc/scsi ro,relatime - tmpfs tmpfs ro,inode64 4921 5509 0:578 / /sys/firmware ro,relatime - tmpfs tmpfs ro,inode64

xiaomi7732 commented 1 year ago

Hi @TTmaister, this is a known issue because of the implementations of docker engines. There has been a related issue #290.

Please check out the design doc here for the container id providers: https://github.com/microsoft/ApplicationInsights-Kubernetes/wiki/Design-for-ContainerIdProviders

That said, 2 questions for you:

  1. Is the container id anywhere in the mountinfo you shared? If it is, we might be able to append another container id providers for your case.

  2. Is it possible for you to set up the environment variable for container id as mentioned in the design doc to workaround the issue? For example: ContainerId=my-container-id.

Any proposal for how to make this more adaptive is super-welcome!

BTW, just FYI, here's a lengthy stackoverflow discussion on the topic: https://stackoverflow.com/questions/20995351/how-can-i-get-docker-linux-container-information-from-within-the-container-itsel/72565733#72565733

And an open standard issue:

https://github.com/opencontainers/runtime-spec/issues/1105

TTmaister commented 1 year ago
  1. Container ID was 34bc656c1739fe713a867e18cc307f96e6fcba3cf89755db31b3c837f9d137f5. The answer is that there is no container id visible in mountinfo.

  2. Possibly I can set the container id in the environment variables, but I don't know how to do it in the Kubernetes Deployment yaml so that each pod has its own ID. Does it have to be the same as what the Kubernetes Api provides.

xiaomi7732 commented 1 year ago

Hey @TTmaister, thanks for providing the information. Unfortunately, without container id anywhere in the container, we won't have the magic to somehow make it available.

Manually set the environment would probably be the only way that will provide full enhancement. I don't know what will happen if you set container id to a mismatched value. That is an interesting idea. I assume you could set the environment variable to a random value for identification of the instances, but will it provide too much value? (pods comes and goes, and if the container id is random, why bother?)

That said, could you please help me understand what the telemetry looks like on your end? I assume container id missing but everything else, like podName, node info should be there on the events, does that align with what you see?

TTmaister commented 1 year ago

Application Insight is missing all the data brought by the plugin. I can see in Application Insight that the POD requests information about the POD from the Kubernetes API and gets the response http200.

TTmaister commented 1 year ago

Hey @xiaomi7732

Does it make any sense to change the ContainerIdHolder class to be able to parse the ContainerName variable from the environment variables.

And then filtter ContainerStatus by ContainerName

 if (containerStatuses is not null && containerStatuses.Count > 1)
        {
            string? containerName = Environment.GetEnvironmentVariable("ContainerName");
            containerStatus = containerStatuses.FirstOrDefault(c => c.Name == containerName);
            _logger.LogInformation(FormattableString.Invariant($"Use the only container inside the pod for container id: {containerStatus.ContainerID}"));

            using (IServiceScope scope = _serviceScopeFactory.CreateScope())
            {
                IContainerIdNormalizer normalizer = scope.ServiceProvider.GetRequiredService<IContainerIdNormalizer>();
                if (normalizer.TryNormalize(containerStatus.ContainerID, out string? normalizedContainerId))
                {
                    _containerId = normalizedContainerId;
                    return true;
                }
            }

            _logger.LogError(FormattableString.Invariant($"Normalization failed for container id: {containerStatus.ContainerID}"));
        }

ContinerName can be defined as environment variable Kubernetes Deployment yaml.

env:
  - name: ContainerName
     valueFrom:
       fieldRef:
         fieldPath: metadata.labels['app']

Or some other label. We have set "app" label for all deployments.

xiaomi7732 commented 1 year ago

Hi @TTmaister, using ContainerName for matching seem like an approach. I think we should peruse it when time permits. Do you know is pod name always unique?

With regarding what is going on, I think I see it. Container id was supposed to be optional, but it is required - in between several iterations to support .NET 6 and the latest K8s SDK. See below for the details:

It was like this, there had been a note to allow empty container id.

// Notes: It is still possible for the optional container id to be empty at this point, the following method needs to handle the case.
if (!await SpinWaitContainerReadyAsync(timeoutAt, queryClient, myPod, containerId, cancellationToken).ConfigureAwait(false))
{
    _logger.LogError(Invariant($"Kubernetes info is not available before the timeout at {timeoutAt}."));
    return null;
}

And in the implementation, this is the logic to allow empty container id:

if (!string.IsNullOrEmpty(myContainerId))
{
    // Check targeted container status
    readyToGo = IsContainerReady(podInfo.GetContainerStatus(myContainerId));
}
else
{
    _logger.LogWarning("No container id available. Fallback to use the any container for status checking.");
    readyToGo = podInfo.GetAllContainerStatus().Any(s => IsContainerReady(s));
}

And it is like this now, meaning optional container id is not allowed:

public async Task<bool> IsContainerReadyAsync(CancellationToken cancellationToken)
{
    V1ContainerStatus? myContainerStatus = await GetMyContainerStatusAsync(cancellationToken).ConfigureAwait(false);
    if (myContainerStatus is not null)
    {
        return IsContainerStatusReady(myContainerStatus);
    }

    return false;
}

That doesn't agree with the design that I shared earlier and shall be treated as a bug. Will it unblock you if I make it optional again?

TTmaister commented 1 year ago

Pod can be recreate with the same name, but there can only be one with the same name at a time. Object Names and IDs

If I tested it right... if Container Id missing then Application Insight not showing "Cloud Role Name". But is show and populate this custom properties

Kubernetes.Node.Name Kubernetes.Pod.Name Kubernetes.Pod.Namespace Kubernetes.Node.ID Kubernetes.Deployment.Name Kubernetes.Pod.ID Kubernetes.ReplicaSet.Name Kubernetes.Pod.Labels

xiaomi7732 commented 1 year ago

That is correct. FYI, the line of code: https://github.com/microsoft/ApplicationInsights-Kubernetes/blob/a394b161905d606fb49ef2a16a0c5e3d7d86d2c6/src/ApplicationInsights.Kubernetes/TelemetryInitializers/KubernetesTelemetryInitializer.cs#L63.

xiaomi7732 commented 1 year ago

I think there could be 2 fixes:

  1. Make container id optional (PR #345 )
  2. Allows user the specify Container Name. (#346)

And I inspected various options to back fill container id, and I think using container name is a very good idea. It might worth a bit documentation but could be useful as an alternative to container ids.

@TTmaister are you interested to submit a PR for it? Or I can prepare a PR if you don't have the time.

BTW, if you are interested, please fork this repository, and branching off your own fork since this repository has been locked down and nobody could PR into it directly. :-)

xiaomi7732 commented 1 year ago

@TTmaister your change has been released:

https://www.nuget.org/packages/Microsoft.ApplicationInsights.Kubernetes/6.1.1-beta2

Let me know if that unblocks you. Thank you for your contribution!

TTmaister commented 1 year ago

@TTmaister your change has been released:

https://www.nuget.org/packages/Microsoft.ApplicationInsights.Kubernetes/6.1.1-beta2

Let me know if that unblocks you. Thank you for your contribution!

Works as expected.

  1. One container no need for environment variable to get "Cloud Role Name" visible.
  2. More than one container and ContainerName environment variable is not present "Cloud Role Name" not visible.
  3. Multiple containers and ContainerName environment variable is preset "Cloud Role Name" visible.

@xiaomi7732 Maybe this should be documented to project readme.

xiaomi7732 commented 1 year ago

@TTmaister, thanks for the verification! I will update the wiki. I'll keep this issue open until the stable version of 6.1.1 got released.

xiaomi7732 commented 1 year ago

A FAQ is added: https://github.com/microsoft/ApplicationInsights-Kubernetes/wiki/FAQ, pointing to the wiki: How to set container name manually.

Ismael-Pep commented 1 year ago

Is there an Eta on when the version 6.1.1 will be released?, Thx!

xiaomi7732 commented 1 year ago

Hey @Ismael-Pep, thanks for the inquiry. Here's the new package is released today: https://www.nuget.org/packages/Microsoft.ApplicationInsights.Kubernetes/6.1.1