newrelic / helm-charts

Helm charts for New Relic applications
Apache License 2.0

[newrelic-pixie] newrelic-pixie init container not running on arm64 #1152

Closed maxlemieux closed 3 months ago

maxlemieux commented 11 months ago

Bug description

The newrelic-pixie chart fails to install on arm64 nodes.

Version of Helm and Kubernetes

Any version, as long as the nodes are arm64. Tested on AKS with Kubernetes v1.26.6 and the node pool template Standard_D2pds_v5 (arm64).

Which chart?

helm search repo newrelic-pixie
NAME                    CHART VERSION   APP VERSION DESCRIPTION                                      
newrelic/newrelic-pixie 2.1.2           2.1.4       A Helm chart for the New Relic Pixie integration.

What happened?

The newrelic-pixie job fails 5 times in quick succession after scheduling to an arm64 node.

Logs for the cluster-registration-wait container include this message:

exec /bin/sh: exec format error

What you expected to happen?

Expecting the init container to work with arm64.

How to reproduce it?

Add an arm64 node pool to your cluster. Taint the other node groups. Process per this guide.

Install the New Relic bundle with Pixie enabled.

Anything else we need to know?

This is the container image for the container that's not running on arm64:

Image:         gcr.io/pixie-oss/pixie-dev-public/curl:1.0
Image ID:      gcr.io/pixie-oss/pixie-dev-public/curl@sha256:b57f1d617b3eded350e2f78a5eece0c0839c59f59f1dece39f413f599dc382b1
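
An "exec format error" like the one in the logs usually means the image's binaries were built for a different CPU architecture than the node. One way to confirm this is to dump the image's manifest (for example with `docker manifest inspect <image>`) and look at the platforms it lists. The Python sketch below assumes the manifest JSON has already been captured as a string, and that a multi-arch image uses the standard manifest list / OCI index layout with a `manifests` array. The function names and the sample documents are hypothetical and are not the real manifests of the curl image.

```python
import json
from typing import List


def published_architectures(manifest_json: str) -> List[str]:
    """Return the CPU architectures listed in an image manifest list.

    An empty list means the JSON is a single-platform manifest (it has no
    "manifests" array), so the architecture is not stated here and has to
    be read from the image config instead.
    """
    doc = json.loads(manifest_json)
    archs: List[str] = []
    for entry in doc.get("manifests", []):
        arch = entry.get("platform", {}).get("architecture")
        if arch and arch not in archs:
            archs.append(arch)
    return archs


def lists_architecture(manifest_json: str, node_arch: str) -> bool:
    """True only if the manifest list explicitly includes node_arch."""
    return node_arch in published_architectures(manifest_json)


# Hypothetical sample data, shaped like a manifest list for a multi-arch tag.
SAMPLE_MULTIARCH = json.dumps({
    "schemaVersion": 2,
    "manifests": [
        {"digest": "sha256:aaa", "platform": {"architecture": "amd64", "os": "linux"}},
        {"digest": "sha256:bbb", "platform": {"architecture": "arm64", "os": "linux"}},
    ],
})

# Hypothetical sample data, shaped like a single-platform image manifest.
SAMPLE_SINGLE = json.dumps({"schemaVersion": 2, "config": {}, "layers": []})

if __name__ == "__main__":
    print(published_architectures(SAMPLE_MULTIARCH))
    print(lists_architecture(SAMPLE_SINGLE, "arm64"))
```

If the tag in use does not list `arm64`, pods scheduled to arm64 nodes can be expected to fail the way the cluster-registration-wait container does here.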
workato-integration[bot] commented 11 months ago

https://issues.newrelic.com/browse/NR-167770

workato-integration[bot] commented 11 months ago

https://new-relic.atlassian.net/browse/NR-167770

ddelnano commented 11 months ago

The pixie repo seems to use this "multiarch" tagged image.

$ git grep 'pixie-dev-public\/curl' | grep '^k8s'
k8s/cloud/base/ory_auth/kratos/kratos_deployment.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/devinfra/buildbuddy-executor/values.yaml:  image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/base/kelvin_deployment.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/base/patch_sentry.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/base/query_broker_deployment.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/bootstrap/cloud_connector_deployment.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/etcd_metadata/base/metadata_deployment.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/etcd_metadata/base/metadata_deployment.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/pem/base/pem_daemonset.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/persistent_metadata/base/metadata_statefulset.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/sanitizer/kelvin_deployment.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd

My suspicion is that the helm chart may not have the latest changes to pull in the correct image.

ddelnano commented 11 months ago

I missed that this wasn't the pixie-operator helm chart, but the newrelic-pixie chart. I believe we need to replace this image with the one I mentioned above.

ddelnano commented 11 months ago

After investigating this more, the curl image isn't the only one to address. The newrelic/newrelic-pixie-integration repo isn't publishing container images for ARM. I've validated with @maxlemieux's help that the chart installs successfully once those two things are addressed.

ddelnano commented 9 months ago

The newrelic/newrelic-pixie-integration repo's v2.2.0 release now supports ARM builds. We can update the helm-chart to use this version and fix the curl issue mentioned above.

maxlemieux commented 9 months ago

The curl container issue seems to be fixed with this update, but the main container (not the init container) now shows the same issue with exec format.

ddelnano commented 9 months ago

This will be addressed once #1198 is merged and a new nri-bundle release is made. Thanks for all your help through this @maxlemieux!

workato-integration[bot] commented 3 months ago

All attempts at reproducing this issue failed, or not enough information was available to reproduce the issue. Reading the code produces no clues as to why this behavior would occur. If more information appears later, please reopen the issue.