stackrox / stackrox

The StackRox Kubernetes Security Platform performs a risk analysis of the container environment, delivers visibility and runtime alerts, and provides recommendations to proactively improve security by hardening the environment.
Apache License 2.0
Installing stackrox-central-services on OpenShift via ArgoCD: "imagePullSecrets" in ServiceAccount cause "out of sync" #12124

Open kastl-ars opened 3 months ago

kastl-ars commented 3 months ago

Hi,

I am trying to deploy StackRox (not ACS) onto an OpenShift cluster via ArgoCD.

One issue that I am currently trying to solve is the constant "out of sync" status caused by the ServiceAccounts being created with an empty imagePullSecrets: section (instead of omitting this section completely). My guess is that the if-condition in the Helm templates needs to come before this line, but so far I have not found the templates for the charts, so this is just a guess.

OpenShift automatically adds an imagePullSecrets: section with the secrets it creates; as far as I understand, this is because we have a ClusterPullSecret set.

I have not seen a difference when using the different settings for imagePullSecrets in the chart's values.yaml (useFromDefaultServiceAccount, allowNone, etc.).

Anyone else having this issue?

Kind Regards, Johannes

P.S.: For the record, here is our current values.yaml (it contains lots of things I tried out, as the admin password does not get respected, but I'll open another issue for that):

  allowNonstandardReleaseName: true
  env:
    openshift: 4
    istio: false
    platform: "default"
    offlineMode: true
  system:
    enablePodSecurityPolicies: false
  meta:
    useLookup: false
  central:
    telemetry:
      enabled: false
    persistence:
      none: true
    adminPassword:
      htpasswd: <redacted-htpasswd-string-here>
    exposure:
      route:
        enabled: true
    resources:
      requests:
        memory: 1Gi
        cpu: 1
      limits:
        memory: 4Gi
        cpu: 1
    db:
      resources:
        requests:
          memory: 1Gi
          cpu: 500m
        limits:
          memory: 4Gi
          cpu: 1
  scanner:
    autoscaling:
      disable: true
    replicas: 1
    resources:
      requests:
        memory: 500Mi
        cpu: 500m
      limits:
        memory: 2500Mi
        cpu: 2000m

mtodor commented 3 months ago

Hi @kastl-ars! Thank you for reaching out!

I'll try to enumerate your questions, so that it's easier to follow the discussion and refer to the individual problems.

  1. Empty imagePullSecrets: is populated by OpenShift and treated as out-of-sync by ArgoCD

We are using imagePullSecrets.allowNone=true. If, for some reason, imagePullSecrets has a different state and ArgoCD is trying to re-sync, maybe you can try useExisting: [].

  2. Need help finding Helm templates

You can always use helm pull to get the Helm chart .tgz, which contains the templates that Helm uses. The Helm charts should also be published here: https://github.com/stackrox/helm-charts

Unfortunately, we have limited ArgoCD experience. I'm not sure if you already did that, but my suggestion would be to work with plain helm first and get to the state where everything is deployed and works. The reason for that is to simply reduce the number of components that could cause a problem.

kastl-ars commented 3 months ago

I think I found the source templates being used for the creation of the serviceAccounts.

https://github.com/stackrox/stackrox/blob/master/image/templates/helm/stackrox-central/templates/01-central-00-db-serviceaccount.yaml#L12

Using this as an example, I think the problem is that the imagePullSecrets: line is always added to the manifest, even if the range over ._rox.imagePullSecrets._names is empty.

Hypothesis: The rendered manifest therefore declares this key as empty, while OpenShift actually populates it.

Putting an if-condition around the whole block, to only write it if there are actual secrets to add, could solve the problem.

Something along these lines:

{{- if ._rox.imagePullSecrets._names }}
imagePullSecrets:
{{- range $secretName := ._rox.imagePullSecrets._names }}
- name: {{ quote $secretName }}
{{- end }}
{{- end }}

mclasmeier commented 3 months ago

Hi @kastl-ars, which Helm chart version are you using (400.4.4 again)? This is important for us when trying to reproduce a problem. Thanks, Moritz

kastl-ars commented 3 months ago

@mclasmeier Sorry, yes, currently with 400.4.4

mclasmeier commented 3 months ago

Could you check if the problem is also present in the latest Helm chart?

kastl-ars commented 3 months ago

> Could you check if the problem is also present in the latest Helm chart?

Sure, currently trying with 400.5.0

kastl-ars commented 3 months ago

The "out of sync" still shows up, so I still think my hypothesis is correct.

Unless this is caused by other sync errors due to the hook annotations on the PVC, see https://github.com/stackrox/stackrox/issues/2482#issuecomment-2249727577

I would propose to keep this on hold until the PVC issue is solved.

kastl-ars commented 3 months ago

OK, after I solved the PVC issue the serviceAccounts are still shown as "out of sync" by ArgoCD.

ArgoCD wants to remove those two lines:

imagePullSecrets:
  - name: central-dockercfg-m9fqf

Again, my guess is that this is caused by the empty (but present) imagePullSecrets key.
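
To make the suspected mismatch concrete, here is a minimal sketch of the two manifests ArgoCD would be comparing. It assumes, as hypothesized above, that the chart renders the key with no entries; the service account and secret names are the ones from this thread, and everything else is stripped down for illustration.

```yaml
# Desired state as rendered by Helm (assumption: key present, no entries)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: central
imagePullSecrets:        # parses as null, i.e. "no pull secrets"
---
# Live state after OpenShift has attached its generated dockercfg secret
apiVersion: v1
kind: ServiceAccount
metadata:
  name: central
imagePullSecrets:
  - name: central-dockercfg-m9fqf
```

If the key were omitted from the rendered manifest entirely, the desired state would no longer make any claim about imagePullSecrets; whether ArgoCD would then stop reporting a diff still needs to be verified.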

Tested with 400.5.0

mtodor commented 3 months ago

I have one concern here. If the Helm chart output were rendered without imagePullSecrets:, would OpenShift still add it, and would ArgoCD still consider that "out of sync"?

@kastl-ars Would it be possible for you to change the config and use imagePullSecrets.useExisting (it is a list of names, I think) and set it to use central-dockercfg-m9fqf? I know that central-dockercfg-m9fqf is a moving target, but it would let us verify whether that works.

kastl-ars commented 3 months ago

I can test that the week after next.

But I guess if there is a non-empty imagePullSecrets section in the serviceAccount YAML that is sent to OpenShift, OpenShift will not add another one or overwrite it.

And thus ArgoCD should not have anything to complain about.

kastl-ars commented 3 months ago

> @kastl-ars Would it be possible for you to change the config and use imagePullSecrets.useExisting (it is a list of names, I think) and set it to use central-dockercfg-m9fqf? I know that central-dockercfg-m9fqf is a moving target, but it would let us verify whether that works.

Using this in the values.yaml I see a green checkmark on the central service, as the desired manifest and the live manifest are in sync.

stackrox-central-services:
  allowNonstandardReleaseName: true
  imagePullSecrets:
    useExisting: central-dockercfg-m9fqf

But the other two service accounts (scanner and central-db) are still out of sync, as the live manifest now contains two imagePullSecrets, the one it previously had and the one I specified above.

Deleting the service account does not help; it is recreated (and Kubernetes/OpenShift knows that there is a scanner-dockercfg-hj6xf secret that the scanner serviceAccount should use, due to the kubernetes.io/service-account.name: scanner annotation on the secret).

mtodor commented 3 months ago

@kastl-ars did you consider adjusting diffing with ignoreDifferences option in ArgoCD?

Something like:

ignoreDifferences:
  - kind: ServiceAccount
    name: central
    namespace: stackrox
    jqPathExpressions:
      - .imagePullSecrets
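
Since the same diff shows up on the scanner and central-db service accounts, such a rule would need to cover all three. The following is a sketch only, not a tested configuration: it is a fragment of an Argo CD Application spec (source, destination and project are left out), and the jq filter assumes that the OpenShift-generated secrets always contain -dockercfg- in their name, as the two names seen in this thread do.

```yaml
# Fragment of an Argo CD Application; source, destination and project omitted.
spec:
  ignoreDifferences:
    - group: ""                # core API group
      kind: ServiceAccount
      namespace: stackrox      # no "name": applies to every ServiceAccount in the namespace
      jqPathExpressions:
        # Assumption: generated pull secrets are named <serviceaccount>-dockercfg-<suffix>.
        # Ignore only those entries rather than the whole imagePullSecrets list.
        - '.imagePullSecrets[] | select(.name | test("-dockercfg-"))'
```

Narrowing the filter to the generated entries keeps ArgoCD comparing any pull secrets that are set deliberately through the chart values.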

kastl-ars commented 3 months ago

> @kastl-ars did you consider adjusting diffing with ignoreDifferences option in ArgoCD?

Yes, but we try to avoid that and clarify things with upstream first. If I add an exception for each and every thing we install, I end up with a lot of exceptions and ignores...