indigo-iam / iam

INDIGO Identity and Access Management Service
https://indigo-iam.github.io/

iam-login-service k8s deployment became not ready when mail server is unresponsive #656

Open dmichelotto opened 9 months ago

dmichelotto commented 9 months ago

Dear developers,

We have several deployments of INDIGO IAM on our k8s cluster. All of them run iam-login-service 1.8.2p2, and the k8s version is 1.25.6.

Today, during a maintenance window, our mail server became unresponsive for a few minutes and all our IAM deployments became not ready. Our deployments have a readiness probe that checks the IAM status on the /actuator/health endpoint, which normally reports:

{
  "status": "UP",
  "components": {
    "db": {
      "status": "UP"
    },
    "diskSpace": {
      "status": "UP"
    },
    "externalConnectivity": {
      "status": "UP"
    },
    "livenessState": {
      "status": "UP"
    },
    "mail": {
      "status": "UP"
    },
    "ping": {
      "status": "UP"
    },
    "readinessState": {
      "status": "UP"
    }
  },
  "groups": [
    "liveness",
    "readiness"
  ]
}

Once the mail server was operational again, all IAM deployments became ready again.

Is this behavior expected?

In our configuration we have:

# Enable mail probe
IAM_HEALTH_MAIL_PROBE_ENABLED=true
# Enable external connectivity probe
IAM_HEALTH_EXTERNAL_CONNECTIVITY_PROBE_ENABLED=true

However, according to your documentation here, my understanding is that the mail and external connectivity probes should be exposed on separate endpoints.

Do you have a better suggestion than setting these variables to false as a workaround?

enricovianello commented 9 months ago

I don't think we have a better workaround. Try setting at least IAM_HEALTH_MAIL_PROBE_ENABLED to false. IAM_HEALTH_EXTERNAL_CONNECTIVITY_PROBE_ENABLED checks external connectivity and is unrelated to this problem, but feel free to turn that probe off as well.

Anyway, we need to check whether the mail and external connectivity statuses are expected to appear in the main health endpoint when those probes are enabled.

Thanks for the notice!
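
For illustration, the reported outage is consistent with an aggregated health status in which the most severe component wins. The following is a minimal Python sketch, not IAM or Spring code. It assumes the default severity order of Spring Boot Actuator (DOWN, OUT_OF_SERVICE, UP, UNKNOWN) and its default mapping of DOWN and OUT_OF_SERVICE to HTTP 503. The function names `aggregate_status` and `http_code` are hypothetical.

```python
# Assumed severity order (Spring Boot Actuator default): earlier entries win.
SEVERITY_ORDER = ["DOWN", "OUT_OF_SERVICE", "UP", "UNKNOWN"]


def aggregate_status(components: dict) -> str:
    """Return the most severe status among the component statuses."""
    if not components:
        return "UNKNOWN"
    # min() over the position in SEVERITY_ORDER picks the most severe status.
    return min(components.values(), key=SEVERITY_ORDER.index)


def http_code(status: str) -> int:
    """Assumed default mapping: DOWN and OUT_OF_SERVICE give 503, others 200."""
    return 503 if status in ("DOWN", "OUT_OF_SERVICE") else 200


# Component names taken from the health report quoted in this issue.
healthy = {
    "db": "UP",
    "diskSpace": "UP",
    "externalConnectivity": "UP",
    "livenessState": "UP",
    "mail": "UP",
    "ping": "UP",
    "readinessState": "UP",
}
# Same deployment while the mail server is unresponsive.
mail_outage = dict(healthy, mail="DOWN")
```

Under these assumptions, `aggregate_status(mail_outage)` is "DOWN" even though `readinessState` is still "UP", so a readiness probe pointed at /actuator/health would receive a 503 and mark the pod not ready.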

dmichelotto commented 9 months ago

Would it be sufficient to define IAM readiness by checking the /actuator/health/readinessState endpoint, and IAM liveness by checking the /actuator/health/livenessState endpoint, using the k8s readiness and liveness probes?
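
For reference, the two endpoints mentioned above can be checked by hand in roughly the way a Kubernetes HTTP probe would. This is a rough Python sketch under stated assumptions: the base URL is a placeholder, the helper names are hypothetical, and the only rule taken from Kubernetes is that an HTTP probe succeeds on any status code from 200 up to but not including 400. Whether these component endpoints are unaffected by the mail probe in IAM is exactly the open question of this issue; the sketch does not answer it.

```python
import urllib.error
import urllib.request

# Paths quoted in the comment above.
READINESS_PATH = "/actuator/health/readinessState"
LIVENESS_PATH = "/actuator/health/livenessState"


def is_probe_success(status_code: int) -> bool:
    """Kubernetes HTTP probes treat any code in [200, 400) as success."""
    return 200 <= status_code < 400


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code of a GET on url, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 503 when the health status is DOWN
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, timeout, DNS failure


def check_probes(base_url: str, fetch=http_status) -> dict:
    """Map each probe name to True (would pass) or False (would fail)."""
    paths = {"readiness": READINESS_PATH, "liveness": LIVENESS_PATH}
    return {
        name: is_probe_success(fetch(base_url + path))
        for name, path in paths.items()
    }
```

Calling `check_probes("http://localhost:8080")` against a running instance (the address is a placeholder) while the mail server is down would show whether these two endpoints stay healthy during such an outage.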