Integration suite ACLEnabledSuite is failing sporadically

traefik / mesh

Traefik Mesh - Simpler Service Mesh

Apache License 2.0


Bug Report

What did you do?

Opened a PR without modifying the Go code.

What did you expect to see?

Successful build.

What did you see instead?

The ACLEnabledSuite integration suite failed. A force push without any modifications fixed the build.

This is due to the check introduced at https://github.com/containous/maesh/blob/master/integration/try/try.go#L66, where unavailable replicas are checked to ensure that deployments have rolled over.

If a terminating replica is slow, it's possible that it causes this condition to fail.

Perhaps "more than 0 unavailable replicas" is not the right condition.

The check was implemented to solve the following problem: a deployment was updated and then checked immediately, and it was still considered "ready" even though it had yet to restart all of its pods, because all current (old) pods were ready.

I am not sure how to improve that condition without checking the pod SHAs.
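One possible direction, sketched here under assumptions rather than taken from the maesh code: compare the deployment's generation with the generation its status has observed, and only then look at replica counts. This is roughly the approach `kubectl rollout status` takes. The `deploymentState` struct and the `rolloutComplete` function are hypothetical names; the struct is a trimmed-down stand-in for the corresponding `appsv1.Deployment` fields so the example stays self-contained.

```go
package main

import "fmt"

// deploymentState is a hypothetical stand-in for the Deployment fields
// that matter for a rollout check. It is not a type from client-go.
type deploymentState struct {
	Generation         int64 // metadata.generation
	ObservedGeneration int64 // status.observedGeneration
	DesiredReplicas    int32 // spec.replicas
	UpdatedReplicas    int32 // status.updatedReplicas
	TotalReplicas      int32 // status.replicas
	AvailableReplicas  int32 // status.availableReplicas
}

// rolloutComplete returns nil once the latest spec has been fully rolled
// out, and an error describing what is still pending otherwise.
func rolloutComplete(d deploymentState) error {
	// The controller has not processed the latest update yet, so the
	// replica counts below still describe the old pods.
	if d.ObservedGeneration < d.Generation {
		return fmt.Errorf("status is stale: observed generation %d, current generation %d",
			d.ObservedGeneration, d.Generation)
	}
	// Not every desired replica runs the new pod template yet.
	if d.UpdatedReplicas < d.DesiredReplicas {
		return fmt.Errorf("%d of %d replicas updated", d.UpdatedReplicas, d.DesiredReplicas)
	}
	// Replicas from the old template are still counted.
	if d.TotalReplicas > d.UpdatedReplicas {
		return fmt.Errorf("%d old replicas pending termination", d.TotalReplicas-d.UpdatedReplicas)
	}
	// Updated replicas exist but are not all available.
	if d.AvailableReplicas < d.UpdatedReplicas {
		return fmt.Errorf("%d of %d updated replicas available", d.AvailableReplicas, d.UpdatedReplicas)
	}
	return nil
}
```

The generation comparison is what covers the "updated, then instantly checked" case: until the controller observes the new generation, the check fails regardless of how healthy the old pods look.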

Perhaps we should drill down and check the new ReplicaSet instead of trying to do it at the deployment level.
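A rough sketch of what a ReplicaSet-level check could look like, assuming the ReplicaSets owned by the deployment have already been listed. The newest one is picked by the `deployment.kubernetes.io/revision` annotation that the deployment controller sets on its ReplicaSets. The `replicaSetState` struct and both functions are hypothetical and only mirror the relevant ReplicaSet fields.

```go
package main

import (
	"fmt"
	"strconv"
)

// revisionAnnotation is set by the deployment controller on each
// ReplicaSet it manages.
const revisionAnnotation = "deployment.kubernetes.io/revision"

// replicaSetState is a hypothetical stand-in for the ReplicaSet fields
// used below. It is not a type from client-go.
type replicaSetState struct {
	Name            string
	Annotations     map[string]string
	DesiredReplicas int32 // spec.replicas
	ReadyReplicas   int32 // status.readyReplicas
}

// newestReplicaSet returns the ReplicaSet with the highest revision.
func newestReplicaSet(sets []replicaSetState) (replicaSetState, error) {
	var newest replicaSetState
	best := int64(-1)
	for _, rs := range sets {
		rev, err := strconv.ParseInt(rs.Annotations[revisionAnnotation], 10, 64)
		if err != nil {
			continue // skip ReplicaSets without a parsable revision
		}
		if rev > best {
			best, newest = rev, rs
		}
	}
	if best < 0 {
		return replicaSetState{}, fmt.Errorf("no ReplicaSet carries a revision annotation")
	}
	return newest, nil
}

// newReplicaSetReady returns nil when every desired replica of the newest
// ReplicaSet is ready. Old ReplicaSets and their terminating pods are ignored.
func newReplicaSetReady(sets []replicaSetState) error {
	rs, err := newestReplicaSet(sets)
	if err != nil {
		return err
	}
	if rs.ReadyReplicas < rs.DesiredReplicas {
		return fmt.Errorf("replicaset %q: %d of %d replicas ready",
			rs.Name, rs.ReadyReplicas, rs.DesiredReplicas)
	}
	return nil
}
```

Because only the newest ReplicaSet is inspected, a slow terminating pod from the old one would not fail the check. It likely still needs to be combined with some form of staleness check, since the new ReplicaSet may not exist yet if the deployment is inspected immediately after the update.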
