traefik / mesh

Traefik Mesh - Simpler Service Mesh
https://traefik.io/traefik-mesh
Apache License 2.0

Integration suite ACLEnabledSuite is failing sporadically #615

Closed by kevinpollet 4 years ago

kevinpollet commented 4 years ago

Bug Report

What did you do?

Opened a PR without modifying the Go code.

What did you expect to see?

Successful build.

What did you see instead?

The integration tests failed because of the ACLEnabledSuite as shown here: https://containous.semaphoreci.com/jobs/8bb8e781-dcd1-4292-9f40-a6774ce646ad.

A force push without any modifications fixed the build.

dtomcej commented 4 years ago

This is due to the check introduced at https://github.com/containous/maesh/blob/master/integration/try/try.go#L66, where unavailable replicas are checked to ensure that deployments have rolled over.

If a terminating replica is slow, it's possible that it causes this condition to fail.

Perhaps >0 is not the right condition.

The check was implemented to solve the following issue: a deployment was updated and then checked immediately, and it was still considered "ready" because all current (old) pods were ready, even though it still had to restart every pod.

I am not sure how to improve that condition without checking the pod SHAs.

Perhaps we should drill down and check the new ReplicaSet instead of trying to do this at the deployment level.
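
For illustration, one deployment-level alternative to a bare "unavailable > 0" check is to compare the observed generation and a few status counters, similar in spirit to what `kubectl rollout status` reports. The sketch below is not the code in try.go: `deploymentState` and `rolloutPending` are hypothetical names, and the struct only mirrors fields that Kubernetes reports on a Deployment (`metadata.generation`, `spec.replicas`, `status.observedGeneration`, `status.replicas`, `status.updatedReplicas`, `status.availableReplicas`) so that the example runs without a cluster or a client library.

```go
package main

import "fmt"

// deploymentState is a hypothetical, trimmed-down mirror of the Deployment
// fields used below. It is an assumption made for this sketch, not the type
// used by the integration suite.
type deploymentState struct {
	Generation         int64 // metadata.generation: bumped on every spec change
	ObservedGeneration int64 // status.observedGeneration: last generation the controller processed
	DesiredReplicas    int32 // spec.replicas
	Replicas           int32 // status.replicas: pods of all revisions counted by the controller
	UpdatedReplicas    int32 // status.updatedReplicas: pods running the current template
	AvailableReplicas  int32 // status.availableReplicas
}

// rolloutPending returns a non-empty reason while the rollout is still in
// progress, and an empty string once it looks complete.
func rolloutPending(d deploymentState) string {
	switch {
	case d.ObservedGeneration < d.Generation:
		// The status still describes the previous spec, so "ready" means nothing yet.
		return "controller has not observed the latest spec"
	case d.UpdatedReplicas < d.DesiredReplicas:
		return fmt.Sprintf("%d of %d replicas updated", d.UpdatedReplicas, d.DesiredReplicas)
	case d.Replicas > d.UpdatedReplicas:
		return fmt.Sprintf("%d old replicas still counted", d.Replicas-d.UpdatedReplicas)
	case d.AvailableReplicas < d.UpdatedReplicas:
		return fmt.Sprintf("%d of %d updated replicas available", d.AvailableReplicas, d.UpdatedReplicas)
	}
	return ""
}

func main() {
	// Spec was just updated: every old pod is ready, but the status is stale.
	stale := deploymentState{
		Generation: 2, ObservedGeneration: 1,
		DesiredReplicas: 3, Replicas: 3, UpdatedReplicas: 3, AvailableReplicas: 3,
	}
	// Rollout finished: status is current and all replicas are updated and available.
	done := stale
	done.ObservedGeneration = 2

	for _, d := range []deploymentState{stale, done} {
		if reason := rolloutPending(d); reason != "" {
			fmt.Println("pending:", reason)
			continue
		}
		fmt.Println("rolled out")
	}
}
```

Whether this removes the flakiness is untested here. The generation comparison targets the "checked immediately after the update" case described above, and the updated-versus-total comparison is a deployment-level stand-in for inspecting the new ReplicaSet directly.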