teamhephy / controller

Hephy Workflow Controller (API)
https://teamhephy.com
MIT License

fix(scheduler): handle missing timestamps in pod events #129

Closed felixbuenemann closed 4 years ago

felixbuenemann commented 4 years ago

Fix a crash during "deis run" when some pod events are missing a timestamp; see: https://github.com/kubernetes/kubernetes/issues/89689

Without this bugfix, errors like these can show up during deployments or "deis run" on Kubernetes 1.17.x:

Deployment:

Unknown Error (400): {"detail":"(app::deploy): unorderable types: str() < NoneType()"}

Run:

Error: Unknown Error (503): {"detail":"my-app-run-x47pk (run): '<' not supported between instances of 'str' and 'NoneType'"}

This bugfix has been verified to fix the "deis run" errors, but will likely fix the same error during deployment, since no other codepaths seem to sort event timestamps.

The deployment error doesn't happen every time, so some time is required to see if it is also fixed.
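For readers who want to see the failure mode in isolation, here is a minimal sketch of sorting pod events when some timestamps are missing. It is not the code from this PR: the helper names `event_sort_key` and `sort_events` are hypothetical, and the fallback order (`lastTimestamp`, then `eventTime`, then `firstTimestamp`, then an empty string) is an assumption about which Kubernetes event fields might be populated.

```python
from typing import Any, Dict, List


def event_sort_key(event: Dict[str, Any]) -> str:
    """Return a timestamp string for sorting that is never None.

    Hypothetical helper: the field fallback order is an assumption,
    not necessarily what the controller does.
    """
    return (
        event.get("lastTimestamp")
        or event.get("eventTime")
        or event.get("firstTimestamp")
        or ""  # events without any timestamp sort first
    )


def sort_events(events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sort events by timestamp without comparing str to None."""
    return sorted(events, key=event_sort_key)


# Made-up sample events, including one with lastTimestamp set to None
# and one with no timestamp fields at all.
events = [
    {"reason": "Started", "lastTimestamp": "2020-04-01T10:00:05Z"},
    {"reason": "Scheduled", "lastTimestamp": None,
     "eventTime": "2020-04-01T10:00:01.123456Z"},
    {"reason": "Pulled", "lastTimestamp": "2020-04-01T10:00:03Z"},
    {"reason": "Unknown"},
]

ordered = [e["reason"] for e in sort_events(events)]
print(ordered)
```

The sketch relies on the timestamps being ISO 8601 UTC strings, which compare lexicographically in roughly chronological order. Sorting the raw `lastTimestamp` values directly would instead raise the `TypeError` quoted above as soon as a `None` is compared with a string.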

felixbuenemann commented 4 years ago

OK, the errors TypeError: unorderable types: str() < NoneType() and TypeError: '<' not supported between instances of 'str' and 'NoneType' seem to be the same, but from different Python versions.

This makes sense since the cluster where I was seeing the error was running v2.21.3 when I got the error reports and my first try was to upgrade to v2.21.6, at which point the error message changed.

So this is very likely to also fix the sporadic errors seen during deployment.
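The comparison behind both error messages can be reproduced with a few lines of Python. This is only an illustrative sketch (the helper name `compare_message` is hypothetical); the exact wording of the message depends on the Python version the controller image ships with.

```python
from typing import Any, Optional


def compare_message(left: Any, right: Any) -> Optional[str]:
    """Return the TypeError message raised by `left < right`, if any."""
    try:
        left < right
    except TypeError as exc:
        return str(exc)
    return None


# Comparing a timestamp string with a missing (None) timestamp,
# as happens when a list containing both is sorted.
message = compare_message("2020-04-01T10:00:05Z", None)
print(message)
```

Older Python 3 releases phrase this as `unorderable types: str() < NoneType()`, while newer ones report `'<' not supported between instances of 'str' and 'NoneType'`.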

Cryptophobia commented 4 years ago

Niiiice fix. Since it is so isolated to event timestamps I don't see a problem in just merging it and seeing if the errors go away in new version of hephy workflow :+1: