Open wallrj opened 6 years ago
As you say, #219 should resolve the e2e test passing problem there.
I think blocking subprocess start on there being a Pilot is a separate enhancement/feature with its own nuances that need to be discussed. While we don't have a requirement for this functionality, I think it should be kept on the backlog. If some other new feature depends on it, or we start seeing failures because of it, then we should reconsider.
Namely, I think we need to add 'can be elected' conditions to Pilots (e.g. this Pilot is not a candidate because it isn't healthy). Right now any Pilot can become leader so long as it is running in some form. But that seems like a separate discussion.
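To make the 'can be elected' idea concrete, here is a rough sketch of what such a check might look like. The `PilotStatus` type, its fields, and `canBeElected` are all hypothetical names invented for illustration; nothing here reflects the actual Pilot API.

```go
package main

import "fmt"

// PilotStatus is a hypothetical, simplified view of a Pilot.
// The real Pilot resource may expose entirely different fields.
type PilotStatus struct {
	Name    string
	Running bool
	Healthy bool
}

// canBeElected reports whether a Pilot should be considered a leader
// candidate and, if not, a human-readable reason why.
func canBeElected(p PilotStatus) (bool, string) {
	switch {
	case !p.Running:
		return false, "pilot is not running"
	case !p.Healthy:
		return false, "pilot is not healthy"
	default:
		return true, ""
	}
}

func main() {
	pilots := []PilotStatus{
		{Name: "pilot-0", Running: true, Healthy: true},
		{Name: "pilot-1", Running: true, Healthy: false},
	}
	for _, p := range pilots {
		ok, reason := canBeElected(p)
		fmt.Printf("%s candidate=%t reason=%q\n", p.Name, ok, reason)
	}
}
```

Today's behaviour corresponds to checking only `Running`; the sketch adds a single extra health condition, and further conditions could be added as extra cases.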
I consider it a bug that all the unit and E2E tests pass despite these errors in the logs:
I think we should discuss whether the Pilot should exit unless leader election succeeds within a timeout. On the other hand, perhaps it's a good thing that the database keeps running regardless of bugs in the Pilot. Not sure.
There's an additional leaderelection hook that we could use to wait for successful leader election.
/kind bug