hanfei1991 / microcosm

a mini bench experiment for a task runtime scheduler

A failure while redispatching pending jobs causes the leader to fail #362

Closed hanfei1991 closed 2 years ago

hanfei1991 commented 2 years ago

See this line https://github.com/hanfei1991/microcosm/blob/cc83df8977fa1ebf147dbaf5e156c6a2ae6ce9d0/servermaster/jobmanager.go#L229

When we dispatch the pending jobs, a failure such as "master has reached concurrency quota" causes the leader to fail and triggers a re-election.

After re-election on the same node, "SyncAddHandler" in the p2p package times out repeatedly.