scanner-research / scanner

Efficient video analysis at scale
https://scanner-research.github.io/
Apache License 2.0

If workers die before job creation, master blocks on exponential backoff #224

Open: willcrichton opened this issue 5 years ago

willcrichton commented 5 years ago

When starting jobs on workers, the master enters this loop: https://github.com/scanner-research/scanner/blob/master/scanner/engine/master.cpp#L1984

This loop runs while holding the work mutex (acquired here), which prevents any further work or communication with the master: https://github.com/scanner-research/scanner/blob/master/scanner/engine/master.cpp#L1663

However, if some workers have died but have not yet been unregistered when this loop starts, the loop takes ~2 min to finish because the master retries the dead workers with exponential backoff. During that time the entire system, including the client, is blocked on the master.
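To see how a stall of roughly two minutes can arise, here is a back-of-the-envelope sketch in C++. The retry parameters (1 s initial delay, doubling after each failure, 8 attempts in total) and the names `BackoffPolicy`, `worst_case_wait` and `worst_case_stall` are hypothetical, chosen only for illustration; Scanner's actual RPC retry settings may differ. It also assumes the master contacts workers one after another, so the waits for several dead workers add up.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical retry policy (NOT Scanner's real settings): the first attempt
// happens immediately; before each retry the caller sleeps for `initial`,
// then 2*initial, 4*initial, and so on.
struct BackoffPolicy {
  std::chrono::milliseconds initial;
  int max_attempts;  // total attempts, including the first one
};

// Total time spent sleeping before giving up on a worker that never answers.
std::chrono::milliseconds worst_case_wait(const BackoffPolicy& p) {
  assert(p.max_attempts >= 1);
  std::chrono::milliseconds total{0};
  std::chrono::milliseconds delay = p.initial;
  for (int attempt = 1; attempt < p.max_attempts; ++attempt) {
    total += delay;  // sleep before the next attempt
    delay *= 2;      // exponential growth of the delay
  }
  return total;
}

// Assumes dead workers are retried sequentially while the lock is held,
// so the stall grows linearly with the number of dead workers.
std::chrono::milliseconds worst_case_stall(const BackoffPolicy& p,
                                           int dead_workers) {
  return worst_case_wait(p) * dead_workers;
}
```

With these made-up parameters a single dead worker costs 1 + 2 + 4 + 8 + 16 + 32 + 64 = 127 s of sleeping, which is already in the ~2 min range, and every additional dead worker adds the same amount again.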

@apoms, is there a simple way to fix this?
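One possible direction is to narrow the scope of the lock. The sketch below is a generic pattern, not a fix confirmed by the maintainers, and all names in it (`Master`, `WorkerInfo`, `StartJobRpc`, `start_job`, `num_workers`) are hypothetical stand-ins rather than Scanner's real types. It copies the worker list while holding the work mutex, releases the mutex before making the slow start-job RPCs (each with a bounded per-worker deadline), and then reacquires the mutex only to unregister workers that did not answer.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the master's view of a worker.
struct WorkerInfo {
  int id;
  std::string address;
};

// Hypothetical start-job RPC: returns true if the worker acknowledged the
// job within `deadline`, false if it failed or timed out.
using StartJobRpc =
    std::function<bool(const WorkerInfo&, std::chrono::milliseconds deadline)>;

class Master {
 public:
  void register_worker(WorkerInfo w) {
    std::lock_guard<std::mutex> lock(work_mutex_);
    workers_.push_back(std::move(w));
  }

  std::size_t num_workers() {
    std::lock_guard<std::mutex> lock(work_mutex_);
    return workers_.size();
  }

  // Starts a job on every registered worker. The work mutex is held only
  // while copying and updating the worker list, never during the RPCs.
  std::vector<int> start_job(const StartJobRpc& rpc,
                             std::chrono::milliseconds per_worker_deadline) {
    assert(per_worker_deadline.count() > 0);
    std::vector<WorkerInfo> snapshot;
    {
      std::lock_guard<std::mutex> lock(work_mutex_);
      snapshot = workers_;  // copy the worker list under the lock
    }

    // Slow part: runs without the lock, so other requests can proceed.
    std::vector<int> started;
    std::vector<int> unresponsive;
    for (const WorkerInfo& w : snapshot) {
      if (rpc(w, per_worker_deadline)) {
        started.push_back(w.id);
      } else {
        unresponsive.push_back(w.id);
      }
    }

    // Unregister workers that did not answer, so later jobs skip them.
    {
      std::lock_guard<std::mutex> lock(work_mutex_);
      workers_.erase(
          std::remove_if(workers_.begin(), workers_.end(),
                         [&](const WorkerInfo& w) {
                           return std::find(unresponsive.begin(),
                                            unresponsive.end(),
                                            w.id) != unresponsive.end();
                         }),
          workers_.end());
    }
    return started;
  }

 private:
  std::mutex work_mutex_;
  std::vector<WorkerInfo> workers_;
};
```

The trade-off is that workers registered or removed while the RPCs are in flight are not reflected in the snapshot, so a real master would need to reconcile those changes afterwards. In exchange, a dead worker costs at most one bounded deadline, and other requests (for example `num_workers()` above) can take the lock in the meantime.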