Open willcrichton opened 5 years ago
When starting jobs on workers, the master enters this loop: https://github.com/scanner-research/scanner/blob/master/scanner/engine/master.cpp#L1984
The loop takes the work mutex, which prevents any further work or communication on the master: https://github.com/scanner-research/scanner/blob/master/scanner/engine/master.cpp#L1663
If any workers have died but have not yet been unregistered when the loop starts, the loop takes ~2 minutes to finish. During that time the entire system (including the client) is blocked on the master.
@apoms is there a simple way to fix this?
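One possible direction, as a rough sketch rather than a patch against master.cpp: copy the worker list while holding the mutex, release it, and then contact the workers concurrently with a short shared deadline, so a dead worker costs at most that deadline and never blocks other threads. Everything named below (`WorkerTable`, `start_job`, `start_job_on_workers`) is hypothetical and stands in for the real master state and RPC stubs. If the calls are gRPC calls, setting a deadline on the `ClientContext` would be the more direct way to bound each call.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <map>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the master's worker state (not Scanner's types).
struct WorkerTable {
  std::mutex work_mutex;
  // worker id -> blocking "start job" call; returns true on acknowledgement.
  std::map<int, std::function<bool()>> start_job;
};

// Starts the job on every worker and returns the ids of workers that failed
// or did not answer within `deadline`. The mutex is held only for the copy.
std::vector<int> start_job_on_workers(WorkerTable& table,
                                      std::chrono::milliseconds deadline) {
  // 1. Snapshot the worker list under the mutex, then release it.
  std::map<int, std::function<bool()>> snapshot;
  {
    std::lock_guard<std::mutex> lock(table.work_mutex);
    snapshot = table.start_job;
  }

  // 2. Issue all calls concurrently. Detached threads are used so that a
  //    hung call cannot block this function when the futures are destroyed
  //    (futures from std::async would block in their destructor).
  std::map<int, std::future<bool>> pending;
  for (auto& kv : snapshot) {
    std::packaged_task<bool()> task(kv.second);
    pending.emplace(kv.first, task.get_future());
    std::thread(std::move(task)).detach();
  }

  // 3. Wait for all replies against one shared deadline.
  const auto until = std::chrono::steady_clock::now() + deadline;
  std::vector<int> unresponsive;
  for (auto& kv : pending) {
    const bool ready =
        kv.second.wait_until(until) == std::future_status::ready;
    if (!ready || !kv.second.get()) {
      unresponsive.push_back(kv.first);
    }
  }
  return unresponsive;
}
```

The returned ids could then be unregistered under the mutex in a second, short critical section. The caveat is that the worker list can change between the snapshot and that second step, so the real code would need to re-check each id before removing it.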