Closed JessicaLHartog closed 6 years ago
@erikdw Updated:
// Note: We still have _offersLock at this point, so we return the empty ArrayList if we happen to have no offers
// this way we can release the lock and acquire new offers. Otherwise proceed through the logic below to see if we
// can make any slots on the offer(s) we do have
@JessicaLHartog the code path changes (because of this PR) for the case where offersEmpty == True and offersSuppressed == False, right? Although the end behavior is probably the same, i.e., returning an empty list.
Yes @srishtyagrawal the code path changes, but the behavior is unchanged.
Thanks for the clarification @JessicaLHartog!
This race can happen when offers are suppressed but an offer comes in after suppression. In this case, we check whether the offers map is empty before checking whether the offers are suppressed when deciding whether to revive them, so the late offer leaves the map non-empty and the revive is skipped.
Example log identifying that this can trigger a problem:
After this point, any topology that needs assignment will try to schedule onto the available offer(s). If those are insufficient, the offers will never be revived, and workers that die later cannot be rescheduled.
For this fix, we simply stop checking if the offers map is empty before reviving offers. Everything else behaves as it did before this fix.
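The change in the revive decision can be sketched as follows. This is a minimal illustration, not the actual code in this PR: the class `ReviveSketch`, both method names, and the `Map<String, String>` offers type are hypothetical stand-ins, and the real code holds _offersLock and calls into the Mesos driver rather than returning a boolean.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the revive decision before and after the fix.
public class ReviveSketch {

    // Before the fix (assumed shape): revive only if the offers map is empty
    // AND offers are suppressed. A late offer makes the map non-empty,
    // so the revive is skipped.
    static boolean shouldReviveBeforeFix(Map<String, String> offers, boolean offersSuppressed) {
        return offers.isEmpty() && offersSuppressed;
    }

    // After the fix: the emptiness check is dropped, so suppressed offers
    // are revived regardless of what the offers map holds.
    static boolean shouldReviveAfterFix(Map<String, String> offers, boolean offersSuppressed) {
        return offersSuppressed;
    }

    public static void main(String[] args) {
        // Simulate the race: offers were suppressed, then one offer arrived.
        Map<String, String> offers = new HashMap<>();
        offers.put("offer-1", "arrived after suppression");

        System.out.println("before fix: " + shouldReviveBeforeFix(offers, true));
        System.out.println("after fix:  " + shouldReviveAfterFix(offers, true));
    }
}
```

With an empty offers map both versions agree, which matches the discussion above: the code path changes, but the behavior for that case does not.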