mikespook / gearman-go

This package is a Gearman API for Golang. It implements the native protocol for both the worker and client APIs.
MIT License
291 stars 83 forks

Fix two race conditions in client.Do #94

Closed: cameronpm closed this 3 years ago

cameronpm commented 3 years ago

I have added examples/pl/worker_multi.pl to create a bunch of workers for a new integration test to ferret out these race conditions. In more depth:

The first race condition occurs in this scenario:

  1. Req A: client.Do called, execution pauses at the end of client.Do waiting for the dtJobCreated result
  2. Req A: Server sends dtJobCreated, handle created
  3. Req A: client.Do call completes. dtWorkComplete for Req A is yet to be sent
  4. Req B: client.Do called, locks client.respHandler
  5. Req A: Server sends dtWorkComplete for Req A. client.processLoop is unable to invoke the callback as respHandler is locked by another goroutine in step 4. Deadlock.
  6. Req B: client.Do invokes client.do and a new request is written to the server, but its innerHandler callback will never be called because client.processLoop is parked attempting to lock respHandler
  7. Req B: client.do() times out after client.ResponseTimeout seconds, releasing the respHandler lock and returning ErrLostConn. This works around the deadlock.
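
To make steps 4 to 7 concrete, here is a minimal, self-contained sketch of the same pattern. It is not the library's code and not the patch in this PR: `toyClient`, `processLoop`, `do`, `runScenario` and `errLostConn` are hypothetical stand-ins, and the "server" is simulated with a buffered channel. It only illustrates why holding a lock across a blocking wait starves the loop that is supposed to deliver the awaited reply.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errLostConn is a stand-in for the library's ErrLostConn.
var errLostConn = errors.New("lost connection")

// toyClient is a hypothetical, heavily simplified model of the client.
type toyClient struct {
	mu       sync.Mutex  // stands in for the respHandler lock
	incoming chan string // packets arriving "from the server"
	created  chan string // delivers the job-created reply to do()
}

// processLoop must take mu before dispatching any packet, like a read
// loop that locks the handler map before invoking a callback.
func (c *toyClient) processLoop() {
	for pkt := range c.incoming {
		c.mu.Lock()
		if pkt == "job-created" {
			c.created <- pkt
		}
		// A "work-complete" packet would invoke Req A's callback here.
		c.mu.Unlock()
	}
}

// do models Req B. With holdLock set, mu stays locked while waiting for
// the reply (the problematic pattern); otherwise it is released first.
func (c *toyClient) do(holdLock bool, timeout time.Duration) error {
	c.mu.Lock()
	if holdLock {
		defer c.mu.Unlock()
	} else {
		c.mu.Unlock()
	}
	c.incoming <- "work-complete" // late result for Req A (step 5)
	c.incoming <- "job-created"   // reply to Req B's request (step 6)
	select {
	case <-c.created:
		return nil
	case <-time.After(timeout):
		return errLostConn // step 7: the timeout breaks the stall
	}
}

// runScenario builds a fresh toy client and performs one simulated call.
func runScenario(holdLock bool) error {
	c := &toyClient{
		incoming: make(chan string, 2),
		created:  make(chan string, 1),
	}
	go c.processLoop()
	err := c.do(holdLock, 200*time.Millisecond)
	close(c.incoming)
	return err
}

func main() {
	fmt.Println("lock held across wait:", runScenario(true))
	fmt.Println("lock released before wait:", runScenario(false))
}
```

Under these assumptions the first call can only end through the timeout, because the loop is parked on the mutex the caller holds, while the second call receives its reply normally. Releasing the lock before waiting is just one possible way out, shown here for contrast.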

The second race condition (much rarer) occurs in this scenario:

  1. Req A: client.Do called, sends request
  2. Req A: Server sends dtJobCreated, client.processLoop calls client.handleInner, which gets the "c" callback and invokes it via h(resp); goroutine execution suspends before the callback is removed
  3. Req A: client.Do call completes. All locks are relinquished
  4. Req B: client.Do called, assigns a new innerHandler callback "c" which overwrites Req A's "c" callback (as it has not yet been removed in step 2)
  5. Req A: Step 2 execution resumes, deleting Req B's "c" callback on the assumption that it was Req A's.
  6. Req B: Server sends dtJobCreated, h, ok := client.innerHandler.get(key); ok evaluates to false as Req B's callback was wrongly deleted in step 5
  7. Req B: client.do() times out after client.ResponseTimeout seconds returning ErrLostConn. This works around the race condition.
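
The interleaving in the second scenario can be replayed deterministically in a single goroutine. The sketch below is illustrative only: `handlerMap`, its methods and `replay` are hypothetical names, not the library's API, and the atomic `take` helper is an assumed alternative shown for contrast, not necessarily how this PR resolves the race.

```go
package main

import (
	"fmt"
	"sync"
)

// handlerMap is a hypothetical stand-in for client.innerHandler.
type handlerMap struct {
	mu sync.Mutex
	m  map[string]func(string)
}

func newHandlerMap() *handlerMap {
	return &handlerMap{m: make(map[string]func(string))}
}

func (h *handlerMap) put(key string, f func(string)) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.m[key] = f
}

func (h *handlerMap) get(key string) (func(string), bool) {
	h.mu.Lock()
	defer h.mu.Unlock()
	f, ok := h.m[key]
	return f, ok
}

func (h *handlerMap) remove(key string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	delete(h.m, key)
}

// take is an assumed helper: it looks up and deletes the callback in one
// critical section, so a later registration under the same key survives.
func (h *handlerMap) take(key string) (func(string), bool) {
	h.mu.Lock()
	defer h.mu.Unlock()
	f, ok := h.m[key]
	delete(h.m, key)
	return f, ok
}

// replay walks through steps 1 to 6 in a fixed order and reports whether
// Req B's callback is still registered when its reply arrives.
func replay(atomicTake bool) bool {
	h := newHandlerMap()
	noop := func(string) {}

	h.put("c", noop) // step 1: Req A registers "c"
	if atomicTake {
		f, _ := h.take("c") // lookup and removal happen together
		f("job-created A")
		h.put("c", noop) // step 4: Req B registers "c"
	} else {
		f, _ := h.get("c") // step 2: lookup and invoke ...
		f("job-created A")
		h.put("c", noop) // step 4: Req B overwrites "c"
		h.remove("c")    // step 5: ... late removal deletes B's entry
	}
	_, ok := h.get("c") // step 6: lookup for Req B's reply
	return ok
}

func main() {
	fmt.Println("get, invoke, then remove keeps Req B's callback:", replay(false))
	fmt.Println("atomic take keeps Req B's callback:", replay(true))
}
```

The get-then-remove ordering loses Req B's callback because the removal is keyed only on "c" and cannot tell whose callback it is deleting. Removing the entry in the same critical section as the lookup, or keying callbacks per request, are two ways such a window could be closed.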