The goroutines started in doPoll run the poll RPCs along with every layer that wraps them. Notably, the proto conversion logic panics on unexpected values, and an unhandled panic in one of these goroutines crashes the application.
What changed?
Add handling for panics while polling
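
For illustration, the general shape of this kind of handling can be sketched as below. This is a minimal sketch, not the code in this PR: `pollFunc` and `safePoll` are hypothetical names, and the assumption is that a recovered panic gets turned into an ordinary error so the polling loop can log it and retry.

```go
package main

import (
	"fmt"
	"runtime/debug"
	"sync"
)

// pollFunc is a hypothetical stand-in for one poll RPC plus its wrapping
// layers (including proto conversion, which may panic).
type pollFunc func() (string, error)

// safePoll (hypothetical helper) runs poll and converts any panic into an
// error, so a panic inside the goroutine cannot crash the whole process.
func safePoll(poll pollFunc) (result string, err error) {
	defer func() {
		if r := recover(); r != nil {
			// Keep the stack trace so the underlying bug is still diagnosable.
			err = fmt.Errorf("panic while polling: %v\n%s", r, debug.Stack())
		}
	}()
	return poll()
}

func main() {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		_, err := safePoll(func() (string, error) {
			panic("unexpected enum value") // simulated conversion panic
		})
		if err != nil {
			fmt.Println("poll failed, will retry")
		}
	}()
	wg.Wait()
}
```

The `recover` call has to run in a deferred function on the same goroutine that panics, which is why the wrapper is invoked inside the goroutine rather than around the `go` statement.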
Why?
Prevent services from crashing due to unexpected behavior
How did you test it?
Tested via unit tests
Potential risks
These panics indicate a fundamental mismatch between client and server state, and once encountered the situation is unlikely to resolve without a change to the server or the client. Retrying the panicking requests indefinitely, in the hope that a server-side change resolves the issue, seems safer than crashing the worker. However, workers in this state will likely generate a higher RPS than they otherwise would. Cadence's rate limiting support should be able to mitigate that risk.