ClusterClient does not retry TxPipeline on EOF when reading response.

There are occasions when a connection handling a TxPipeline gets closed unexpectedly (maybe because the remote server shut down). Despite the pool's attempts to discard bad idle connections, this validation is not 100% reliable (since the connection can be closed after it's returned from the pool and while it's being used). In my experience this results in an EOF error not while writing the commands on the connection but only once attempting to read the result.

Expected Behavior

When using a ClusterClient.TxPipeline, pipelines should be retried when an EOF (or another retryable error) is returned while reading the pipeline responses.

Current Behavior

ClusterClient.TxPipeline will retry if an EOF is observed while writing the commands to the connection: https://github.com/redis/go-redis/blob/f3fe61148b2b8fe0a669dc23620690407f5f92af/osscluster.go#L1497-L1505

However, when reading the response from the connection in https://github.com/redis/go-redis/blob/f3fe61148b2b8fe0a669dc23620690407f5f92af/osscluster.go#L1508-L1525 and https://github.com/redis/go-redis/blob/f3fe61148b2b8fe0a669dc23620690407f5f92af/osscluster.go#L1537-L1547.

In particular, there may be an EOF in these circumstances:

Instead of the OK response to MULTI: https://github.com/redis/go-redis/blob/f3fe61148b2b8fe0a669dc23620690407f5f92af/osscluster.go#L1537
Instead of the QUEUED response to the pipelined commands: https://github.com/redis/go-redis/blob/f3fe61148b2b8fe0a669dc23620690407f5f92af/osscluster.go#L1542
Instead of the response to EXEC: https://github.com/redis/go-redis/blob/f3fe61148b2b8fe0a669dc23620690407f5f92af/osscluster.go#L1550
While reading the actual pipelined command responses: https://github.com/redis/go-redis/blob/f3fe61148b2b8fe0a669dc23620690407f5f92af/osscluster.go#L1525

However, only the last case handles retryable errors and updates failedCmds to trigger the retry machinery: https://github.com/redis/go-redis/blob/f3fe61148b2b8fe0a669dc23620690407f5f92af/osscluster.go#L1354-L1360

Possible Solution

Update txPipelineReadQueued and the error handling for it in processTxPipelineNodeConn to account for retryable errors and update the failedCmds parameter appropriately.

Steps to Reproduce

Let me know if there's appetite for addressing this and I can work on a repro.

1. 2. 3. 4.

Context (Environment)

Redis Server: 7.1 go-redis client: 9.5.1

redis / go-redis