retrying in onReadTimeout is always safe, since by definition this error indicates that the query was a read, which didn’t mutate any data;
similarly, onUnavailable is safe: the coordinator is telling us that it didn’t find enough replicas, so we know that it didn’t try to apply the query.
onWriteTimeout is not safe: some replicas failed to reply to the coordinator in time, but they might still have applied the mutation;
onRequestError is not safe either: the query might have been applied before the error occurred. In particular, an OperationTimedOutException could be caused by a network issue that prevented a successful response to come back to the client.
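To make the four rules above concrete, here is a minimal, language-neutral sketch of an idempotence-aware retry decision. It is not lua-cassandra's or the DataStax driver's actual API: `RetryEvent`, `ALWAYS_SAFE`, `may_retry`, and the `attempt`/`max_retries` parameters are hypothetical names chosen for illustration. It only assumes that the caller knows whether a query is idempotent.

```python
from enum import Enum


class RetryEvent(Enum):
    """The four retry callbacks named in the quoted documentation."""
    READ_TIMEOUT = "onReadTimeout"
    UNAVAILABLE = "onUnavailable"
    WRITE_TIMEOUT = "onWriteTimeout"
    REQUEST_ERROR = "onRequestError"


# Events after which no mutation can have been applied:
# a read never mutates data, and on "unavailable" the coordinator
# never tried to apply the query.
ALWAYS_SAFE = frozenset({RetryEvent.READ_TIMEOUT, RetryEvent.UNAVAILABLE})


def may_retry(event, is_idempotent, attempt, max_retries):
    """Return True if retrying is safe under the rules quoted above.

    attempt is the number of retries already performed (hypothetical
    bookkeeping; a real driver tracks this internally).
    """
    if attempt >= max_retries:
        return False
    if event in ALWAYS_SAFE:
        return True
    # Write timeouts and request errors may have applied the mutation,
    # so they are only retried for idempotent queries.
    return is_idempotent


if __name__ == "__main__":
    for event in RetryEvent:
        print(event.value, may_retry(event, is_idempotent=False,
                                     attempt=0, max_retries=3))
```

With this rule, a non-idempotent write that hits onWriteTimeout is never retried, while the same failure on an idempotent query is retried until the retry budget is exhausted.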
The retry logic and the timeouts all come from the DataStax drivers that existed circa 2015, when this driver was implemented. I won't be updating the logic myself, but thanks for raising the issue!
As it says in https://docs.datastax.com/en/developer/java-driver/3.2/manual/retries/#retries-and-idempotence
But looking at the code, I see that it won't retry on onUnavailable, yet it will retry on onWriteTimeout, which is not safe: https://github.com/thibaultcha/lua-cassandra/blob/master/lib/resty/cassandra/cluster.lua#L727-L764 https://github.com/thibaultcha/lua-cassandra/blob/master/lib/resty/cassandra/policies/retry/simple.lua#L38-L48
Also, it is not clear where those timeouts come from: https://github.com/thibaultcha/lua-cassandra/blob/master/lib/resty/cassandra/cluster.lua#L752
So, I have a few questions:
Currently, we have switched off the retry mechanism by setting retry_on_timeout to false and max_retries to one.