Try another replica when a call to the previously selected replica fails

R-omk commented 4 years ago

The router should detect that a node in the replica set is unreachable and send the request to another suitable replica.

For each type of call there should be a strategy that defines the priority order in which the next failover targets are tried. It should also use the circuit breaker design pattern. Currently, the strategy is just to wait for the sharding map to be rebuilt, and this is not enough.
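A minimal sketch of the circuit-breaker part, in Python for illustration only: it is not vshard code, and the names (`CircuitBreaker`, `available`) and thresholds are hypothetical. Each replica gets a breaker that opens after a few consecutive failures, so the router skips that replica for a cool-down period and then lets calls through again as trials. A per-call-type priority order is simply the list passed to `available`.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class CircuitBreaker:
    """Hypothetical per-replica breaker (not part of vshard).

    closed:    calls allowed, consecutive failures are counted;
    open:      after `failure_threshold` consecutive failures, calls are skipped;
    half-open: after `reset_timeout` seconds, calls are allowed again as trials;
               a failed trial reopens the breaker, a success closes it.
    """
    failure_threshold: int = 3
    reset_timeout: float = 5.0
    clock: Callable[[], float] = time.monotonic
    failures: int = 0
    opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed
        # open until the cool-down has elapsed, then half-open
        return self.clock() - self.opened_at >= self.reset_timeout

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # (re)open and restart the cool-down


def available(priority: List[str], breakers: Dict[str, CircuitBreaker]) -> List[str]:
    """Keep the per-call-type priority order, dropping replicas whose breaker is open."""
    return [name for name in priority if breakers[name].allow()]
```

Keeping the breaker per replica (rather than per replica set) means one dead slave is skipped quickly while healthy nodes keep serving traffic.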

How to reproduce: suppose we have a replica set with a master and a slave. If you drop the TCP connections between the router and the slave storage, any function call with mode=read and prefer_replica=true will always wait for the timeout, and such requests will always end with an error.

R-omk commented 4 years ago

Here is an example of how this is solved in the HTTP world:

https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_next_upstream
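A rough Python sketch of the `proxy_next_upstream` idea: try the targets in order, move on to the next one only for error classes explicitly listed as retryable, and stop after a fixed number of tries (nginx's `proxy_next_upstream_tries` plays that role). The exception classes and the `send` callback are hypothetical placeholders, not Tarantool or vshard APIs. By default nginx also does not pass non-idempotent requests to the next server once they have been sent to an upstream; in this sketch that decision is left to the caller through `retry_on`.

```python
from typing import Callable, Sequence, Tuple, Type


class ConnectError(Exception):
    """Hypothetical: the connection could not be established, nothing was sent."""


class ResponseTimeout(Exception):
    """Hypothetical: the request was sent, but no reply arrived in time."""


def call_next_upstream(targets: Sequence[str],
                       send: Callable[[str], object],
                       retry_on: Tuple[Type[BaseException], ...],
                       max_tries: int) -> object:
    """Try targets in order, like nginx's proxy_next_upstream.

    Only exceptions listed in `retry_on` move the call to the next target;
    any other exception propagates immediately. At most `max_tries` targets
    are tried; if all of them fail, the last retryable error is re-raised.
    """
    last_exc = None
    for target in list(targets)[:max_tries]:
        try:
            return send(target)
        except retry_on as exc:
            last_exc = exc  # retryable: fall through to the next target
    if last_exc is None:
        raise ValueError("no target was tried")
    raise last_exc
```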

Gerold103 commented 4 years ago

Another replica is tried. This is called read failover, and it works. When you say 'prefer_replica', you explicitly forbid going to the master.

R-omk commented 4 years ago

https://www.tarantool.io/ru/doc/2.3/reference/reference_rock/vshard/vshard_ref/

If prefer_replica=true is specified then the preferred target is one of the replicas, but the target is the master if there is no conveniently available replica.

It may be good to specify prefer_replica=true for functions which are expensive in terms of resource use, to avoid slowing down the master.

Everything is written correctly here, because 'prefer' means prefer, not require.
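A small Python sketch contrasting the two readings of the option, assuming a replica set is described as a mapping from node name to role; `read_candidates` and `require_replica` are hypothetical names, not vshard parameters. With 'prefer' semantics the master stays in the list as the last resort, while with 'require' semantics it is dropped.

```python
from typing import Dict, List


def read_candidates(nodes: Dict[str, str], require_replica: bool = False) -> List[str]:
    """Order nodes for a read call; `nodes` maps node name -> "master" or "replica".

    prefer (default): replicas first, then the master as a fallback;
    require (hypothetical): replicas only, the master is never tried.
    """
    replicas = [name for name, role in nodes.items() if role == "replica"]
    masters = [name for name, role in nodes.items() if role == "master"]
    return replicas if require_replica else replicas + masters
```

With only a master alive, the 'prefer' list still contains a usable target, which is exactly the case described in the reproduction above.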

Gerold103 commented 4 years ago

Hm. That looks like just a bug in failover then.

R-omk commented 4 years ago

The same problem occurs even if the replica set has more than one (ro) replica. Failover just does not work.

R-omk commented 4 years ago

Or maybe the entire timeout was spent on establishing a connection to the replica. Solving this needs a smarter strategy that includes health checks, correct timeouts, and temporarily banning failed replicas.
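One way to keep a single unreachable replica from eating the whole timeout is to give every attempt only a slice of the overall budget. A minimal Python sketch, assuming a hypothetical `attempt(target, timeout)` callback that raises `TimeoutError` when a replica does not answer in time; it is meant for read-only calls, where repeating the request on another node is harmless.

```python
import time
from typing import Callable, Optional, Sequence


def call_with_deadline(targets: Sequence[str],
                       attempt: Callable[[str, float], object],
                       total_timeout: float,
                       per_attempt_cap: float,
                       clock: Callable[[], float] = time.monotonic) -> object:
    """Try targets in order within one overall time budget.

    Each attempt gets min(per_attempt_cap, remaining budget), so one
    unreachable replica cannot consume the whole timeout. Only TimeoutError
    moves on to the next target; other exceptions propagate.
    """
    deadline = clock() + total_timeout
    last_exc: Optional[BaseException] = None
    for target in targets:
        remaining = deadline - clock()
        if remaining <= 0:
            break  # budget spent: do not start another attempt
        try:
            return attempt(target, min(per_attempt_cap, remaining))
        except TimeoutError as exc:
            last_exc = exc
    raise TimeoutError("no replica answered within the time budget") from last_exc
```

With `total_timeout=15` and `per_attempt_cap=5`, a hanging slave costs at most 5 seconds before the next node is tried.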

Moreover, it is desirable to separate the TCP connection timeout from the read (response) timeout, but I'm not sure that Tarantool distinguishes these errors. (For example, if a TCP connection timeout occurs, you can retry RW requests, but if a read timeout happens, we don't know the real state.)

Gerold103 commented 4 years ago

Could you provide a stable reproducer of failover not working when there is more than one replica, with a code example to try, the topology, and the config? I have tests proving that failover works: https://github.com/tarantool/vshard/tree/master/test/failover. At least it works for the given test cases. It definitely does not 'just not work'.

Gerold103 commented 4 years ago

Yeah, and this statement, "if a tcp connection timeout occurs, then you can try rw requests again", is false. You can't retry it. Timeout on your side does not say anything about a client side. The request could still arrive and be executed on the client. So it is not safe to retry RW requests when a timeout happens, at least not automatically. As for the read response timeout, it is basically the same TCP timeout: if you got a timeout, you don't know whether your request was delivered. In both cases you just sent something and didn't get a response in time.
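The retry rule discussed here can be written down as a tiny policy function. A Python sketch with hypothetical names: reads may be repeated on another node after any failure, while writes are never repeated after a timeout or a dropped connection. The `NOT_SENT` case covers only a failure that provably happened before any payload byte was sent, which is the distinction argued for later in the thread; whether the router can actually detect it is an assumption.

```python
from enum import Enum, auto


class Failure(Enum):
    """Hypothetical failure classes as seen by the router."""
    NOT_SENT = auto()      # provably failed before any payload byte was sent
    TIMEOUT = auto()       # sent (or maybe sent), no reply in time: state unknown
    DISCONNECTED = auto()  # connection dropped after sending: state unknown


def may_retry_automatically(read_only: bool, failure: Failure) -> bool:
    """Read-only calls may be repeated on another node after any failure.
    Writes may be repeated only if the request certainly never left the router,
    because otherwise it could already have been executed remotely."""
    if read_only:
        return True
    return failure is Failure.NOT_SENT
```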

Gerold103 commented 4 years ago

It turned out that this is related to #198.

R-omk commented 4 years ago

I also get an error that was not handled by the router. My replica set b: [b1 - rw, b2 - ro, b3 - ro].

1. Run `docker stop b2 b3`.
2. Right after that, run in the router console: `vshard.router.callre(bucket_on_b, 'some_get_f', args, {timeout = 15})`.
3. I immediately get this result:

```
---
- null
- type: ClientError
  code: 77
  message: Connection reset by peer
  trace:
  - file: builtin/box/net_box.lua
    line: 280
...
```

All subsequent requests, with b2 and b3 still turned off, work correctly and are handled by b1.

Gerold103 commented 4 years ago

Yes, looks like a bug. Will work on that.

R-omk commented 2 years ago

You can't retry it. Timeout on your side does not say anything about a client side.

A connection timeout means "we have not sent a single byte of the payload yet, we are only trying to establish the TCP connection (the three-way TCP handshake has not completed yet)" (or, at layer 7, we are still waiting for the Tarantool greeting instead of the TCP handshake).

This could be used to retry RW requests, but only if it is possible to distinguish one error from the other.
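A Python sketch of that distinction over a plain TCP request/response exchange rather than the real Tarantool protocol (where reading the greeting would also belong to the "not sent" phase); `send_request` and its status strings are hypothetical. Failures during connection setup are reported as `not_sent`. Any failure from the moment the payload starts being sent is reported as `unknown`, even if no byte actually got out, because the sender cannot tell.

```python
import socket
from typing import Optional, Tuple


def send_request(host: str, port: int, payload: bytes,
                 connect_timeout: float,
                 read_timeout: float) -> Tuple[str, Optional[bytes]]:
    """Send one request and classify the outcome.

    "not_sent": the connection was never established, so no payload byte left
                this process (the case where even a write could be retried);
    "unknown":  the payload may have been delivered, but no reply arrived
                (timeout, reset, or the peer closed the connection);
    "ok":       a reply was received.
    """
    try:
        sock = socket.create_connection((host, port), timeout=connect_timeout)
    except OSError:  # refused, unreachable, or connect timeout
        return ("not_sent", None)
    with sock:
        sock.settimeout(read_timeout)  # separate budget for the response
        try:
            sock.sendall(payload)
            data = sock.recv(4096)
        except OSError:  # socket.timeout is an OSError subclass too
            return ("unknown", None)
    if not data:
        return ("unknown", None)  # peer closed without answering
    return ("ok", data)
```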

tarantool/vshard issue #222