valkey-io / valkey

A new project to resume development on the formerly open-source Redis project. We're calling it Valkey, since it's a twist on the key-value datastore.
https://valkey.io

[NEW] Support for Active/Active replication #512

Open btzq opened 1 month ago

btzq commented 1 month ago

The problem/use-case that the feature addresses

This feature is only available in the enterprise version of Redis. I would like to request that it be made possible in Valkey.

Description of the feature

Because Valkey can at most be deployed as Active-Passive, it is difficult to implement an Active-Active setup for disaster recovery purposes. The dream is to set up a Multi Active-Active Valkey deployment, so we can achieve an Active-Active production and disaster recovery setup.

Alternatives you've considered

Redis Enterprise? But very expensive...

Additional information

None other than the above

madolson commented 1 month ago

Can you add some additional information about what types of workloads you are trying to support? Active/Active replication is a hard problem with a lot of tradeoffs. It would be helpful to have more information about your specific use case to inform those tradeoffs.

Amitgb14 commented 1 month ago

This might only be possible at the proxy level. It would make things more complex if support were added to the core Valkey code.

btzq commented 1 month ago

> Can you add some additional information about what types of workloads you are trying to support. Active/Active replication is a hard problem with a lot of tradeoffs. It would be helpful to have more information about your specific usecase to inform those tradeoffs.

In our case, we currently use Redis to hold user session info for SSO purposes.

Because our setup is currently Active/Passive, when we fail over to the disaster recovery site, all users need to log in again. If there were an Active/Active setup for Valkey, it would be great, because there wouldn't be any downtime during a failover.

btzq commented 1 month ago

> It might be this possible only on Proxy level. It make more complex if it start support in core Valkey code.

I'm not quite sure how Redis Enterprise works, but I'm sure their version of Active/Active is done outside the core as well.

madolson commented 1 month ago

> In our case, we currently use Redis to hold user session info for SSO purposes.

I presume that means you are using a hash or a string? Are you okay with a timestamp at the object level determining which object wins?
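To make the question concrete, here is a minimal Python sketch of what an object-level timestamp could mean for a session stored as a hash. `VersionedObject` and `lww_merge` are hypothetical names chosen for illustration, not Valkey APIs, and the field names and timestamps are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionedObject:
    """A whole value (e.g. a session hash) tagged with one write timestamp."""
    value: dict
    timestamp_ms: int


def lww_merge(local: VersionedObject, remote: VersionedObject) -> VersionedObject:
    """Object-level last-writer-wins: keep whichever copy has the newer timestamp.

    Ties keep the local copy in this sketch; a real system would need a
    deterministic tie-breaker (such as a node ID) so that all sites agree.
    """
    return remote if remote.timestamp_ms > local.timestamp_ms else local


# Two sites update different fields of the same session at nearly the same time.
site_a = VersionedObject({"user": "alice", "theme": "dark"}, timestamp_ms=1000)
site_b = VersionedObject({"user": "alice", "last_seen": "10:05"}, timestamp_ms=1001)

merged = lww_merge(site_a, site_b)
print(merged.value)  # {'user': 'alice', 'last_seen': '10:05'}
```

Under this assumption the `theme` update from site A is dropped, because the whole object from site B wins. For session data that is always rewritten as a unit, that may be an acceptable tradeoff.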

Amitgb14 commented 1 month ago

@btzq, I think your problem can be resolved by setting up a proxy in between, or by setting up a replica in the secondary region and failing over to it during any downtime. It would be great if Valkey considers supporting this in the future.

cjabrantes commented 1 week ago

Just to add that besides Redis Enterprise, KeyDB (an open-source Redis drop-in) also supports this, and the configuration is quite trivial: --multi-master yes --active-replica yes --replicaof IP PORT

@madolson: I don't know if this helps, or even if your question was in this direction, but from https://docs.keydb.dev/docs/active-rep: "Split Brain: KeyDB can handle split brain scenarios where the connection between masters is severed, but writes continue to be made. Each write is timestamped and when the connection is restored each master will share their new data. The newest write will win. This prevents stale data from overwriting new data written after the connection was severed."
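As a rough illustration of the "newest write wins" behavior described in that passage, here is a small Python sketch of two masters exchanging data after a split. This is not KeyDB's implementation: `merge_keyspaces`, the tuple layout, and the node-ID tie-breaker are all assumptions made for the example.

```python
def merge_keyspaces(local: dict, remote: dict) -> dict:
    """Merge two {key: (timestamp_ms, node_id, value)} maps; the newest write wins.

    Entries are compared by (timestamp_ms, node_id), so both sides pick the
    same winner even when timestamps are equal (the tie-breaker is an assumption).
    """
    merged = dict(local)
    for key, entry in remote.items():
        if key not in merged or entry[:2] > merged[key][:2]:
            merged[key] = entry
    return merged


# Writes accepted by each master while the link between them was severed.
master_a = {
    "session:1": (100, "a", "alice@site-a"),
    "session:2": (250, "a", "bob@site-a"),
}
master_b = {
    "session:1": (180, "b", "alice@site-b"),
    "session:3": (120, "b", "carol@site-b"),
}

# Once the connection is restored, each side merges the other's data.
view_a = merge_keyspaces(master_a, master_b)
view_b = merge_keyspaces(master_b, master_a)

print(view_a == view_b)        # True: both masters converge
print(view_a["session:1"][2])  # alice@site-b (timestamp 180 beats 100)
```

The sketch relies on the clocks of both masters being reasonably synchronized; with skewed clocks, an older write could carry the larger timestamp and win.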

IMHO (I may be wrong), this is a great feature, as it is a simple and cheap way to get HA without manual intervention (all nodes accept writes, so no election is needed) and without the complexity (resources, elections) of cluster/sentinel setups.

I would really like to use Valkey, but "unfortunately" I need this feature.

madolson commented 1 week ago

@cjabrantes I skimmed the document quickly and don't know much about the technical details, so I'm not sure how much work it would be to implement this feature. I know that KeyDB attaches a timestamp to each record that it uses for last writer wins. Do you know if it replicates the entire object when mutations occur? That seems like a simple way to get last writer wins fairly quickly, but it won't work well for certain workloads like distributed counters.
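The counter concern can be sketched in a few lines of Python. The first part shows whole-value last-writer-wins discarding increments; the second shows a grow-only counter (G-Counter), a standard CRDT, as one known alternative. Nothing in this thread says KeyDB or Valkey uses a G-Counter; the class and the numbers are illustrative assumptions.

```python
# Whole-value last-writer-wins applied to a counter that starts at 0.
# Each site applies its own increments while the sites cannot talk to each other.
site_a = (200, 3)  # (timestamp_ms, value) after 3 increments at site A
site_b = (210, 5)  # (timestamp_ms, value) after 5 increments at site B

lww_result = max(site_a, site_b)[1]
print(lww_result)  # 5, although 8 increments happened in total


class GCounter:
    """Grow-only counter: one slot per node, merged by element-wise max."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts = {}  # maps node_id -> increments seen from that node

    def increment(self, amount: int = 1) -> None:
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())


a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(5)
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 8 8
```

The cost is that the replicated state is no longer a single value with a timestamp, which is part of why counters are a harder case than session strings or hashes.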

cjabrantes commented 1 week ago

@madolson, sadly I don't know how it behaves.