prabhatbhattarai / project-voldemort

Automatically exported from code.google.com/p/project-voldemort
Apache License 2.0
0 stars 0 forks source link

Allow required reads to be specified during get #128

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Let's walk through a scenario. A store with preferred/required reads = 1,
required writes = 2 and replication factor = 2. To make it simple, let's
say that there are only 2 nodes in the cluster. Client is ok with seeing
some temporary inconsistency, but doesn't want to lose writes.

After operating for some time, there's a temporary issue with storage in
one of the nodes, so it loses some of the updates. At this stage, reads may
return either the latest or an older version of certain keys depending on
which node the request goes to.

To write, the client gets a key, updates it and does a put. However, if the
client got the key from the node that has stale keys, the put will fail
with an ObsoleteVersionException. Fair enough, this is the right behaviour.
The question now, is how can the client fix this? The way that seems
obvious to me is to do a get with required reads == required writes. This
would give it enough information to fix things (repairReads may even be
enough to fix the issue).

As far as I can see, Voldemort doesn't provide a way for users to perform a
get with a different required reads than the one configured by default
though. Is there a better way to achieve what I outlined here or is this
something worth adding?

Original issue reported on code.google.com by ismaelj on 16 Jul 2009 at 8:21

GoogleCodeExporter commented 8 years ago
Other comments from github follow.

jkreps  on June 10, 2009:

Wouldn't a loop that does get(); update; put() until it succeeds solve this 
problem?

ijuma on June 10, 2009:

I don't see how, but maybe I am missing something. As I understand it, the get 
will
always retrieve the data from the same node (stale data) and the put would fail 
with
the obsolete exception. It seems to me that there's no way to recover from a 
scenario
where the required writes seem to have succeeded, but haven't if required reads 
== 1.

The reason why I assume that the get always retrieves data from the same node is
based on my understanding of how the routing based on consistent hashing works. 
It's
perfectly possible that I am missing some detail.

Original comment by ismaelj on 16 Jul 2009 at 8:22

GoogleCodeExporter commented 8 years ago
Ismael, are you working on this? Anyone interested in this feature still?

Original comment by kirktrue.im@gmail.com on 18 Nov 2010 at 6:56

GoogleCodeExporter commented 8 years ago
Sorry, regarding this, I saw a thread on Quora about the benefits that 
Cassandra provides by being able to specify the # reads on a per-call basis.

Original comment by kirktrue.im@gmail.com on 18 Nov 2010 at 6:56

GoogleCodeExporter commented 8 years ago
Again I wasn't sure if others agreed that this is a direction we should pursue. 
I definitely think it's useful.

Original comment by ismaelj on 24 Nov 2010 at 8:09