seyyed / scalaris

Automatically exported from code.google.com/p/scalaris

Is Scalaris really fail-over? #54

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Hi!
I have a question about the practical fault tolerance of Scalaris. Scalaris fail-over relies on the availability of key replicas: the formula R = 2F + 1 assures us that the database keeps working if up to F replicas fail. But in real life one speaks of the failure of F _physical_ nodes, because a physical node is a point of failure, not a _logical_ replica. In Scalaris, replicas are stored in DHT nodes, which are placed on the key ring at random positions. One DHT node may therefore hold an arbitrary number of replicas of a key, up to the maximum. Hence the failure of a single such DHT node could bring down the entire database. The problem becomes even more acute because a single physical node may host any number of DHT nodes. Thus it becomes clear that, in the current implementation, there is no guarantee that the database remains operational after the failure of a single physical node. I do not see any way to restrict replica placement so that no more than one replica of any given key resides on a single physical node.
As far as I can see, the problem comes from the use of symmetric replication on a single ring, where all replicas share the same set of physical nodes.
What do you think about this? Maybe a different structured overlay is needed, for example a set of rings, where each ring is backed by its own set of physical nodes and is responsible for one replica of each key?

Thanks.

Original issue reported on code.google.com by serge.po...@gmail.com on 4 Aug 2010 at 1:21
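
(Editorial note, not part of the original thread: a minimal sketch of the quorum arithmetic behind R = 2F + 1, assuming majority read/write quorums over R replicas; the module name quorum_sketch is made up for illustration.)

%% Minimal sketch, assuming majority quorums over R replicas: two quorums of
%% size Q = R div 2 + 1 always intersect, so up to R - Q = (R - 1) div 2
%% replicas may fail while reads and writes still succeed.
-module(quorum_sketch).
-export([max_failures/1]).

max_failures(R) when is_integer(R), R > 0 ->
    Q = R div 2 + 1,   %% majority quorum size
    R - Q.             %% tolerated replica failures, e.g. 1 for R = 4

For the four replicas implied by the quarter-wise symmetric replication discussed in the next comment this gives F = 1, i.e. any single replica may fail; the question above is whether the same holds for any single physical node.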

GoogleCodeExporter commented 8 years ago
In rt_chord, symmetric replication spreads the replica keys in quarters around the ring. For a (logical) node to be responsible for multiple replicas of the same K/V pair it has to be responsible for at least 1/4 of the ring, i.e. all other nodes' keys have to fall within the remaining 3/4 of the ring. Assuming a uniform distribution of the node keys (see randoms:getRandomId/0) and 10 additional nodes, there is already a probability of less than (3/4)^10 =~ 0.056 that this happens. The more nodes, the smaller the probability. Of course, this is different if you use multiple logical nodes on a single physical node...

Original comment by nico.kru...@googlemail.com on 4 Aug 2010 at 1:45
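
(Editorial sketch, not from the thread: with symmetric replication and four replicas on a 128-bit key space, the replica keys of a key are spaced a quarter of the ring apart; module and function names below are made up for illustration.)

%% Hypothetical illustration of symmetric replication with 4 replicas on a
%% 128-bit ring: the replicas of Key sit a quarter of the ring apart.
-module(symm_repl_sketch).
-export([replica_keys/1]).

-define(RING_SIZE, (1 bsl 128)).
-define(QUARTER, (?RING_SIZE div 4)).   %% 2^126

replica_keys(Key) ->
    [(Key + I * ?QUARTER) rem ?RING_SIZE || I <- lists:seq(0, 3)].

For Key = 0 this yields 0, 2^126, 2^127 and 3*2^126, exactly the quarter points targeted by the bit-mask configuration further down; a node owning less than a quarter of the ring can therefore never hold two replicas of the same key.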

GoogleCodeExporter commented 8 years ago
> and 10 additional nodes, there is already a probability of less than (3/4)^10 =~ 0.056

5.6% is too much for me. You are talking about probability, but I am talking about assurance. The probability of a physical node failing in the short term is much less than 5%, yet we build a fault-tolerant cluster precisely because it gives us an _assurance_.

> Of course, this is different if you use multiple logical nodes on a single physical node...

The administration API of Scalaris gives us the ability to run several DHT nodes within a single Erlang VM (the admin:add_nodes function). I assume this is done to increase the throughput of the database, isn't it? But apparently this feature cannot be used in production.

Original comment by serge.po...@gmail.com on 4 Aug 2010 at 2:35
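
(Editorial note: for reference, adding further DHT nodes to a running Scalaris VM looks roughly as follows; this is a sketch, so check the exact function and arity against the Scalaris version in use.)

%% In the Erlang shell of a running Scalaris VM: start three additional
%% DHT nodes inside this VM (testing/benchmarking only, see the next comment).
admin:add_nodes(3).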

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
Running multiple logical nodes on a single physical node is not recommended for production systems. We have this feature only to make it easier to run large systems for testing.

To steer the replica distribution, one can define different key prefixes for the nodes in individual configuration files (see {key_creator, random_with_bit_mask} in scalaris.cfg, which you can override in scalaris.local.cfg).

% key_creation algorithm
{key_creator, random}.

%{key_creator, random_with_bit_mask}.
% (randomkey band mask2) bor mask1
%{key_creator_bitmask, {16#00000000000000000000000000000000, 16#3FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF}}.
%{key_creator_bitmask, {16#40000000000000000000000000000000, 16#3FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF}}.
%{key_creator_bitmask, {16#80000000000000000000000000000000, 16#3FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF}}.
%{key_creator_bitmask, {16#C0000000000000000000000000000000, 16#3FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF}}.

One could thereby place four nodes explicitly and use the quarter bit-masks shown in the example above for any additional nodes.

Original comment by schin...@gmail.com on 4 Aug 2010 at 2:52
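
(Editorial sketch of the (randomkey band mask2) bor mask1 rule quoted in the configuration above; the concrete RandomKey value is made up.)

%% Erlang shell sketch: with the quarter masks from the configuration above,
%% every generated id lands in the chosen quarter of the 128-bit ring.
Mask1 = 16#40000000000000000000000000000000,      %% selects the second quarter
Mask2 = 16#3FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,      %% keeps the lower 126 bits
RandomKey = 16#DEADBEEF00000000000000000000CAFE,  %% an arbitrary 128-bit key
NodeId = (RandomKey band Mask2) bor Mask1.        %% always in [2^126, 2^127)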

GoogleCodeExporter commented 8 years ago
OK, so that is a solution.
But the conditions for it are:
1. We need "base" nodes whose number equals the number of replicas.
2. The positions of these nodes on the key ring have to be determined explicitly.
3. After a crash and subsequent repair, such a node must take over one of the free "base" positions.

Am I right?

Original comment by serge.po...@gmail.com on 4 Aug 2010 at 4:34

GoogleCodeExporter commented 8 years ago
Yes, you are right.

Original comment by schin...@gmail.com on 4 Aug 2010 at 5:22

GoogleCodeExporter commented 8 years ago
OK, thanks!
May I ask you to add these conditions to the FAQ and/or the User Manual?

Original comment by serge.po...@gmail.com on 4 Aug 2010 at 5:27

GoogleCodeExporter commented 8 years ago
With our new passive load balancing (as of r1313), nodes are evenly distributed 
across the ring, so explicit placing is no longer necessary.

Original comment by schin...@gmail.com on 12 Jan 2011 at 6:51