seyyed / scalaris

Automatically exported from code.google.com/p/scalaris
Apache License 2.0

missing replica repair #52

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
>What steps will reproduce the problem?
1. Run Scalaris in a configuration with four or more nodes.
2. Populate the database with some data.
3. Suspend any node (the node should hold no more than one replica of any 
key) and wait for some time.
4. Change a few keys that have replicas on the suspended node.
5. Resume the node.

>What is the expected output? What do you see instead?
After that, three replicas of each changed key hold one value and the fourth 
replica (on the resumed node) holds another.
Why is this bad? Because the situation is almost equivalent to a failure of 
that node. If another node then fails, the database will stop working.
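A minimal Python sketch of this concern, assuming a replication factor of 4 and majority quorums of 3 (both assumptions for illustration; the node names and version numbers are hypothetical). It counts how many replicas still hold the newest version and checks whether a majority of up-to-date replicas survives the worst-case loss of one more node.

```python
# ASSUMPTIONS (illustration only): 4 replicas per key, majority quorums.
REPLICATION_FACTOR = 4
MAJORITY = REPLICATION_FACTOR // 2 + 1  # 3 of 4

# Hypothetical versions of one key after step 5: the write in step 4 reached
# three replicas, while the suspended node kept the old version.
versions = {"node_a": 2, "node_b": 2, "node_c": 2, "node_d": 1}


def up_to_date(replicas):
    """Return the set of replicas holding the newest version."""
    newest = max(replicas.values())
    return {node for node, v in replicas.items() if v == newest}


def survives_one_more_failure(replicas):
    """Worst case: the next failure hits an up-to-date replica.

    True if the remaining up-to-date replicas still form a majority.
    """
    return len(up_to_date(replicas)) - 1 >= MAJORITY


print(sorted(up_to_date(versions)))         # ['node_a', 'node_b', 'node_c']
print(survives_one_more_failure(versions))  # False
```

With all four replicas at the newest version, the same check returns `True`; the stale replica is what removes the margin for one more failure in this model.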

What version of the product are you using? On what operating system?
svn 944

Original issue reported on code.google.com by serge.po...@gmail.com on 30 Jul 2010 at 2:12

GoogleCodeExporter commented 8 years ago
Currently there is no replica repair in Scalaris; we are working on it.

Original comment by nico.kru...@googlemail.com on 3 Aug 2010 at 1:39

GoogleCodeExporter commented 8 years ago
Hi!
What about replicas of a particular key getting out of sync when some message 
is lost, for example the 'commit' message for a TP? All other TPs got the 
message but this one did not. Do you plan to create an automatic tool to 
repair this?

Original comment by serge.po...@gmail.com on 3 Aug 2010 at 2:08

GoogleCodeExporter commented 8 years ago
Currently, repair only happens implicitly, when further writes are performed 
on the same items. Message loss is not a usual issue in Scalaris, as we use 
TCP for point-to-point communication. If a commit were lost, a later write 
operation would fix the desynced replica.
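A toy model of this implicit repair, assuming each replica stores a `(version, value)` pair and a write installs "newest version + 1" on the replicas it reaches. This is a simplification for illustration, not Scalaris's actual commit protocol; `write` and the replica layout are hypothetical.

```python
def write(replicas, key, value, reachable=None):
    """Install `value` with a fresh version on the reachable replicas.

    Stale copies among the reached replicas are overwritten as a side effect;
    replicas that are not reached keep whatever version they had.
    Simplification: the newest version is read from all replicas.
    """
    if reachable is None:
        reachable = range(len(replicas))
    newest = max(store.get(key, (0, None))[0] for store in replicas)
    for i in reachable:
        replicas[i][key] = (newest + 1, value)


# Three replicas at version 2, the previously suspended one still at version 1.
replicas = [{"k": (2, "new")}, {"k": (2, "new")}, {"k": (2, "new")}, {"k": (1, "old")}]
write(replicas, "k", "newer")
print(replicas[3]["k"])  # (3, 'newer')
```

In this model, a write that does not reach the stale replica (for example, one passing `reachable=[0, 1, 2]`) leaves it stale, so implicit repair only helps keys that are written again and only on the replicas those writes touch.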

Yes, we will add an automatic, periodically running replica repair 
mechanism.
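One common shape for such a mechanism is anti-entropy: periodically compare the versions each replica holds and push the newest copy to any replica that lags behind or lacks the key. The sketch below is generic and hypothetical (it is not the mechanism Scalaris later shipped), using in-memory dicts of `{key: (version, value)}` as stand-ins for replica stores.

```python
import time


def repair_key(replicas, key):
    """Copy the newest (version, value) of `key` to every replica lagging behind.

    A replica that lacks the key entirely (e.g. after losing its data) is
    treated as version -1 and receives the newest copy. Returns the number
    of replicas updated.
    """
    present = [store[key] for store in replicas if key in store]
    if not present:
        return 0
    newest = max(present, key=lambda entry: entry[0])
    repaired = 0
    for store in replicas:
        if store.get(key, (-1, None))[0] < newest[0]:
            store[key] = newest
            repaired += 1
    return repaired


def repair_round(replicas):
    """One anti-entropy pass over every key known to any replica."""
    keys = set().union(*(store.keys() for store in replicas))
    return sum(repair_key(replicas, k) for k in sorted(keys))


def run_periodically(replicas, interval_s, rounds):
    """Hypothetical driver: repeat the pass every `interval_s` seconds."""
    for _ in range(rounds):
        repair_round(replicas)
        time.sleep(interval_s)
```

Comparing full contents like this is expensive on large stores; real systems usually first narrow down which keys differ before transferring data.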

Original comment by schin...@gmail.com on 3 Aug 2010 at 2:43

GoogleCodeExporter commented 8 years ago
> Message loss is not a usual issue in Scalaris, as we use TCP for 
point-to-point communication.

This effect can be caused by a short-term communication failure that the 
failure detector does not notice. Your comm_port server doesn't check the 
result of a send operation, so it is quite possible to "lose" a message. 
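As an illustration of the point about unchecked sends, here is a hypothetical wrapper that checks the send result and either retries or raises, so the caller can react instead of silently losing the message. The `send` callable interface is an assumption, not the comm_port API; also note that retrying a message such as 'commit' is only safe if the receiver tolerates duplicates.

```python
class SendError(Exception):
    """Raised when a message could not be handed to the transport."""


def send_checked(send, message, retries=3):
    """Call `send(message)` and react to failure instead of ignoring it.

    `send` is a hypothetical transport callable that returns True on success,
    False on failure, or raises OSError. Returns the attempt number that
    succeeded; raises SendError once all attempts have failed.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            if send(message):
                return attempt
        except OSError as exc:
            last_error = exc
    raise SendError(f"giving up on {message!r} after {retries} attempts") from last_error
```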

Original comment by serge.po...@gmail.com on 3 Aug 2010 at 3:35

GoogleCodeExporter commented 8 years ago
Hi,
I am wondering whether there is any replica-repair mechanism in the newest 
version of Scalaris (0.3.0). My simple tests show that when a data node 
crashes, there is no replica repair like in Hadoop: the data on the node is 
lost. When the node joins again, the ping messages never stop (ping, pong, 
discard). Maybe there is some problem with my configuration values.
Thanks a lot for the great work you've done.

Original comment by suleed....@gmail.com on 7 Sep 2011 at 1:16

GoogleCodeExporter commented 8 years ago
I also have two more questions about Scalaris:
1. Some tests show that when the management server crashes, there is no backup 
mechanism. Is that correct? I am new to Scalaris and didn't find information 
about this in the user guide (main.pdf) or any other Scalaris docs.

2. When I use my web browser to insert data into Scalaris, the replication 
factor in the configuration file does not take effect: the replica count is 
always 4.
I am doing more tests with the Erlang API.

Original comment by suleed....@gmail.com on 7 Sep 2011 at 1:29

GoogleCodeExporter commented 8 years ago
1) Replica repair is not available in 0.3.0. Some work has been done in svn 
trunk, but we are currently concentrating on updating old versions (replica 
update).

2) What do you mean by "ping message never stop transmission(ping, pong, 
discard)"? What is your setup, and what is happening? (Please respond by mail 
or on the mailing list scalaris@googlegroups.com, as this is unrelated to this 
issue.)

3) The management server is not needed for normal operation. It only serves to 
provide an overview of the complete ring, which you would not otherwise have. 
Hence, there is no fail-over mechanism.

4) The replication factor is currently hard-coded to 4 - see issue 57

Original comment by nico.kru...@googlemail.com on 7 Sep 2011 at 6:14

GoogleCodeExporter commented 8 years ago
Experimental support for replica repair has been added to Scalaris and has 
been available since 0.5.0 (thanks to Maik Lange).
Configuration examples have been added to trunk with r4512. Note that if you 
are not using trunk, you may need to specify the parameters of all set 
reconciliation algorithms (ref. r4511).
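Set reconciliation lets two replicas find which items differ without shipping their whole contents. Below is a generic one-level sketch in the spirit of Merkle-tree reconciliation: each side hashes its `{key: version}` pairs into buckets, and only keys in buckets whose hashes disagree are compared in detail. It is an illustration only; it is not the Scalaris implementation and does not correspond to any of its configuration parameters.

```python
import hashlib


def bucket_of(key, buckets):
    """Map a key to a bucket deterministically (stable across processes,
    unlike Python's randomized built-in str hash)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % buckets


def bucket_hashes(store, buckets):
    """Summarise a {key: version} store as one hash per bucket."""
    parts = [[] for _ in range(buckets)]
    for key, version in sorted(store.items()):
        parts[bucket_of(key, buckets)].append(f"{key}:{version}")
    return [hashlib.sha256("|".join(p).encode()).hexdigest() for p in parts]


def differing_keys(store_a, store_b, buckets=16):
    """Keys whose versions differ (or exist on one side only).

    Only buckets with mismatching hashes are inspected; in a real exchange,
    just those buckets' contents would be sent over the network.
    """
    ha, hb = bucket_hashes(store_a, buckets), bucket_hashes(store_b, buckets)
    mismatched = {i for i in range(buckets) if ha[i] != hb[i]}
    keys = set(store_a) | set(store_b)
    return {k for k in keys
            if bucket_of(k, buckets) in mismatched and store_a.get(k) != store_b.get(k)}
```

The number of buckets trades summary size against how many keys must be compared once a bucket mismatches; a Merkle tree applies the same idea recursively.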

Original comment by nico.kru...@googlemail.com on 27 Feb 2013 at 2:42