seyyed / scalaris

Automatically exported from code.google.com/p/scalaris
Apache License 2.0

Errors with cs_api #49

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
>What steps will reproduce the problem?
1. Run scalaris with default config on four nodes.
2. Unpack the attached file.
3. Set your scalaris nodes in the test.conf file.
4. Run the test.sh script.

>What is the expected output? What do you see instead?

The program runs multiple clients simultaneously.
Each client must perform a certain number of transactions, and each
transaction executes a specified number of read operations.
(All values are defined in test.conf.)
While running, the program displays progress information and, at the end, a
summary.

By default, 10 clients are spawned.
The summary contains the number of successful clients, which should also
equal 10.

But in my case it is less than 10 (usually 7), and I receive messages about bad RPC 
calls like this:

Error for client 7 : {'EXIT',{{badmatch,{badrpc,{'EXIT',killed}}},
                              [{test_dconf,'-client/4-fun-0-',2},
                               {lists,map,2},
                               {test_dconf,'-client/4-fun-1-',2},
                               {timer,tc,3},
                               {test_dconf,client,4}]}}

Why is this happening?

>What version of the product are you using? On what operating system?

Erlang R13B04, scalaris 0.2.3 and svn r827. Both are self-compiled.

> Please provide any additional information below.

Scalaris v0.2.3 runs approximately 30 (thirty) times slower than the version 
from svn r827...

What is the right way to define an initial TLog in cs_api?

Thanks.

Original issue reported on code.google.com by serge.po...@gmail.com on 17 Jun 2010 at 12:11

Attachments:

GoogleCodeExporter commented 8 years ago
I think I found the reason...

Before doing any work, each client loads the code of the transaction 
procedures onto the scalaris node. When multiple clients try to do this on the 
same node at the same time, we get this result.

If the code is preloaded on the scalaris nodes and the clients do not load it 
themselves, everything works fine.

Original comment by serge.po...@gmail.com on 17 Jun 2010 at 1:35
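The workaround described in the comment above could be sketched like this. This is a minimal, hypothetical helper (the module name, node list, and use of plain rpc are assumptions, not taken from the original test code):

```erlang
%% Sketch: preload the transaction-procedure module on every scalaris
%% node once, before any clients are spawned, so that no two clients
%% race to load code on the same node.
-module(preload).
-export([on_nodes/2]).

%% Nodes is a list of Erlang node names, Module the module to distribute.
on_nodes(Nodes, Module) ->
    {Mod, Bin, File} = code:get_object_code(Module),
    [rpc:call(N, code, load_binary, [Mod, File, Bin]) || N <- Nodes].
```

Run once from the test driver before spawning the clients; afterwards the clients only issue transactions and never trigger code loading themselves.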

GoogleCodeExporter commented 8 years ago
Hi!
Please comment on several questions about the cs_api.

1. What is the right way to define an initial TLog in cs_api (e.g. for 
cs_api:process_request_list(TLog, RequestList))? Currently I use an empty 
list, but I suspect that is not entirely correct.

2. Is it intended behavior that a transaction is aborted when a key being read 
is not in the database?
I would like to read multiple keys within a single transaction, and missing 
values for some keys are acceptable to me.

3. Do you have plans to add listeners for Erlang queries to cs_api (something 
like the attachment)? Right now I have to upload and run this on every VM 
with scalaris myself.

Thanks.

Original comment by serge.po...@gmail.com on 18 Jun 2010 at 10:25

Attachments:

GoogleCodeExporter commented 8 years ago
Hi again! :)

I found the answers to questions 2 and 3 myself:

2. This behavior is fixed in cs_api_v2.
3. I re-invented RPC. :)

Question 1 is still open. I know about the txlog module, but I'm not sure it 
is a public API.

Original comment by serge.po...@gmail.com on 18 Jun 2010 at 12:38

GoogleCodeExporter commented 8 years ago
It's me again...

A new problem has appeared.

>What steps will reproduce the problem?
1. Run scalaris with default config on four nodes.
2. Unpack the attached file.
3. Set your scalaris nodes in the test.conf file.
4. Run test2.sh script.
5. Stop the erlang shell and run test2.sh again.

The test2.sh script runs a two-phase procedure.
In the first phase, the database is initialized with random values. In the 
second phase, many clients simultaneously execute multiple transactions. Each 
transaction reads five values from the database and modifies one of them.
During the test, many transactions are aborted due to conflicts.
But it looks like scalaris locked the keys and never unlocked them 
afterwards.
When I try to initialize the database again, the operation stops on an attempt 
to rewrite a key...

>What version of the product are you using? On what operating system?

Erlang R13B04, scalaris svn r827. 

Original comment by serge.po...@gmail.com on 18 Jun 2010 at 2:25

Attachments:

GoogleCodeExporter commented 8 years ago
I checked the database for write-locked keys and found several records:
{ok, Dump}=cs_api_v2:range_read(0,0).
...
Locked=[V || {_,_,true,_,_}=V <- Dump].
[{277778027018462508739627147842286036859,
  <<0,0,1,176,210,46,116,83,208,0,0,0,0,0,0,1>>,
  true,0,0},
 {269000265199891748532535480797529972038,
  <<0,0,9,2,29,171,127,158,174,0,0,0,0,0,0,1>>,
  true,0,0},
 {251262752662925821357539767425396800648,
  <<0,0,6,240,132,89,7,241,228,0,0,0,0,0,0,1>>,
  true,0,0},
 {248108486359303934531771989187453600326,
  <<0,0,10,27,255,181,212,121,46,0,0,0,0,0,0,1>>,
  true,0,0},
 {247288817414766453814592212813169728034,
  <<0,0,1,43,7,42,116,153,164,0,0,0,0,0,0,1>>,
  true,0,0},
 {218806553050311150354029171101966166357,
  <<0,0,12,22,120,42,220,218,66,0,0,0,0,0,0,1>>,
  true,0,0},
 {205756440302323703940410432667444187441,
  <<0,0,13,8,46,57,205,198,190,0,0,0,0,0,0,1>>,
  true,0,0},
 {192707435288227892873783495984343983995,
  <<0,0,1,176,210,46,116,83,208,0,0,0,0,0,0,1>>,
  true,0,0},
 {183929673469657132666691828939587919174,
  <<0,0,9,2,29,171,127,158,174,0,0,0,0,0,0,1>>,
  true,0,0}]

Original comment by serge.po...@gmail.com on 18 Jun 2010 at 2:39

GoogleCodeExporter commented 8 years ago
For 1., I added cs_api*:new_tlog/0 to get an initial empty transaction log.

Original comment by schin...@gmail.com on 7 Jul 2010 at 11:46
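A hedged sketch of how the new call might be combined with process_request_list; the exact return shape of process_request_list is an assumption here, not something confirmed in the thread:

```erlang
%% Start from a properly constructed empty transaction log instead of
%% passing a bare [] (assumed usage of the new API).
TLog0 = cs_api_v2:new_tlog(),
%% Assumed: the tlog is threaded through together with the results.
{Results, TLog1} = cs_api_v2:process_request_list(TLog0, RequestList).
```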

GoogleCodeExporter commented 8 years ago
> for 1. I added cs_api*:new_tlog/0 to get an initial empty transaction log.
Thanks.

Any suggestions about the locks?

Original comment by serge.po...@gmail.com on 14 Jul 2010 at 12:57

GoogleCodeExporter commented 8 years ago
I started a new issue about the locks with a proper subject.

Original comment by serge.po...@gmail.com on 16 Jul 2010 at 10:36

GoogleCodeExporter commented 8 years ago
The locks issue is not solved yet, and I am aware of it. But currently I have 
no time to dig into the details and fix it. It should only happen with 
cs_api_v2, which is still experimental.

I will start looking at it in the last week of July, but I guess it could be 
hard to track down as it is not easily reproducible.

Original comment by schin...@gmail.com on 16 Jul 2010 at 12:57

GoogleCodeExporter commented 8 years ago
Please look at the patch.

Original comment by serge.po...@gmail.com on 20 Jul 2010 at 3:20

Attachments:

GoogleCodeExporter commented 8 years ago
A more optimized version could be used, but I haven't tested it:

newly_decided(State) ->
    case get_decided(State) of
        false ->
            NumAbort = get_numabort(State),
            if NumAbort > 0 -> abort;
               true ->
                   NumPrepared = get_numprepared(State),
                   case get_repl_factor(State) =:= (NumPrepared + NumAbort) of
                       true -> prepared;
                       _ -> false
                   end
            end;
        _Any -> false
    end.

Original comment by serge.po...@gmail.com on 20 Jul 2010 at 3:40

GoogleCodeExporter commented 8 years ago
With the patch, you would require all replicas to be available for each 
transaction.
The system would then be unable to perform transactions if even a single 
involved replica failed, which is not intended.

Original comment by schin...@gmail.com on 22 Jul 2010 at 10:15

GoogleCodeExporter commented 8 years ago
But the TM should be assured that _all_ replicas involved in the transaction 
transmit their decisions, not only a majority.

Original comment by serge.po...@gmail.com on 22 Jul 2010 at 10:27

GoogleCodeExporter commented 8 years ago
No, majority is enough, also for the TM.

Original comment by schin...@gmail.com on 22 Jul 2010 at 10:29

GoogleCodeExporter commented 8 years ago
Locks occur because the replicas return different votes. For example, two 
replicas transmit 'prepared' and the other two 'abort' (those replicas were 
locked by another transaction). In this case your code cannot decide, because 
it expects a majority of the _same_ decision.

Original comment by serge.po...@gmail.com on 22 Jul 2010 at 10:47

GoogleCodeExporter commented 8 years ago
So, a new version of the function:

newly_decided(State) ->
    case get_decided(State) of
        false ->
            NumPrepared = get_numprepared(State),
            NumAbort = get_numabort(State),
            case get_majority(State) of
                NumPrepared -> prepared;
                NumAbort -> abort;
                _ ->
                    case get_repl_factor(State) =:= (NumPrepared + NumAbort) of
                        true -> abort;
                        _ -> false
                    end
            end;
        _Any -> false
    end.

Original comment by serge.po...@gmail.com on 22 Jul 2010 at 2:15

Attachments:

GoogleCodeExporter commented 8 years ago
Ok, I see your point. Very good. But I would not fix it like you did. In your 
fix, for the new 'abort' case *all* replicas have to respond. For an even 
replication degree, it is sufficient that half of the replicas vote for abort 
to decide on abort, since a majority can no longer be reached. For an odd 
replication degree, either a majority has to vote for 'abort' or for 
'prepared'.

So the 'abort' majority and the 'prepared' majority differ for an even 
replication degree, but are the same for an odd replication degree.

I will try something like that next week.

Original comment by schin...@gmail.com on 22 Jul 2010 at 3:19
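The two thresholds described above can be written down as follows. This is a minimal sketch: R is the replication degree, and the function names are illustrative, not Scalaris API:

```erlang
%% 'prepared' needs a strict majority of the R replica votes.
majority_for_prepared(R) -> R div 2 + 1.

%% 'abort' only needs enough votes to make a 'prepared' majority
%% impossible: R - (R div 2 + 1) + 1 = (R + 1) div 2.
majority_for_abort(R) -> (R + 1) div 2.

%% R = 4 (even): prepared needs 3, abort needs 2 - the thresholds differ.
%% R = 5 (odd):  prepared needs 3, abort needs 3 - the thresholds agree.
```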

GoogleCodeExporter commented 8 years ago
Ok. Thanks!

Original comment by serge.po...@gmail.com on 22 Jul 2010 at 3:28

GoogleCodeExporter commented 8 years ago
So the majority factor in scalaris is a function of the replication factor and 
should therefore not be defined in a config file.

Original comment by serge.po...@gmail.com on 22 Jul 2010 at 3:55

GoogleCodeExporter commented 8 years ago
Hi,

please could you try the following patch?

Original comment by schin...@gmail.com on 26 Jul 2010 at 12:38

Attachments:

GoogleCodeExporter commented 8 years ago
It should work, because I made the same changes, but I store the 
majority_for_prepare and majority_for_abort values in the item state. This 
eliminates redundant calculations every time a decision is received. See the 
attachment.

Original comment by serge.po...@gmail.com on 26 Jul 2010 at 1:26

Attachments:

GoogleCodeExporter commented 8 years ago
BTW, can you help me with this 
http://groups.google.com/group/scalaris/browse_thread/thread/ea9af755fecc2c39 ?

Original comment by serge.po...@gmail.com on 26 Jul 2010 at 1:52

GoogleCodeExporter commented 8 years ago
Integer division and remainder by 2 should be fast enough; I don't think we 
need this optimization. Additionally, I would still use functions instead of 
macros, as no macro is necessary to move the calculation to new() - macros are 
'evil'.

Original comment by schin...@gmail.com on 26 Jul 2010 at 2:04

GoogleCodeExporter commented 8 years ago
Lock handling, as reported here, is fixed in r917.

Original comment by schin...@gmail.com on 27 Jul 2010 at 8:22