Closed GoogleCodeExporter closed 8 years ago
I think I found the reason ...
Before starting work, each client loads the code of the transaction procedures
onto the Scalaris node. When multiple clients try to do this on the same node
at the same time, we get this result.
If the code is preloaded on the Scalaris nodes and the clients do not load it
themselves, everything works fine.
Original comment by serge.po...@gmail.com
on 17 Jun 2010 at 1:35
Hi!
Please comment on several questions about cs_api.
1. What is the right way to define the initial TLog in cs_api (e.g.
cs_api:process_request_list(TLog, RequestList))? For now I use an empty list,
but I think that is not entirely correct.
2. Is it intended behavior that Scalaris aborts the transaction when a key
being read is not in the database?
I'd like to read multiple keys within a single transaction, and missing values
for some of the keys are fine for me.
3. Do you plan to add to Scalaris a listener for Erlang queries to cs_api
(something like the attachment)? At the moment I have to upload and run it
myself on every VM that runs Scalaris.
Thanks.
Original comment by serge.po...@gmail.com
on 18 Jun 2010 at 10:25
Attachments:
Hi again! :)
I found the answers to questions 2 and 3 myself:
2. This behavior is fixed in cs_api_v2.
3. I re-invented RPC. :)
Question 1 is still open. I know about the txlog module, but I'm not sure that
it is a public API.
Original comment by serge.po...@gmail.com
on 18 Jun 2010 at 12:38
It's me again...
A new problem has come up.
>What steps will reproduce the problem?
1. Run Scalaris with the default config on four nodes.
2. Unpack the attached file.
3. Set your Scalaris nodes in the test.conf file.
4. Run the test2.sh script.
5. Stop the Erlang shell and run test2.sh again.
The test2.sh script runs a two-phase procedure.
The first phase initializes the database with random values. In the second
phase, many clients simultaneously execute multiple transactions. Each
transaction reads five values from the database and modifies one of them.
During the test many transactions are aborted due to conflicts. But it looks
like Scalaris locked the keys and did not unlock them afterwards.
When I try to initialize the database again, the operation stops on an attempt
to rewrite a key...
>What version of the product are you using? On what operating system?
Erlang 13B04, scalaris svn_827.
Original comment by serge.po...@gmail.com
on 18 Jun 2010 at 2:25
Attachments:
I checked the database for write-locked keys and found several such records...
{ok, Dump}=cs_api_v2:range_read(0,0).
...
Locked=[V || {_,_,true,_,_}=V <- Dump].
[{277778027018462508739627147842286036859,
<<0,0,1,176,210,46,116,83,208,0,0,0,0,0,0,1>>,
true,0,0},
{269000265199891748532535480797529972038,
<<0,0,9,2,29,171,127,158,174,0,0,0,0,0,0,1>>,
true,0,0},
{251262752662925821357539767425396800648,
<<0,0,6,240,132,89,7,241,228,0,0,0,0,0,0,1>>,
true,0,0},
{248108486359303934531771989187453600326,
<<0,0,10,27,255,181,212,121,46,0,0,0,0,0,0,1>>,
true,0,0},
{247288817414766453814592212813169728034,
<<0,0,1,43,7,42,116,153,164,0,0,0,0,0,0,1>>,
true,0,0},
{218806553050311150354029171101966166357,
<<0,0,12,22,120,42,220,218,66,0,0,0,0,0,0,1>>,
true,0,0},
{205756440302323703940410432667444187441,
<<0,0,13,8,46,57,205,198,190,0,0,0,0,0,0,1>>,
true,0,0},
{192707435288227892873783495984343983995,
<<0,0,1,176,210,46,116,83,208,0,0,0,0,0,0,1>>,
true,0,0},
{183929673469657132666691828939587919174,
<<0,0,9,2,29,171,127,158,174,0,0,0,0,0,0,1>>,
true,0,0}]
Original comment by serge.po...@gmail.com
on 18 Jun 2010 at 2:39
for 1. I added cs_api*:new_tlog/0 to get an initial empty transaction log.
Original comment by schin...@gmail.com
on 7 Jul 2010 at 11:46
> for 1. I added cs_api*:new_tlog/0 to get an initial empty transaction log.
Thanks.
Any suggestions about the locks?
Original comment by serge.po...@gmail.com
on 14 Jul 2010 at 12:57
I'm starting a new issue about the locks with a proper subject.
Original comment by serge.po...@gmail.com
on 16 Jul 2010 at 10:36
The locks issue is not solved yet and I am aware of it. But currently I have no
time to dig into the details and fix it. It should only happen with cs_api_v2
which is still experimental.
I will start looking at it in the last week of July, but I guess it could be
hard to find, as it is not easily reproducible.
Original comment by schin...@gmail.com
on 16 Jul 2010 at 12:57
Please take a look at the patch.
Original comment by serge.po...@gmail.com
on 20 Jul 2010 at 3:20
Attachments:
A more optimized variant could be used, but I have not tested it:
newly_decided(State) ->
    case get_decided(State) of
        false ->
            NumAbort = get_numabort(State),
            if
                %% a single abort vote is enough to decide abort
                NumAbort > 0 -> abort;
                true ->
                    NumPrepared = get_numprepared(State),
                    %% prepared only once every replica has voted (NumAbort is 0 here)
                    case get_repl_factor(State) =:= (NumPrepared + NumAbort) of
                        true -> prepared;
                        _ -> false
                    end
            end;
        _Any -> false
    end.
Original comment by serge.po...@gmail.com
on 20 Jul 2010 at 3:40
With the patch you would require all replicas to be available for each
transaction.
The system would then be unable to perform transactions if even a single
involved replica failed, which is not intended.
Original comment by schin...@gmail.com
on 22 Jul 2010 at 10:15
But the TM should make sure that _all_ replicas involved in the transaction
have transmitted their decisions, not only a majority.
Original comment by serge.po...@gmail.com
on 22 Jul 2010 at 10:27
No, majority is enough, also for the TM.
Original comment by schin...@gmail.com
on 22 Jul 2010 at 10:29
Locks occur because the replicas return different values. For example, two
replicas transmit a "prepare" value and the other two transmit "abort" (those
replicas were locked by another transaction). In this case your code can't
decide because it expects a majority of _identical_ decisions.
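The stuck case can be illustrated with a small standalone sketch. It assumes a replication factor of 4 and a majority of 3, and the function decide_same_majority/3 is a hypothetical stand-in for a rule that waits for a majority of identical votes; it is not the Scalaris code.

```erlang
-module(stuck_vote_sketch).
-export([decide_same_majority/3]).

%% Hypothetical rule (an assumption for illustration, not Scalaris code):
%% decide only once Majority identical votes have arrived.
decide_same_majority(NumPrepared, _NumAbort, Majority)
  when NumPrepared >= Majority -> prepared;
decide_same_majority(_NumPrepared, NumAbort, Majority)
  when NumAbort >= Majority -> abort;
%% Neither side has reached the majority: no decision is taken.
decide_same_majority(_NumPrepared, _NumAbort, _Majority) -> undecided.
```

With two "prepare" and two "abort" votes, stuck_vote_sketch:decide_same_majority(2, 2, 3) stays undecided although all four replicas have answered, so under such a rule nothing would trigger a decision.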
Original comment by serge.po...@gmail.com
on 22 Jul 2010 at 10:47
So, here is a new version of the function:
newly_decided(State) ->
    case get_decided(State) of
        false ->
            NumPrepared = get_numprepared(State),
            NumAbort = get_numabort(State),
            case get_majority(State) of
                NumPrepared -> prepared;
                NumAbort -> abort;
                _ ->
                    case get_repl_factor(State) =:= (NumPrepared + NumAbort) of
                        true -> abort;
                        _ -> false
                    end
            end;
        _Any -> false
    end.
Original comment by serge.po...@gmail.com
on 22 Jul 2010 at 2:15
Attachments:
Ok, I see your point. Very good. But I would not fix it the way you did. In
your fix for the new 'abort' case *all* replicas have to respond. For an even
replication degree, it is sufficient that half of the replicas vote for abort
to decide on abort, as a majority for 'prepared' can then no longer be reached.
For an odd replication degree, a majority has to vote either for 'abort' or
for 'prepared'.
So the 'abort' majority and the 'prepared' majority differ for an even
replication degree, but are the same for an odd replication degree.
I will try something like that next week.
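The rule described above can be sketched as follows. This is only an illustration under stated assumptions: the module and function names are hypothetical, the vote counts are passed in directly instead of being read from a transaction state, and it is not the patch that was later committed.

```erlang
-module(quorum_sketch).
-export([majority_for_prepared/1, majority_for_abort/1, decide/3]).

%% All names here are hypothetical and chosen for illustration only.

%% Smallest number of 'prepared' votes that forms a strict majority.
majority_for_prepared(ReplFactor) ->
    ReplFactor div 2 + 1.

%% Smallest number of 'abort' votes after which 'prepared' can no longer
%% reach its majority: half of the replicas for an even replication
%% degree, a strict majority for an odd one.
majority_for_abort(ReplFactor) ->
    ReplFactor div 2 + ReplFactor rem 2.

%% Decide from the votes counted so far. The two thresholds sum to
%% ReplFactor + 1, so both conditions can never hold at the same time.
decide(ReplFactor, NumPrepared, NumAbort) ->
    MajPrepared = majority_for_prepared(ReplFactor),
    MajAbort = majority_for_abort(ReplFactor),
    if
        NumPrepared >= MajPrepared -> prepared;
        NumAbort >= MajAbort -> abort;
        true -> undecided
    end.
```

For a replication degree of 4 this gives 3 votes for 'prepared' and 2 for 'abort', so a 2/2 split decides abort without waiting for all replicas; for a degree of 3 both thresholds are 2.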
Original comment by schin...@gmail.com
on 22 Jul 2010 at 3:19
Ok. Thanks!
Original comment by serge.po...@gmail.com
on 22 Jul 2010 at 3:28
And so, the majority factor in Scalaris is a function of replication_factor
and therefore should not be defined in a config file.
Original comment by serge.po...@gmail.com
on 22 Jul 2010 at 3:55
Hi,
please could you try the following patch?
Original comment by schin...@gmail.com
on 26 Jul 2010 at 12:38
Attachments:
It should work, because I made the same changes, but I store the
majority_for_prepare and majority_for_abort values in the item state. This
eliminates unneeded calculations each time a decision is received. See the
attachment.
Original comment by serge.po...@gmail.com
on 26 Jul 2010 at 1:26
Attachments:
BTW, can you help me with this
http://groups.google.com/group/scalaris/browse_thread/thread/ea9af755fecc2c39 ?
Original comment by serge.po...@gmail.com
on 26 Jul 2010 at 1:52
Integer division and remainder by 2 should be fast enough; I don't think we
need this optimization. Additionally, I would still use functions instead of
macros, as no macro is necessary to move the calculation to new() - macros are
'evil'.
Original comment by schin...@gmail.com
on 26 Jul 2010 at 2:04
Lock handling, as reported here, is fixed in r917.
Original comment by schin...@gmail.com
on 27 Jul 2010 at 8:22
Original issue reported on code.google.com by
serge.po...@gmail.com
on 17 Jun 2010 at 12:11
Attachments: