uwiger / locks

A scalable, deadlock-resolving resource locker
Mozilla Public License 2.0

Application *locks* has stopped on double write-lock #19

Closed: ddosia closed this issue 9 years ago

ddosia commented 9 years ago

I have two actors which work at approximately the same time. Each of them begins a transaction. Each of them acquires a read lock on the same oid(). Then the first tries to upgrade its read lock to a write lock. The second does the same and the application crashes immediately (a compact single-node reproduction sketch follows after the logs):

Logs of the first actor:

Erlang R16B03-1 (erts-5.10.4) [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V5.10.4  (abort with ^G)
(n1@dch-mbp)1> application:ensure_all_started(locks).
{ok,[locks]}
(n1@dch-mbp)2> {Agent, TrRes} = locks:begin_transaction().
{<0.46.0>,{ok,[]}}
(n1@dch-mbp)3> locks:lock(Agent, [table], read).
{ok,[]}
(n1@dch-mbp)4> locks:lock(Agent, [table], write).
=ERROR REPORT==== 21-Oct-2015::14:45:19 ===
** Generic server locks_server terminating 
** Last message in was {'$gen_cast',{surrender,[table],<0.55.0>}}
** When Server state == {st,{locks_server_locks,locks_server_agents},
                            {dict,2,16,16,8,80,48,
                                  {[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                   [],[]},
                                  {{[],[],[],[],[],[],[],
                                    [[<0.55.0>|#Ref<0.0.0.76>]],
                                    [],[],[],[],[],[],
                                    [[<0.46.0>|#Ref<0.0.0.69>]],
                                    []}}},
                            <0.44.0>}
** Reason for termination == 
** {function_clause,[{locks_server,queue_entries_,
                                   [[{entry,<0.55.0>,<0.53.0>,4,direct}]],
                                   [{file,"src/locks_server.erl"},{line,211}]},
                     {locks_server,queue_entries_,1,
                                   [{file,"src/locks_server.erl"},{line,214}]},
                     {locks_server,queue_entries_,1,
                                   [{file,"src/locks_server.erl"},{line,214}]},
                     {locks_server,queue_entries_,1,
                                   [{file,"src/locks_server.erl"},{line,212}]},
                     {locks_server,queue_entries,1,
                                   [{file,"src/locks_server.erl"},{line,207}]},
                     {locks_server,notify,3,
                                   [{file,"src/locks_server.erl"},{line,193}]},
                     {locks_server,handle_cast,2,
                                   [{file,"src/locks_server.erl"},{line,142}]},
                     {gen_server,handle_msg,5,
                                 [{file,"gen_server.erl"},{line,604}]}]}

=INFO REPORT==== 21-Oct-2015::14:45:19 ===
    application: locks
    exited: shutdown
    type: temporary
** exception error: {cannot_lock_objects,[{req,[table],
                                               read,
                                               ['n1@dch-mbp'],
                                               0,all},
                                          {req,[table],write,['n1@dch-mbp'],1,all}]}
     in function  locks_agent:await_reply/1 (src/locks_agent.erl, line 397)
     in call from locks_agent:lock_/6 (src/locks_agent.erl, line 380)
(n1@dch-mbp)5> application:which_applications().
[{stdlib,"ERTS  CXC 138 10","1.19.4"},
 {kernel,"ERTS  CXC 138 10","2.16.4"}]

Logs of the second actor:

Erlang R16B03-1 (erts-5.10.4) [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V5.10.4  (abort with ^G)
(n2@dch-mbp)1> 
User switch command
 --> r 'n1@dch-mbp'
 --> c
Eshell V5.10.4  (abort with ^G)
(n1@dch-mbp)1> {Agent, TrRes} = locks:begin_transaction().
{<0.55.0>,{ok,[]}}
(n1@dch-mbp)2> locks:lock(Agent, [table], read).
{ok,[]}
(n1@dch-mbp)3> locks:lock(Agent, [table], write).
** exception error: {cannot_lock_objects,[{req,[table],
                                               read,
                                               ['n1@dch-mbp'],
                                               0,all},
                                          {req,[table],write,['n1@dch-mbp'],1,all}]}
     in function  locks_agent:await_reply/1 (src/locks_agent.erl, line 397)
     in call from locks_agent:lock_/6 (src/locks_agent.erl, line 380)

I am new to locks, so I am trying to learn how it works. In a sense I need lock upgrade functionality, which is why I was curious how it works. Maybe I am missing something and what I did goes against the very basics of what locks is supposed to do.
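
For reference, here is a compact single-node sketch of the same sequence, using only the calls shown above (the module name, the [table] object and the small got_read/go handshake are just for illustration):

%% upgrade_repro.erl: two agents on one node both take a read lock on the
%% same object and then try to upgrade it to a write lock.
-module(upgrade_repro).
-export([run/0]).

run() ->
    {ok, _} = application:ensure_all_started(locks),
    Parent = self(),
    Pids = [spawn(fun() ->
                          {Agent, {ok, _}} = locks:begin_transaction(),
                          {ok, _} = locks:lock(Agent, [table], read),
                          Parent ! {self(), got_read},
                          receive go -> ok end,  %% wait until both hold the read lock
                          WriteRes = (catch locks:lock(Agent, [table], write)),
                          Parent ! {self(), WriteRes}
                  end) || _ <- [first, second]],
    [receive {P, got_read} -> ok end || P <- Pids],
    [P ! go || P <- Pids],
    %% Collect both outcomes; 'timeout' means the write-lock call never
    %% returned within five seconds.
    [receive {P, Res} -> Res after 5000 -> timeout end || P <- Pids].

With the behaviour above, upgrade_repro:run() should come back with the cannot_lock_objects error for both agents (as {'EXIT', _} terms, since the calls are wrapped in catch), matching the two transcripts.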

uwiger commented 9 years ago

Could you try the PR above (#20)? I added a test case, which did fail before this fix.

ddosia commented 9 years ago

It doesn't crash any more, but it hangs forever on both sides when I try to acquire the write lock. My naive understanding of a lock upgrade is this: both acquire a read lock; the first tries to upgrade to a write lock, which implies that it releases its read lock and joins the end of the queue; the second does the same and should release its read lock and queue up after the first one. Maybe I misunderstand how lock upgrade works? And why doesn't the deadlock detection mechanism prevent me from getting into this situation?

uwiger commented 9 years ago

It's not a question of the deadlock resolution algorithm, but rather of the lock upgrade semantics. Specifically, the locks_server handles the trivial case of upgrade when there's one read lock, but when there are several read locks, it can't differentiate between agents that want nothing more than a read lock and agents that are holding a read lock but hoping to upgrade.
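
To make the distinction concrete, the trivial case looks like this in the shell (a sketch; whether the granted upgrade returns the same {ok, _} shape as the read case is not asserted here):

{ok, _} = application:ensure_all_started(locks).
{Agent, {ok, _}} = locks:begin_transaction().
{ok, _} = locks:lock(Agent, [table], read).
%% Agent holds the only read lock on [table], so the server can convert
%% that single entry in place and the upgrade is expected to be granted.
locks:lock(Agent, [table], write).

With several agents holding read locks on the same object, the server cannot tell which of them merely want to read and which are hoping to upgrade, which is exactly what the two-agent scenario above runs into.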

uwiger commented 9 years ago

I'm at a Halloween party, so probably not sober enough to tackle the issue right now, nor would it likely be socially acceptable. ;-)

If contribs are offered, I'll gratefully review them. Otherwise, I'll take a look at this later.

uwiger commented 9 years ago

Another problem is that the test case needs to verify that the two write lock requests reach different results (currently, they both time out, which is wrong).
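
For illustration, the shape of that check, run against the upgrade_repro sketch from the first comment (this is not the actual test case in the branch; the function name is made up, and the exact failure term of the losing request is deliberately not asserted):

upgrade_results_differ() ->
    Results = upgrade_repro:run(),
    Granted = [R || {ok, _} = R <- Results],
    1 = length(Granted),                      %% exactly one upgrade is granted
    [] = [R || R <- Results, R =:= timeout],  %% and neither request just times out
    ok.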

uwiger commented 9 years ago

I've pushed some fixes to the uw-lock_upgrade3 branch. They seem to fix the problem.

Could you try to verify at your end?

ddosia commented 9 years ago

It works now: the first actor obtains the write lock immediately after the second tries to acquire its write lock. Thanks!

uwiger commented 9 years ago

Thanks! I've merged PR #20 into master.