uwiger / gproc

Extended process registry for Erlang
Apache License 2.0
1.07k stars 232 forks source link

gproc_pool using claim strategy appears to be broken #85

Closed djnym closed 3 years ago

djnym commented 9 years ago

Not sure if there's something I'm missing, but here's the output with all the commands and info (hopefully).

anthony.molinaro@nymwork:5> git clone git@github.com:uwiger/gproc.git Cloning into 'gproc'... remote: Counting objects: 1658, done. remote: Total 1658 (delta 0), reused 0 (delta 0), pack-reused 1658 Receiving objects: 100% (1658/1658), 1.90 MiB | 81.00 KiB/s, done. Resolving deltas: 100% (1038/1038), done. Checking connectivity... done. anthony.molinaro@nymwork:6> cd gproc anthony.molinaro@nymwork:7> git checkout 0.3.1 Note: checking out '0.3.1'.

You are in 'detached HEAD' state. You can look around, make experimental changes and commit them, and you can discard any commits you make in this state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may do so (now or later) by using -b with the checkout command again. Example:

git checkout -b new_branch_name

HEAD is now at 54a3b20... Merge pull request #67 from sebmaynard/master anthony.molinaro@nymwork:9> make /usr/local/bin/rebar get-deps ==> gproc (get-deps) /usr/local/bin/rebar compile ==> gproc (compile) Compiled src/gproc_pt.erl Compiled src/gproc_sup.erl Compiled src/gproc_ps.erl Compiled src/gproc_monitor.erl Compiled src/gproc_init.erl Compiled src/gproc_info.erl Compiled src/gproc_lib.erl Compiled src/gproc_bcast.erl Compiled src/gproc_pool.erl Compiled src/gproc_app.erl src/gproc_dist.erl:23: Warning: behaviour gen_leader undefined Compiled src/gproc_dist.erl Compiled src/gproc.erl anthony.molinaro@nymwork:11> erl -pa ebin Erlang/OTP 17 [erts-6.2] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [ kernel-poll:false] [dtrace]

Eshell V6.2 (abort with ^G) 1> application:start(gproc). ok 2> gproc_pool:ptest(10, 100, claim, []). add_worker(897, a) -> 1; Ws = [{a,1}] add_worker(897, b) -> 2; Ws = [{a,1},{b,2}] add_worker(897, c) -> 3; Ws = [{a,1},{b,2},{c,3}] add_worker(897, d) -> 4; Ws = [{a,1},{b,2},{c,3},{d,4}] add_worker(897, e) -> 5; Ws = [{a,1},{b,2},{c,3},{d,4},{e,5}] add_worker(897, f) -> 6; Ws = [{a,1},{b,2},{c,3},{d,4},{e,5},{f,6}] worker stats (897): [{a,1},{b,1},{c,1},{d,1},{e,1},{f,1}]

=ERROR REPORT==== 12-Mar-2015::15:37:43 === Error in process <0.49.0> with exit value: {{badmatch,{14,false}},[{gproc_pool,t est_run2,5,[{file,"src/gproc_pool.erl"},{line,984}]},{timer,tc,3,[{file,"timer.e rl"},{line,194}]},{gproc_pool,'-ptest/4-fun-0-',2,[{file,"src/gproc_pool.erl"},{ line,901}]}]}

=ERROR REPORT==== 12-Mar-2015::15:37:43 === Error in process <0.50.0> with exit value: {{badmatch,{11,false}},[{gproc_pool,t est_run2,5,[{file,"src/gproc_pool.erl"},{line,984}]},{timer,tc,3,[{file,"timer.e rl"},{line,194}]},{gproc_pool,'-ptest/4-fun-0-',2,[{file,"src/gproc_pool.erl"},{ line,901}]}]} ... =ERROR REPORT==== 12-Mar-2015::15:37:43 === Error in process <0.48.0> with exit value: {badarg,[{gproc,set_value,[{n,l,[gpro c_pool,897,5,e]},0],[{file,"src/gproc.erl"},{line,1313}]},{gproc_pool,try_claim, 3,[{file,"src/gproc_pool.erl"},{line,461}]},{gprocpool,claim,2,[{file,"src/gpr oc_pool.erl"},{line,434}]},{timer,tc,3,[{file...

\ exception error: no function clause matching gproc_pool:'-collect/1-fun-0-'({{badmatch,{11,false}}, [{gproc_pool,test_run2,5, [{file, "src/gproc_pool.erl"}, {line,984}]}, {timer,tc,3, [{file,"timer.erl"}, {line,194}]}, {gproc_pool, '-ptest/4-fun-0-',2, [{file, "src/gproc_pool.erl"}, {line,901}]}]}, {[],[]}) (src/gproc_pool.erl, line 913) in function lists:foldr/3 (lists.erl, line 1274) in call from lists:foldr/3 (lists.erl, line 1274) in call from gproc_pool:collect/1 (src/gproc_pool.erl, line 913) in call from gproc_pool:ptest/4 (src/gproc_pool.erl, line 903) 3>

uwiger commented 9 years ago

If I understand it correctly, what happens is that some claim attempts in the test run end up returning false, since the pool is full. The test pool is of size 6, and the number of workers is 10, so one should perhaps not be too surprised.

Obviously, the test needs to be fixed. The question is how? Right now, if the pool is full, claim/2 simply returns false, and the test code is not prepared to deal with that. The claim function itself doesn't offer any kind of retry or wait options. Perhaps it should?

djnym commented 9 years ago

Retry logic is usually left up to the caller (at least it is in most other pooling implementations I've been playing with for my Erlang Factory talk on pools), in that they return false (or busy or something like that), and the caller then decides what it wants to do. But several also support a queue of incoming requests and will block if all resources are used. So you sort of have the option, you could support a "blocking" mode which waits (maybe with a timeout, although many do not support a timeout), and a "non-blocking" mode which works how it does now.

However, I'm not sure that's the issue, I tried this

 case gproc_pool:claim (?POOL_ID,
                        fun (K,Pid) ->
                          io:format ("have ~p : ~p do ~p~n",[K,Pid,N]),
                          pt_gproc_worker:do (Pid, N, Data)
                        end) of
   {true, Res} -> Res;
   false -> {error, busy}
 end.

However, this prints

have {n,l,[gproc_pool,pt_gproc_pool,1,{pt_gproc_pool,1}]} : 1 do 5

Shouldn't the 1 actually be a pid as per the documentation?

I tried modifying the gproc:select here https://github.com/uwiger/gproc/blob/master/src/gproc_pool.erl#L431 to

    case gproc:select({l,n}, [{ {{n,l,[?MODULE,Pool,'_','_']}, '$1', 0}, [],
                                [{{ {element,1,'$_'}, '$1' }}]}], 1) of

but that only seemed to help slightly in that it still had the same error

** exception error: bad argument
     in function  gproc:set_value/2
        called as gproc:set_value({n,l,
                                     [gproc_pool,pt_gproc_pool,5,
                                      {pt_gproc_pool,5}]},
                                  0)
     in call from gproc_pool:try_claim/3 (src/gproc_pool.erl, line 462)
     in call from gproc_pool:claim_/2 (src/gproc_pool.erl, line 434)
     in call from pt_gproc_sup:do/2 (src/pt_gproc_sup.erl, line 20)

Looking at the code (and adding a bunch of io:format/2 calls), I think the issue might be in how gproc:set_value is determining the pid and then passing it to the gproc_lib:do_set_value/3. If I'm reading these correctly, it seems like gproc:set_value is a gen_server call, and it takes the Pid from the From part of the call which means its the pid of the caller, then that gets passed and deep in gproc_lib:do_set_value the current value is looked up in the gproc ets table and if it matches things work, otherwise it's a badarg. Since the pid in the ets table is the pid of the worker process it will never match the caller and so any call to claim would fail, unless it's from within the worker itself. That seems sort of wrong as all the other gproc_pool methods are from the process attempting to get in contact with a worker. So I'm either using it wrong or there's a crossed wire in there.

Hopefully that makes sense.

uwiger commented 9 years ago

Thanks for the digging. I've found a few issues, and am getting closer to a working test. I'll publish a branch as soon as I can.

I really appreciate your effort. As (I really hope) I've mentioned before, gproc_pool was an experiment. You are really making critical contributions to making it product-worthy (which I hope it can be).

uwiger commented 9 years ago

I have fixed claim/2 so that it now passes the pid, as the docs say.

djnym commented 9 years ago

Great, this now works as I would expect. I assume you'll want to do more testing for completeness but it's at least far enough I can now use it in my Erlang Factory talk (I'm doing a talk looking at different pooling techniques and libraries to try to provide some guidance for beginner/intermediate developers, gproc_pool is on the more complex end simply because it doesn't manage your processes like say poolboy or pooler, but is more of a way to pick the processes, management is left up to you). Anyway, claim/2 is closest to the checkin/checkout semantics that other pooling libraries use, so is best for comparison.

uwiger commented 9 years ago

There certainly should be more tests, but I think I should go ahead and merge the changes, since it obviously didn't work right before. :)

One can of course argue the point of gproc even attempting this kind of functionality. I guess the underlying idea is one of gproc as a method of organizing your system, publishing 'names' and 'properties' in an index of process footprints and dependencies. Essentially, once accustomed to debugging systems via gproc, a point of using gproc_pool would be that the pool properties adhere to the same inspection rules.

Feel free to challenge the relevance of that argument to process pools. ;-)