yf0994 / guava-libraries

Automatically exported from code.google.com/p/guava-libraries
Apache License 2.0

A Pool<T> for expensive objects #683

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I have a use case where I pool Sockets (SocketChannels) and ByteBuffers, and 
I've managed to hack up a simple Pool / BlockingPool interface for myself, with 
an implementation that uses Suppliers to create objects for the pool on demand. 
I'd like to propose adding something similar to Guava.

Basically, the interfaces go like this: http://pastebin.com/QJ6v7MxD

When poll() / take() is called, the Pool would use a Supplier to create a new 
object for the pool. A Pool could be initialized like this:

Pools.<O>create(int capacity, Supplier<O> supplier);

`supplier` here could create a new object every time its get() method is called 
(in my case it creates a new unconnected SocketChannel). 
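A minimal sketch of the proposed `Pools.create(capacity, supplier)` behavior, assuming a hypothetical `SupplierPool` class (not a Guava API): `poll()` returns `null` when every object is handed out, `take()` blocks, and callers give objects back through a hypothetical `release()` method. It uses `java.util.function.Supplier` instead of Guava's `Supplier` only so that it compiles without Guava.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical sketch, not a Guava API. If each object is released once, at most
// `capacity` objects are ever created; idle ones are reused before creating more.
final class SupplierPool<T> {
  private final Supplier<T> supplier;
  private final Semaphore permits;  // one permit per object that may be handed out
  private final ConcurrentLinkedQueue<T> idle = new ConcurrentLinkedQueue<>();

  SupplierPool(int capacity, Supplier<T> supplier) {
    if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
    this.supplier = supplier;
    this.permits = new Semaphore(capacity);
  }

  /** Non-blocking: returns null if every object is currently handed out. */
  T poll() {
    return permits.tryAcquire() ? reuseOrCreate() : null;
  }

  /** Blocking: waits until some caller releases an object. */
  T take() throws InterruptedException {
    permits.acquire();
    return reuseOrCreate();
  }

  /** Gives back an object previously obtained from poll() or take(). */
  void release(T object) {
    idle.offer(object);  // park it first so the next permit holder can find it
    permits.release();
  }

  private T reuseOrCreate() {
    T object = idle.poll();
    if (object != null) return object;
    try {
      return supplier.get();  // lazily create, e.g. a new unconnected SocketChannel
    } catch (RuntimeException e) {
      permits.release();  // don't leak the permit if creation fails
      throw e;
    }
  }
}
```

The semaphore bounds how many objects are out at once, while the queue lets released objects be reused before the Supplier is asked for a new one.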

When I searched for `pooling` on SO, the only thing I learned was that nobody 
likes to use commons-pool, so I thought maybe Guava could extend its rich set 
of features to cover this, as everybody seems to like Guava. So, is this idea 
any good? (By the way, my implementation is here: http://pastebin.com/fwYtKNg2. 
It is not tested extensively.)

Original issue reported on code.google.com by kohanyi....@gmail.com on 10 Aug 2011 at 6:03

GoogleCodeExporter commented 9 years ago
I believe pooling to be a bona fide Hard Problem.  But we have some initial 
thoughts about it which we may get around to posting.

The API should be the easier part. There are two basic approaches:

passive (note: simpler in JDK 7 with try-with-resources):

  Pool<Expensive> pool = ...
  Lease<Expensive> lease = pool.leaseObject();
  try {
    Expensive o = lease.leasedObject();
    doSomethingWith(o);
    ...
  } finally {
    lease.close();
  }
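One way the passive form could look with JDK 7's try-with-resources, assuming `Lease` extends `AutoCloseable`. `SimplePool` and its internals are hypothetical; only the `leaseObject()` / `leasedObject()` / `close()` names come from the comment above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical types modeled on the Pool/Lease names above; not a real Guava API.
interface Lease<T> extends AutoCloseable {
  T leasedObject();

  @Override
  void close();  // no checked exception, so try-with-resources needs no catch block
}

final class SimplePool<T> {
  private final Deque<T> idle = new ArrayDeque<>();
  private final Supplier<T> supplier;

  SimplePool(Supplier<T> supplier) {
    this.supplier = supplier;
  }

  synchronized Lease<T> leaseObject() {
    final T object = idle.isEmpty() ? supplier.get() : idle.pop();
    return new Lease<T>() {
      private boolean closed;  // a lease is assumed to be used by a single thread

      @Override
      public T leasedObject() {
        if (closed) throw new IllegalStateException("lease already closed");
        return object;
      }

      @Override
      public void close() {
        if (closed) return;  // closing twice is harmless
        closed = true;
        giveBack(object);
      }
    };
  }

  private synchronized void giveBack(T object) {
    idle.push(object);
  }

  synchronized int idleCount() {
    return idle.size();
  }
}
```

With this shape, the try/finally above collapses to `try (Lease<Expensive> lease = pool.leaseObject()) { doSomethingWith(lease.leasedObject()); }`.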

or active:

  Pool<Expensive> pool = ...
  pool.nameThisMethod(
      new Receiver<Expensive>() {
        public void receive(Expensive o) {
          doSomethingWith(o);
        }
      });
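A sketch of the active form, assuming a `Receiver<T>` callback like the one above. `CallbackPool` and `withObject` are made-up names (the latter stands in for `nameThisMethod`); the point is that the pool, not the caller, owns the borrow/return bracket.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical callback type, named after the Receiver<Expensive> above.
interface Receiver<T> {
  void receive(T object);
}

final class CallbackPool<T> {
  private final Deque<T> idle = new ArrayDeque<>();
  private final Supplier<T> supplier;

  CallbackPool(Supplier<T> supplier) {
    this.supplier = supplier;
  }

  // "withObject" is a made-up name standing in for nameThisMethod.
  void withObject(Receiver<T> receiver) {
    T object = borrow();
    try {
      receiver.receive(object);
    } finally {
      giveBack(object);  // the object goes back even if receive() throws
    }
  }

  private synchronized T borrow() {
    return idle.isEmpty() ? supplier.get() : idle.pop();
  }

  private synchronized void giveBack(T object) {
    idle.push(object);
  }

  synchronized int idleCount() {
    return idle.size();
  }
}
```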

And yes, it'd need something like a Supplier to generate instances as needed. 
There are endless possibilities for how something like this might need to be 
configured, though.

Original comment by kevinb@google.com on 29 Aug 2011 at 6:31

GoogleCodeExporter commented 9 years ago

Original comment by fry@google.com on 10 Dec 2011 at 4:14

GoogleCodeExporter commented 9 years ago
Any plans or progress for the timeline on this feature?

I desperately need this in Guava, as I am currently using commons-pool, 
which performs really badly.

Original comment by hcey...@batoo.org on 20 Mar 2012 at 2:38

GoogleCodeExporter commented 9 years ago
We have no plans to work on this at this time. Sorry. :-(

Original comment by fry@google.com on 20 Mar 2012 at 2:27

GoogleCodeExporter commented 9 years ago
Here's a question:  there's an occasional use case that is similar to a Pool in 
some ways -- you want pre-created instances on hand ready to hand out, if you 
dip below a certain # in stock you want to create a new batch of them, etc. -- 
but the difference is that users only take() the objects, they don't lease() 
and then return them.

At a glance, that's not a "pool" at all, but something like a Supply<T> or 
Stockpile<T>.  But if the only difference between them is whether you call 
take() or lease(), it doesn't necessarily seem worth forcing them into separate 
utilities.  Any ideas?
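A minimal sketch of such a take-only `Stockpile<T>`, assuming hypothetical `lowWater` and `batchSize` parameters and, for simplicity, a synchronous refill on the thread that calls `take()` (a real one might refill in the background).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical take-only stockpile: objects are handed out and never returned.
final class Stockpile<T> {
  private final Deque<T> stock = new ArrayDeque<>();
  private final Supplier<T> supplier;
  private final int lowWater;   // refill when fewer than this many are in stock
  private final int batchSize;  // how many new objects each refill creates

  Stockpile(Supplier<T> supplier, int lowWater, int batchSize) {
    if (lowWater < 0 || batchSize < 1) throw new IllegalArgumentException("bad sizes");
    this.supplier = supplier;
    this.lowWater = lowWater;
    this.batchSize = batchSize;
    refill();  // start with one batch on hand
  }

  synchronized T take() {
    if (stock.isEmpty()) refill();  // nothing on hand (possible when lowWater is 0)
    T object = stock.pop();
    if (stock.size() < lowWater) refill();  // top up after dipping below the mark
    return object;
  }

  synchronized int inStock() {
    return stock.size();
  }

  private void refill() {  // called from the constructor or while holding the lock
    for (int i = 0; i < batchSize; i++) stock.push(supplier.get());
  }
}
```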

Original comment by kevinb@google.com on 30 Mar 2012 at 3:45

GoogleCodeExporter commented 9 years ago
I think that what you've described is just a special kind of pool, which could 
throw when a client returns a leased object.

Pools that "replenish" their resources if their number drops below a limit are 
special pools too, which wouldn't throw if clients try to return taken 
objects. Instead they would reuse the returned resources (or destroy them when 
they're full).
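A sketch of that single-interface idea, assuming a hypothetical `ReturnPolicy` that decides what `release()` does: `REJECT` makes the pool take-only, while `REUSE_OR_DESTROY` keeps returned objects up to `maxIdle` and hands the rest to a destroy callback. All names here are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical: one pool type whose release() behavior is configured.
enum ReturnPolicy { REJECT, REUSE_OR_DESTROY }

final class ConfigurablePool<T> {
  private final Deque<T> idle = new ArrayDeque<>();
  private final Supplier<T> supplier;
  private final Consumer<T> destroyer;  // called for returned objects the pool won't keep
  private final int maxIdle;
  private final ReturnPolicy policy;

  ConfigurablePool(Supplier<T> supplier, Consumer<T> destroyer, int maxIdle, ReturnPolicy policy) {
    if (maxIdle < 0) throw new IllegalArgumentException("maxIdle must be >= 0");
    this.supplier = supplier;
    this.destroyer = destroyer;
    this.maxIdle = maxIdle;
    this.policy = policy;
  }

  synchronized T take() {
    return idle.isEmpty() ? supplier.get() : idle.pop();
  }

  synchronized void release(T object) {
    if (policy == ReturnPolicy.REJECT) {
      throw new UnsupportedOperationException("objects from this pool cannot be returned");
    }
    if (idle.size() < maxIdle) {
      idle.push(object);
    } else {
      destroyer.accept(object);  // full: destroy instead of keeping
    }
  }
}
```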

So, if the actual question was "Should there be separate Pool and Stockpile 
interfaces?" then my opinion is that a single Pool interface could suffice.

Original comment by kohanyi....@gmail.com on 30 Mar 2012 at 4:02

GoogleCodeExporter commented 9 years ago
I am interested in pursuing this project for srs bsns.

Original comment by wasserman.louis on 10 Apr 2012 at 10:40

GoogleCodeExporter commented 9 years ago

I just encountered a use case for this with Selenium: I want to have a pool of 
Selenium drivers to take from. I'd say +1 for the ACTIVE approach; I'm going to 
be implementing something akin to that internally.

Edit/Re-post: I derp'd and +1'd the wrong approach.

Original comment by emily@soldal.org on 11 May 2012 at 7:45

GoogleCodeExporter commented 9 years ago
Some information about what I ran into when implementing this:

Pools should have a configurable object acquisition policy for when no object 
is available, be it blocking the thread until one becomes available, simply 
spawning a new object, or throwing an exception, and perhaps more that I 
haven't thought of yet.
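A minimal sketch of a configurable acquisition policy, assuming three hypothetical modes for when no idle object exists and `capacity` objects have already been created: `BLOCK` waits for a release, `GROW` spawns a new object beyond the bound, and `FAIL` throws.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical policy names; one possible shape for a configurable acquisition policy.
enum WhenExhausted { BLOCK, GROW, FAIL }

final class BoundedPool<T> {
  private final Deque<T> idle = new ArrayDeque<>();
  private final Supplier<T> supplier;
  private final int capacity;
  private final WhenExhausted policy;
  private int created;  // guarded by this

  BoundedPool(Supplier<T> supplier, int capacity, WhenExhausted policy) {
    if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
    this.supplier = supplier;
    this.capacity = capacity;
    this.policy = policy;
  }

  synchronized T acquire() throws InterruptedException {
    while (idle.isEmpty()) {
      if (created < capacity || policy == WhenExhausted.GROW) {
        T fresh = supplier.get();  // created under the lock for simplicity
        created++;
        return fresh;
      }
      if (policy == WhenExhausted.FAIL) {
        throw new IllegalStateException("pool exhausted");
      }
      wait();  // BLOCK: release() wakes us up, then we re-check
    }
    return idle.pop();
  }

  synchronized void release(T object) {
    idle.push(object);
    notifyAll();
  }
}
```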

Pools should handle GC gracefully. In our implementation, the pool doesn't hold 
a reference to an object while it is leased; should it be GC'd, we have a 
disposer strategy, akin to a removal listener from the caching API, which 
handles any odd cases.

My Pool interface right now only has:

  void lease(Receiver<T>)
  void empty()

Original comment by emily@soldal.org on 11 May 2012 at 2:28

GoogleCodeExporter commented 9 years ago
Unlike Emily, I prefer the passive approach, as it is closer to what we handle 
every day. Try-with-resources has become quite popular within our team; some 
even hunt down old try blocks to replace them with the new form. Wrapping the 
code in an anonymous class's braces makes it less readable and leads to the 
creation of unnecessary temporary objects and the management of final 
variables, which is quite noisy. The passive approach, on the contrary, uses a 
known code structure that is quite explicit and understandable to everyone, 
though the structure itself needs to be learned by the user of the API.

Original comment by ogregoire on 11 May 2012 at 3:30

GoogleCodeExporter commented 9 years ago
I think that if the structure of active vs. passive needs to be learnt, then 
there are more fundamental issues to address.

The active approach means that you have to add a lot of boilerplate to your 
code; if this is something you access regularly, the code can quickly 
become cluttered.

The management of final variables isn't hard and is only a concern for 
anonymous classes; it's entirely possible to do this without them.

Original comment by emily@soldal.org on 14 May 2012 at 10:30

GoogleCodeExporter commented 9 years ago

Original comment by kevinb@google.com on 30 May 2012 at 7:43

GoogleCodeExporter commented 9 years ago
I'd like to withdraw my comment of May 11. We've been experimenting with 
embedded for the last month, and we find the active method not as bad as we 
thought. We still prefer the passive way, but we no longer have any objections 
to the active one.

Original comment by ogregoire on 11 Jul 2012 at 1:28

GoogleCodeExporter commented 9 years ago
After using this kind of pattern for the last few months, we consider that the 
active way is useful when the code doesn't throw any checked exceptions and the 
passive way is good when the code does throw checked exceptions.

So what we do with our pool implementation is offer both ways. Some might say 
it's redundant, but it leaves programmers free to pick the best option when 
they are faced with different cases. All that for "only" one more method and 
one more interface.
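A sketch of a pool that offers both styles, assuming hypothetical `DualPool`, `Lease` and `Handler` types. Making the handler generic in its exception type `E` is one way to let the active form propagate a checked exception without wrapping it; here the active `run()` is just the passive lease executed on the caller's behalf.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical pool offering both styles; names are illustrative, not a real library API.
final class DualPool<T> {
  /** Passive style: the object goes back when the lease is closed. */
  interface Lease<O> extends AutoCloseable {
    O get();

    @Override
    void close();
  }

  /** Active style: E lets the callback throw one checked exception type. */
  interface Handler<O, E extends Exception> {
    void handle(O object) throws E;
  }

  private final Deque<T> idle = new ArrayDeque<>();
  private final Supplier<T> supplier;

  DualPool(Supplier<T> supplier) {
    this.supplier = supplier;
  }

  Lease<T> lease() {
    final T object = borrow();
    return new Lease<T>() {
      private boolean closed;

      @Override
      public T get() {
        return object;
      }

      @Override
      public void close() {
        if (!closed) {
          closed = true;
          giveBack(object);
        }
      }
    };
  }

  /** The active form reuses the passive one, so the return logic lives in one place. */
  <E extends Exception> void run(Handler<T, E> handler) throws E {
    try (Lease<T> lease = lease()) {
      handler.handle(lease.get());
    }
  }

  private synchronized T borrow() {
    return idle.isEmpty() ? supplier.get() : idle.pop();
  }

  private synchronized void giveBack(T object) {
    idle.push(object);
  }

  synchronized int idleCount() {
    return idle.size();
  }
}
```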

Original comment by ogregoire on 23 Jan 2013 at 10:24

GoogleCodeExporter commented 9 years ago
I am a big fan of the Guava libraries.
Since no open source object pool implementation out there is to my liking,
I have been working on implementing an object pool.

Currently it is still in an experimental phase.
It supports two APIs:

Template.doOnPooledObject(new ObjectPool.Handler<PooledObject, SomeException>() {
  @Override
  public void handle(PooledObject object) throws SomeException {
    object.doStuff();
  }
}, pool, IOException.class);  // will throw an IOException

or a more "classic" approach:

PooledObject object = pool.borrowObject();
try {
  object.doStuff();
} catch (Exception e) {
  pool.returnObject(object, e);  // report the failure along with the object
  throw e;
}
pool.returnObject(object, null);  // success: no exception to report

A key feature of my API is that it lets the pool user provide exception 
feedback, so that the pool can retire defective objects.
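A sketch of exception feedback, assuming a hypothetical `FeedbackPool` whose `borrowObject()` / `returnObject(object, exception)` methods loosely mirror the calls in the snippet above; it is not spf4j's code. A caller-supplied predicate decides which failures mean the object is defective, and such objects are retired instead of reused.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch of exception feedback; not spf4j's implementation.
final class FeedbackPool<T> {
  private final Deque<T> idle = new ArrayDeque<>();
  private final Supplier<T> supplier;
  private final Predicate<Exception> isDefect;  // does this failure mean the object is broken?
  private final Consumer<T> retire;             // e.g. close a broken connection

  FeedbackPool(Supplier<T> supplier, Predicate<Exception> isDefect, Consumer<T> retire) {
    this.supplier = supplier;
    this.isDefect = isDefect;
    this.retire = retire;
  }

  synchronized T borrowObject() {
    return idle.isEmpty() ? supplier.get() : idle.pop();
  }

  /** failure is null when the caller finished without an error. */
  void returnObject(T object, Exception failure) {
    if (failure != null && isDefect.test(failure)) {
      retire.accept(object);  // never hand a defective object out again
      return;
    }
    synchronized (this) {
      idle.push(object);
    }
  }

  synchronized int idleCount() {
    return idle.size();
  }
}
```

For example, a connection pool might pass `e -> e instanceof IOException` as the predicate and a method that closes the connection as the retire callback.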

If you want to take a look, maybe you'll find some worthwhile ideas:

http://code.google.com/p/spf4j/source

The implementation is in the org.spf4j.pool package.

Let me know what you think.

Cheers.

Original comment by zolyfar...@yahoo.com on 28 Apr 2013 at 3:49

GoogleCodeExporter commented 9 years ago
I guess I'll throw my hat into the ring...

I was asked by the Cassandra folks if I could implement a class mixing CLHM 
(predecessor to Guava's Cache), an object pool, and a multimap. The use-case is 
for maintaining a bound (size, ttl, or idle) on the total number of SSTable 
random access file readers, with the ability to pool multiple readers per 
SSTable. As this could impact latencies, a goal was to make it highly 
concurrent.

The interface is classic and not very interesting.

Internally the resources are denormalized into a cache of synthetic keys to 
resources. A weak value cache of key to transfer queue acts as a view layer to 
manage the available resources of that category type. A transfer queue is used to
provide a fast exchange between producers (release) and consumers (borrow), as 
elimination helps alleviate contention. The resource's synthetic key retains a 
hard reference to its queue, allowing unused queues to be aggressively garbage 
collected by weak references.
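A much-simplified sketch of the borrow/release exchange through a `java.util.concurrent.LinkedTransferQueue`, leaving out the cache, synthetic keys and weak references described above. `TransferPool` and the `waitMillis` parameter are hypothetical; `tryTransfer` hands a released resource directly to a borrower already waiting in `poll`, and otherwise the resource is queued.

```java
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical, heavily simplified: one queue of available resources, no eviction.
final class TransferPool<T> {
  private final LinkedTransferQueue<T> available = new LinkedTransferQueue<>();
  private final Supplier<T> supplier;

  TransferPool(Supplier<T> supplier) {
    this.supplier = supplier;
  }

  /** Reuses an available resource, waiting up to waitMillis for one, else creates a new one. */
  T borrow(long waitMillis) throws InterruptedException {
    T resource = available.poll(waitMillis, TimeUnit.MILLISECONDS);
    return resource != null ? resource : supplier.get();
  }

  /** Hands the resource straight to a waiting borrower if there is one, else queues it. */
  void release(T resource) {
    if (!available.tryTransfer(resource)) {
      available.offer(resource);
    }
  }

  /** Number of queued resources (not a constant-time operation for this queue). */
  int idleCount() {
    return available.size();
  }
}
```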

Each resource operates within a simple state machine: IDLE, IN_FLIGHT, RETIRED, 
and DEAD. The idle and in-flight states are self-explanatory, indicating only 
whether the resource is in the transfer queue. The retired state is entered 
when the cache evicts a resource currently being used, thereby requiring 
release() to transition it to the dead state. The lifecycle listeners allow 
the resource to be reset, closed, etc. as needed.
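One way to encode that four-state lifecycle with compare-and-set transitions, assuming a hypothetical `ResourceHandle` wrapper; this is not the multiway-pool code. Evicting an IDLE resource is assumed here to go straight to DEAD, while evicting an IN_FLIGHT one only retires it so that `release()` finishes the job.

```java
import java.util.concurrent.atomic.AtomicReference;

enum Status { IDLE, IN_FLIGHT, RETIRED, DEAD }

// Hypothetical wrapper; the four states follow the description above.
final class ResourceHandle<R> {
  final R resource;
  private final AtomicReference<Status> state = new AtomicReference<>(Status.IDLE);

  ResourceHandle(R resource) {
    this.resource = resource;
  }

  /** Borrow: IDLE -> IN_FLIGHT. Fails if in use, retired, or dead. */
  boolean tryBorrow() {
    return state.compareAndSet(Status.IDLE, Status.IN_FLIGHT);
  }

  /** Release: IN_FLIGHT -> IDLE, or RETIRED -> DEAD. Returns true if the caller should now close the resource. */
  boolean release() {
    if (state.compareAndSet(Status.IN_FLIGHT, Status.IDLE)) return false;
    if (state.compareAndSet(Status.RETIRED, Status.DEAD)) return true;
    throw new IllegalStateException("release() without a matching borrow: " + state.get());
  }

  /** Eviction: IDLE -> DEAD (assumed), IN_FLIGHT -> RETIRED. Returns true if the caller should close the resource now. */
  boolean evict() {
    while (true) {
      Status current = state.get();
      switch (current) {
        case IDLE:
          if (state.compareAndSet(current, Status.DEAD)) return true;
          break;  // lost a race; re-read the state
        case IN_FLIGHT:
          if (state.compareAndSet(current, Status.RETIRED)) return false;
          break;
        default:
          return false;  // already RETIRED or DEAD
      }
    }
  }

  Status status() {
    return state.get();
  }
}
```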

The time-to-idle is a bit naive, as I didn't want to complicate it early on. A 
secondary cache is used so that the idle time is counted as the time the 
resource is not in-flight. This could be optimized by using the lock 
amortization technique directly and not banging against the hash table's locks.
When the idle cache evicts, it transitions the resource to the retired state 
and invalidates it in the primary cache.

This was written over the July 4th holiday for a specific use case, so I am 
sure there's more that could be fleshed out. That also means that while it has 
unit tests, it has not been benchmarked.

https://github.com/ben-manes/multiway-pool

Cheers,
Ben

Original comment by Ben.Manes@gmail.com on 9 Jul 2013 at 9:15

GoogleCodeExporter commented 9 years ago
Switched to an elimination backoff stack. This is probably the best structure 
to design a pool around.

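EBS (elimination backoff stack) - 28M/s
LTQ (LinkedTransferQueue) - 15M/s
CLQ (ConcurrentLinkedQueue) - 16.5M/s
ABQ (ArrayBlockingQueue) - 13.5M/s
LBQ (LinkedBlockingQueue) - 9M/s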
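For reference, a simplified elimination backoff stack in Java: a lock-free Treiber stack whose push and pop, after losing a CAS race, back off into an array of `java.util.concurrent.Exchanger` slots where a push and a pop can cancel out without touching the top of the stack. This is an illustrative sketch, not the benchmarked implementation; the arena size and the 100-microsecond wait are arbitrary choices.

```java
import java.util.concurrent.Exchanger;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

// Simplified elimination backoff stack for illustration: a Treiber stack whose
// contended operations try to pair up through an array of Exchangers.
final class EliminationStack<E> {
  private static final class Node<E> {
    final E item;
    Node<E> next;

    Node(E item) {
      this.item = item;
    }
  }

  private static final Object TIMED_OUT = new Object();

  private final AtomicReference<Node<E>> top = new AtomicReference<>();
  private final Exchanger<Object>[] arena;

  @SuppressWarnings("unchecked")
  EliminationStack(int arenaSize) {
    if (arenaSize <= 0) throw new IllegalArgumentException("arenaSize must be positive");
    arena = (Exchanger<Object>[]) new Exchanger<?>[arenaSize];
    for (int i = 0; i < arenaSize; i++) arena[i] = new Exchanger<>();
  }

  void push(E item) {
    if (item == null) throw new NullPointerException("null items are not supported");
    Node<E> node = new Node<>(item);
    while (true) {
      Node<E> current = top.get();
      node.next = current;
      if (top.compareAndSet(current, node)) return;
      // Lost the CAS: back off into the arena and hope to meet a pop.
      if (exchange(item) == null) return;  // a pop received our item directly
      // Otherwise we met another push or timed out; retry on the stack.
    }
  }

  @SuppressWarnings("unchecked")
  E pop() {
    while (true) {
      Node<E> current = top.get();
      if (current == null) return null;  // empty
      if (top.compareAndSet(current, current.next)) return current.item;
      Object other = exchange(null);
      if (other != null && other != TIMED_OUT) return (E) other;  // eliminated against a push
    }
  }

  private Object exchange(Object offer) {
    Exchanger<Object> slot = arena[ThreadLocalRandom.current().nextInt(arena.length)];
    try {
      return slot.exchange(offer, 100, TimeUnit.MICROSECONDS);
    } catch (TimeoutException e) {
      return TIMED_OUT;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();  // preserve the interrupt, treat it as a miss
      return TIMED_OUT;
    }
  }
}
```

A push that meets a pop returns immediately, and the pop receives the pushed item; two pushes or two pops that meet simply retry on the stack.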

Original comment by Ben.Manes@gmail.com on 10 Aug 2013 at 6:19

GoogleCodeExporter commented 9 years ago
Hi Ben, 
A while ago I wrote an implementation, global_pool <--> thread_local_pool, which 
in my opinion should perform better than EBS in high-load cases.

The implementation is at:

http://code.google.com/p/spf4j/source/browse/#svn%2Ftrunk%2Fspf4j-core%2Fsrc%2Fmain%2Fjava%2Forg%2Fspf4j%2Fpool

I have a simple benchmark against apache commons pool (which is not hard to 
beat):

http://code.google.com/p/spf4j/source/browse/trunk/spf4j-core/src/test/java/org/spf4j/pool/impl/ObjectPoolVsApache.java

Could you run your test against this implementation to see how it performs 
against EBS?

Original comment by zolyfar...@yahoo.com on 24 Sep 2013 at 9:17

GoogleCodeExporter commented 9 years ago
What you implemented is probably something I thought of after posting the 
above: a global list of handles, where each thread retains a thread-local 
reference to one of them. That way a thread will likely claim and release the 
same resource without contending with another thread. The slight complexity is 
stealing idle resources when necessary. If that is what you implemented, I 
agree it should be fundamentally faster than an EBS.

Original comment by Ben.Manes@gmail.com on 25 Sep 2013 at 2:00

GoogleCodeExporter commented 9 years ago
Yup, that is pretty much it. This implementation will "bias" the pooled objects 
to threads and will not steal an object from other threads if another object can
be created or one is available in the "global" bag. Objects can be unbiased 
from a thread also by a "maintenance" thread.
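A minimal sketch of the thread-biased design described above, assuming a hypothetical `BiasedPool` with a shared list of handles and a `ThreadLocal` pointing at the handle each thread last released. A thread first tries to reclaim its own handle, then steals any idle handle from the shared list, and only then creates a new one. There is no upper bound and no maintenance thread here; both are omitted for brevity.

```java
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical sketch: unbounded, and no maintenance thread to un-bias handles.
final class BiasedPool<T> {
  static final class Handle<T> {
    final T object;
    private final AtomicBoolean inUse = new AtomicBoolean(true);  // born claimed by its creator

    Handle(T object) {
      this.object = object;
    }

    boolean tryClaim() {
      return inUse.compareAndSet(false, true);
    }
  }

  private final CopyOnWriteArrayList<Handle<T>> all = new CopyOnWriteArrayList<>();
  private final ThreadLocal<Handle<T>> lastReleased = new ThreadLocal<>();
  private final Supplier<T> supplier;

  BiasedPool(Supplier<T> supplier) {
    this.supplier = supplier;
  }

  Handle<T> borrow() {
    Handle<T> mine = lastReleased.get();
    if (mine != null && mine.tryClaim()) return mine;  // fast path: this thread's own handle
    for (Handle<T> handle : all) {                     // slow path: steal any idle handle
      if (handle.tryClaim()) return handle;
    }
    Handle<T> fresh = new Handle<>(supplier.get());    // nothing idle: create one
    all.add(fresh);
    return fresh;
  }

  void release(Handle<T> handle) {
    lastReleased.set(handle);  // bias the handle toward the releasing thread
    handle.inUse.set(false);   // publish it as idle for everyone
  }

  int size() {
    return all.size();
  }
}
```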

Original comment by zolyfar...@yahoo.com on 26 Sep 2013 at 11:29

GoogleCodeExporter commented 9 years ago
This issue has been migrated to GitHub.

It can be found at https://github.com/google/guava/issues/<id>

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:15

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:18

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 3 Nov 2014 at 9:09