Significant performance issue with large numbers of sets

GoogleCodeExporter commented 9 years ago

What version of the product are you using? On what operating system?

All versions up to and including 2.5rc1

Tell me more...

If a large number of sets are committed to the cache (e.g 20000+) in a loop 
then one thread 
maxes out in the background putting data into the memcached client.  The queue 
is emptied 
VERY slowly (a few 10s of items per second).  A 100,000 item set loop will mean 
many minutes to 
empty the queue.  I have profiled the code and the issue is in the class 
TCPMemcachedNodeImpl.

In the method fillWriteBuffer() the problem is that the readQ is checked to see 
if it contains the 
item about to be written - 

if(!readQ.contains(o)) {
    readQ.add(o);
}

The implementation of LinkedBlockingQueue uses a linear search with the 
expected 
consequences.  There is a comment immediately above this which says - 

    // This isn't the most optimal way to do this, but it hints
    // at a larger design problem that may need to be taken care
    // if in the bowels of the client.
    // In practice, readQ should be small, however.

In practice the readQ isn't small as every item that is set() is put into the 
queue.  This is a major 
problem as it chews up processor time and means that the queue gets overflowed 
in situations 
when it should not.  This is creating a major issue for the project I am 
working on.

I used this code to test this issue - 

public class TestPerformance {
    private MemcachedClient memcache;

    public static void main(String[] args) throws Exception {
        new TestPerformance().run();
    }

    private void run() throws IOException {
        memcache = new MemcachedClient(AddrUtil.getAddresses("localhost:11211"));

        long time1 = System.nanoTime();
        addLotsOfData();

        long time2 = System.nanoTime();
        System.out.println("Time to add data = " + ((time2 - time1) / 1000000.0) + "ms");

        memcache.waitForQueues(86400, TimeUnit.SECONDS);
        long time3 = System.nanoTime();
        System.out.println("Time for queues to drain = " + ((time3 - time2) / 1000000.0) + "ms");

        memcache.shutdown(2000, TimeUnit.MILLISECONDS);
    }

    private void addLotsOfData() {
        for (int i = 0; i < 25000; i++) {
            memcache.set(Integer.toString(i), 86400, "Hello this is a test " + i);
        }
    }
}

On my MacBook pro with 10000 items I see the following results - 

Time to add data = 820.142ms
Time for queues to drain = 2721.893ms

With 25,000 items - 
Time to add data = 1189.688ms
Time for queues to drain = 24770.045ms

With 50,000 items - 

Time to add data = 1473.822ms
Time for queues to drain = 129161.422ms

I wrote a custom version of the LinkedBlockingQueue that uses a Set to track 
the contents of the 
queue and plugged it in via the ConnectionFactory (overriding 
createReadOperationQueue()) the 
problem was solved.  However, I need to productionize this code as it is not 
strictly thread safe 
currently.  This really needs to be fixed in the main line code.

Original issue reported on code.google.com by nickrior...@googlemail.com on 9 Mar 2010 at 10:44

GoogleCodeExporter commented 9 years ago

Original comment by ingen...@gmail.com on 10 Mar 2010 at 4:36

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

I looked at the threading access today around the offending queue and, if I 
have my
analysis right, the readQ is only accessed from the spy thread.  As such a 
simple
temporary workaround of building a wrapper around LinkedBlockingQueue with an
embedded Set seems to work for me at present - not ideal, but a massive 
improvement
on the performance.  I can now add 500,000 items to the set and see that it 
takes
41,300ms to add the data and 10,800ms to drain the queues.  If you're 
interested in
the code I can post (but it's pretty trivial).  I also finally get to see 
memcached
getting exercised (30-70% processor utilization rather than 0.1%).

Original comment by nickrior...@googlemail.com on 10 Mar 2010 at 12:05

GoogleCodeExporter commented 9 years ago

Nick - can you post your bug fix or integrate it into github?

I believe i'm running into this issue in production.

Original comment by spicysq...@gmail.com on 9 Apr 2010 at 4:19

GoogleCodeExporter commented 9 years ago

Hi spicysquid

The way I solved this was to create a wrapper around the LinkedBlockingQueue 
that is
causing the problem.  It contains a HashSet containing all the items in the 
queue -
the contains() method can then use this efficiently.  This works for two 
reasons - 

1. There is very limited threading going on (only one spy thread + your 
application
threads) - and the communications between the threads are quite controlled.

2. Not all methods are used in the LinkedBlockingQueue - and in particular I 
don't
have to implement the semantics of drainTo() (thankfully).

It's a monumental bodge, but it works neatly because you can slot it in via the
existing API by creating your own derivative of the ConnectionFactory and 
overriding
createReadOperationQueue() to return the custom LinkedBlockingQueue.

Here's the wrapped Queue - 

/**
 * This is a temporary solution to the performance issues of the contains method being
 * inefficient with long queues.
 */
public class CustomLinkedBlockingQueue<T> extends LinkedBlockingQueue<T> {
    private Set<T> objectsInQueue = new HashSet<T>();

    public CustomLinkedBlockingQueue() {
        super();
    }

    public CustomLinkedBlockingQueue(int capacity) {
        super(capacity);
    }

    public CustomLinkedBlockingQueue(Collection<T> c) {
        super(c);
    }

    @Override
    public void put(T t) throws InterruptedException {
        super.put(t);
        objectsInQueue.add(t);
    }

    @Override
    public boolean offer(T t, long timeout, TimeUnit unit) throws InterruptedException {
        boolean b = super.offer(t, timeout, unit);
        if (b) {
            objectsInQueue.add(t);
        }
        return b;
    }

    @Override
    public boolean offer(T t) {
        boolean b = super.offer(t);
        if (b) {
            objectsInQueue.add(t);
        }
        return b;
    }

    @Override
    public T take() throws InterruptedException {
        final T o = super.take();
        objectsInQueue.remove(o);
        return o;
    }

    @Override
    public T poll(long timeout, TimeUnit unit) throws InterruptedException {
        final T o = super.poll(timeout, unit);
        if (o != null) {
            objectsInQueue.remove(o);
        }
        return o;
    }

    @Override
    public T poll() {
        final T o = super.poll();
        if (o != null) {
            objectsInQueue.remove(o);
        }
        return o;
    }

    @Override
    public boolean remove(Object o) {
        boolean b = super.remove(o);
        if (b) {
            objectsInQueue.remove(o);
        }
        return b;
    }

    @Override
    public int drainTo(Collection<? super T> c) {
        int count = super.drainTo(c);
        if (count > 0) {
            objectsInQueue.removeAll(c);
        }
        return count;
    }

    @Override
    public int drainTo(Collection<? super T> c, int maxElements) {
        if (c.size() > 0) {
            throw new UnsupportedOperationException("doesn't support this at present");
        }
        int count = super.drainTo(c, maxElements);
        if (count > 0) {
            objectsInQueue.removeAll(c);
        }
        return count;
    }

    @Override
    public T remove() {
        final T o = super.remove();
        objectsInQueue.remove(o);
        return o;
    }

    @Override
    public boolean contains(Object o) {
        return objectsInQueue.contains(o);
    }

    @Override
    public boolean containsAll(Collection<?> c) {
        return objectsInQueue.containsAll(c);
    }
}

then create a new ConnectionFactory by 

public class MyConnectionFactory extends DefaultConnectionFactory {

    // ... constructors omitted for clarity ...

    @Override
    public BlockingQueue<Operation> createReadOperationQueue() {
        return new CustomLinkedBlockingQueue<Operation>();
    }
}

I hope that helps.

- Nick

Original comment by nickrior...@googlemail.com on 9 Apr 2010 at 11:49

GoogleCodeExporter commented 9 years ago

I am using this patch with one small change:
private Set<Operation> objectsInQueue = Collections.newSetFromMap(new 
ConcurrentHashMap<Operation, Boolean>());

not sure if its necessary but I wanted to make sure the HashSet was thread-safe.

-Jonathan

Original comment by jonat...@gmail.com on 15 Oct 2010 at 6:42

GoogleCodeExporter commented 9 years ago

I have just begun working on a fix for this issue. Thanks to everyone for their 
feedback as well as the test to reproduce this issue. I want to provide some 
feedback on some of my findings from playing around with the code and then 
later I will post another update when I have completed my fix.

First off I want to note that while this portion of code can definitely be 
improved and should perform better, but anyone doing a sets at a very high rate 
should still handle timeout issues and check to make sure that their set 
operations are succeeding. I say this because even when I deleted the 
!readQ.contains(o) and just added the operation to the read queue I was still 
able to run into situations where I got large amounts of timeouts from my 
operations. This is because no matter how much better we make the code 
developers will still be io bound and if you write to the queues quicker than 
they can be sent out on the network then operation timeouts will be inevitable.

Original comment by mikewie...@gmail.com on 24 Aug 2011 at 12:11

Changed state: Started
Added labels: Milestone-Release2.7.2

GoogleCodeExporter commented 9 years ago

I just committed a fix for this. It's in review right now. I'll post a jar with 
patch on top of Spy 2.7.1 when it gets through review.

Original comment by mikewie...@gmail.com on 24 Aug 2011 at 10:33

GoogleCodeExporter commented 9 years ago

The fix has made it through review and will be in the Spy 2.8 beta and 2.7.2 
releases of Spymemcached.

Original comment by mikewie...@gmail.com on 5 Sep 2011 at 10:32

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

Here's the 2.7.2 jar that contains a fix for this issue. It's the first commits 
on top of 2.7.1 so this is just 2.7.1 + Bug 125.

Original comment by mikewie...@gmail.com on 6 Sep 2011 at 10:49

Attachments:

GoogleCodeExporter commented 9 years ago

Thanks

Original comment by n...@anamatica.com on 7 Sep 2011 at 5:28

GoogleCodeExporter commented 9 years ago

Related to this: http://code.google.com/p/spymemcached/issues/detail?id=316

Original comment by oooctav...@gmail.com on 22 Jan 2015 at 12:10

roc230 / spymemcached

Significant performance issue with large numbers of sets #125