vert-x3 / issues

Apache License 2.0
36 stars 7 forks source link

Timed out locks are not evicted immediately #537

Closed jkurzanski closed 3 years ago

jkurzanski commented 3 years ago

Hi, When a call to vertx().sharedData().getLocalLockWithTimeout finishes with a time out, this lock is not removed from LocalAsyncLocks.waitersMap. In the handler we doesn't receive a reference of this lock to release it by ourself.

It has been reproduced on Vertx 3.8.5.

Here is a small unit test to reproduce it.

@RunWith(VertxUnitRunner.class)
public class LockTest {
    @org.junit.Rule
    final public RunTestOnContext r = new RunTestOnContext();

    @Test
    public void lockLeak(TestContext context) {
        final Async async = context.async();
        r.vertx().sharedData().getLocalLockWithTimeout("test", 50000L, lockAsyncResult1 -> {
            if (lockAsyncResult1.succeeded()) {
                AtomicInteger failedLock=new AtomicInteger(0);
                AtomicInteger lockOk=new AtomicInteger(0);
                for (int i=0;i<50000;i++ ){
                    r.vertx().sharedData().getLockWithTimeout("test", 1L, lockAsyncResult2 -> {
                        if (lockAsyncResult2.succeeded()) {
                            lockOk.incrementAndGet();
                            lockAsyncResult2.result().release();
                        } else {

                            failedLock.incrementAndGet();
                        }
                        int counter=failedLock.get()+lockOk.get();
                        if (counter%1000==0) {
                            System.out.println(String.format("Iteration :%d Failed lock %d, Lock Ok: %d",counter,failedLock.get(),lockOk.get()));
                        }
                        if (failedLock.get()+lockOk.get()==50000) {
                            r.vertx().setTimer(5000L,v ->{
                                //at this point there are 50K locks in the LocalAsyncLocks.waitersMap
                                lockAsyncResult1.result().release();
                                async.complete();
                            });
                        }
                    });
                }
            }
        });

    }
}

Regards

vietj commented 3 years ago

@tsegismont does it ring a bell ?

tsegismont commented 3 years ago

@jkurzanski the waiters list should be cleared after the lock is released. If it isn't there's a bug but I can see that in the code.

What's your use case for having that much concurrent lock acquisitions?

jkurzanski commented 3 years ago

Hi @tsegismont, the unit test is exaggerated to reproduce the issue. We use the getLocalLockWithTimeout to synchronise an access to a ressource, by multiple concurrent http requests. It's fine, if the lock acquisition fails. We handle it gracefully. But as the map is not cleaned, that lead to a memory leak which force us to restart the JVM after few weeks in production.

tsegismont commented 3 years ago

@jkurzanski can you make sure the lock is released after it's been acquired? Otherwise indeed the list of waiters is not cleaned.

jkurzanski commented 3 years ago

That's the issue : if getLockWithTimeout fails, I don't receive any reference to the lock so I can't release it by myself.

tsegismont commented 3 years ago

I meant can you make sure the lock that was successfully acquired is released? The next waiters aren't processed (and thus removed) until then.

I'm asking because in the snippet above the comment says there are 50K locks in the map before releasing the map. They shouldn't be there after the lock is released.

jkurzanski commented 3 years ago
image

When I inspect the waitersMap after after having released the first lock, the 50k subsequent failed locks are still here.

vietj commented 3 years ago

do you have reproducer code @jkurzanski ?

vietj commented 3 years ago

there is one actually here :-)

vietj commented 3 years ago

I created a patch for master to eagerly remove timed out lock from the waiters list instead of waiting for the lock owners to subsequently release the locks, can you have a try @jkurzanski ? https://github.com/eclipse-vertx/vert.x/issues/3570

vietj commented 3 years ago

@jkurzanski this has been fixed, this is not really a leak per se because lock will be garbaged when the lock makes progress, so I will rename the issue to more accurately describe the issue