puniverse / quasar

Fibers, Channels and Actors for the JVM
http://docs.paralleluniverse.co/quasar/

How can I avoid AbstractQueuedSynchronizer$Node and FiberTimedScheduler$ScheduledFutureTask? #224

Open Jire opened 7 years ago

Jire commented 7 years ago

How can I avoid creating these objects? Why are these objects created? They seem to be retained and won't be garbage collected, climbing to 50MB+ of memory usage.

Here's a screenshot of YourKit after a few minutes of my application running: http://i.imgur.com/ppMFmyU.png

pron commented 7 years ago

j.u.c.AbstractQueuedSynchronizer has nothing to do with Quasar, but the fact that it shows up suggests you may be accidentally using some thread synchronization that blocks fibers, which may also explain why FiberTimedScheduler$ScheduledFutureTasks keep piling up.

Jire commented 7 years ago

@pron Our app does not use any true blocking such as synchronization, but it does constantly do a lot of work across many fibers at once.

Some points of concern:

What do you think?

pron commented 7 years ago

I think you'll need to investigate where those j.u.c.AbstractQueuedSynchronizers are created with a debugger.

pron commented 7 years ago

Hi @Jire . Any news on this?

jonatino commented 7 years ago

@pron We've both tried to take a look at this and still have not had any luck. We call Strand.sleep a lot, which seems to be responsible for the large number of ScheduledFutureTask instances (https://dl.dropboxusercontent.com/u/91292881/ShareX/2016/12/javaw_2016-12-30_03-57-34.png)

    public Future<Void> schedule(Fiber<?> fiber, Object blocker, long delay, TimeUnit unit) {
        if (fiber == null || unit == null)
            throw new NullPointerException();
        assert fiber.getScheduler() == scheduler;
        ScheduledFutureTask t = new ScheduledFutureTask(fiber, blocker, triggerTime(delay, unit));
        delayedExecute(t);
        return t;
    }

Regarding the AbstractQueuedSynchronizer nodes, could they also be caused by the number of Strand.sleep calls we make?

Here is a small example which will reproduce the same results after 5-10 minutes of running.

import co.paralleluniverse.kotlin.fiber
import co.paralleluniverse.strands.Strand
import java.util.concurrent.ThreadLocalRandom
import java.util.concurrent.TimeUnit

fun main(args: Array<String>) {
    System.setProperty("co.paralleluniverse.fibers.detectRunawayFibers", "false")
    System.setProperty("co.paralleluniverse.fibers.verifyInstrumentation", "false")
    System.setProperty("co.paralleluniverse.fibers.DefaultFiberPool.parallelism", "1")

    every(8) {
        val r = nextInt(0,40)
        if (r == 10) {
            Strand.sleep((20 + nextInt(0, 200)).toLong())
        }
    }

    Strand.sleep(Long.MAX_VALUE) // prevent exit
}

inline fun every(duration: Int, crossinline body: () -> Unit) = fiber {
    while (!Strand.interrupted()) {
        body()
        Strand.sleep(duration.toLong(), TimeUnit.MILLISECONDS)
    }
}

fun nextInt(min:Int, max:Int) = ThreadLocalRandom.current().nextInt(min,max)

Screenshot: [image]

This is just one example of a typical Strand we have in our project. When you multiply those results by 10 or 20 you can see the issue we are having 😄

pron commented 7 years ago

Can you try running with -Dco.paralleluniverse.fibers.useLockFreeDelayQueue or -Dco.paralleluniverse.fibers.useLockFreeDelayQueue=true?

The ScheduledFutureTask will still be allocated, but I believe it will stop allocating j.u.c.AbstractQueuedSynchronizer$Nodes.
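Since both forms of the flag are suggested above, the property is presumably treated as enabled when it is either empty or "true". The sketch below only mirrors that reading in plain Java: `isEmptyOrTrue` is a hypothetical helper written for illustration, not Quasar's code. The repro earlier in the thread sets other Quasar properties with `System.setProperty`; doing the same for this flag should work only if it runs before Quasar reads the property, which is an assumption here.

```java
class LockFreeDelayQueueFlag {
    // Property name taken from the suggestion above.
    static final String FLAG = "co.paralleluniverse.fibers.useLockFreeDelayQueue";

    // Hypothetical helper: "-Dflag" (empty value) and "-Dflag=true" both count as enabled.
    static boolean isEmptyOrTrue(String name) {
        String value = System.getProperty(name);
        return value != null && (value.isEmpty() || Boolean.parseBoolean(value));
    }

    public static void main(String[] args) {
        // Assumption: this must happen before any Quasar class reads the property.
        System.setProperty(FLAG, "true");
        System.out.println(FLAG + " enabled: " + isEmptyOrTrue(FLAG));
    }
}
```

Passing the flag on the command line avoids the ordering question entirely.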

jonatino commented 7 years ago

That did get rid of the j.u.c.AbstractQueuedSynchronizer$Node instances, but it started allocating co.paralleluniverse.concurrent.util.ConcurrentSkipListPriorityQueue$Node and co.paralleluniverse.concurrent.util.ConcurrentSkipListPriorityQueue$Index instances instead.

[Two screenshots]

In case you need YourKit, they would be more than happy to offer you an open-source license. Their only requirement is a mention somewhere in the project (for example, ours is at the end of our README: https://github.com/Jire/Acelta). https://www.yourkit.com/purchase/#os_license

pron commented 7 years ago

Well, at least I figured out where those nodes were coming from.

Now, whenever a thread (or a fiber -- same thing) blocks, something must be allocated, whether it's a node in a waiter's list on a lock, or, as in this case, some node in a scheduled waiting list. There's no getting around that. I suppose we could use a different data structure, like an array list, to hold the records -- which would be very unorthodox -- but as the list must be sorted, that would mean constantly searching it. Anyway, in all languages and implementations I know, blocking any kind of thread entails an allocation. Those records must be maintained until the thread is unblocked. Do you have any indication that they are preserved beyond that?
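The point about a sorted waiting list can be illustrated with the JDK alone. In the sketch below, each pending sleep is one freshly allocated record in a `java.util.concurrent.DelayQueue`, which keeps records ordered by trigger time and guards its heap with a `ReentrantLock` and `Condition` (both built on AbstractQueuedSynchronizer). `Wakeup` is a hypothetical stand-in for ScheduledFutureTask; none of this is Quasar's actual scheduler code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

class SleepQueueSketch {
    // One record per pending sleep (hypothetical stand-in for ScheduledFutureTask).
    static final class Wakeup implements Delayed {
        final String strandName;
        final long triggerNanos;

        Wakeup(String strandName, long delay, TimeUnit unit) {
            this.strandName = strandName;
            this.triggerNanos = System.nanoTime() + unit.toNanos(delay);
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(triggerNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            // Only Wakeup records are ever placed in this queue.
            return Long.signum(triggerNanos - ((Wakeup) other).triggerNanos);
        }
    }

    // Schedules three sleeps out of order and returns the order in which they wake.
    static List<String> wakeOrder() {
        DelayQueue<Wakeup> pending = new DelayQueue<>();
        pending.add(new Wakeup("c", 30, TimeUnit.MILLISECONDS)); // each sleep allocates a record
        pending.add(new Wakeup("a", 10, TimeUnit.MILLISECONDS));
        pending.add(new Wakeup("b", 20, TimeUnit.MILLISECONDS));

        List<String> order = new ArrayList<>();
        try {
            while (order.size() < 3) {
                order.add(pending.take().strandName); // blocks until the earliest record is due
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(wakeOrder());
    }
}
```

A record becomes garbage as soon as `take()` returns it, so the interesting question is the one asked above: whether any records stay reachable after their strand has woken up.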

pron commented 7 years ago

Actually, after some more thought, I think we can significantly reduce that allocation, but it will take a bit of work, and I want to make sure that you are actually experiencing adverse GC effects because of this. Anyway, when I said "there's no getting around that", I was wrong. We can get around that.
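The comment does not say how the allocation would be reduced. One generic technique, shown purely as an illustration and not as the planned Quasar change, is a binary min-heap stored in preallocated primitive arrays: inserting and removing a timer then costs O(log n) and allocates nothing. The class name, the integer strand ids, and the fixed capacity are all assumptions of this sketch, and it omits thread safety, cancellation, and the actual wake-up signaling.

```java
class PreallocatedTimerHeap {
    private final long[] triggerTimes; // heap-ordered by trigger time
    private final int[] strandIds;     // parallel array: which strand to wake
    private int size;

    PreallocatedTimerHeap(int capacity) {
        triggerTimes = new long[capacity];
        strandIds = new int[capacity];
    }

    // Registers a wake-up without allocating; returns false when the heap is full.
    boolean offer(int strandId, long triggerTime) {
        if (size == triggerTimes.length) return false;
        int i = size++;
        while (i > 0) { // sift up: move later parents down
            int parent = (i - 1) >>> 1;
            if (triggerTimes[parent] <= triggerTime) break;
            triggerTimes[i] = triggerTimes[parent];
            strandIds[i] = strandIds[parent];
            i = parent;
        }
        triggerTimes[i] = triggerTime;
        strandIds[i] = strandId;
        return true;
    }

    // Removes the strand with the earliest trigger time; returns -1 when empty.
    int poll() {
        if (size == 0) return -1;
        int result = strandIds[0];
        size--;
        long lastTime = triggerTimes[size];
        int lastId = strandIds[size];
        int i = 0;
        int half = size >>> 1; // indices below half have at least one child
        while (i < half) {     // sift down: pull the earlier child up
            int child = 2 * i + 1;
            int right = child + 1;
            if (right < size && triggerTimes[right] < triggerTimes[child]) child = right;
            if (lastTime <= triggerTimes[child]) break;
            triggerTimes[i] = triggerTimes[child];
            strandIds[i] = strandIds[child];
            i = child;
        }
        triggerTimes[i] = lastTime;
        strandIds[i] = lastId;
        return result;
    }

    public static void main(String[] args) {
        PreallocatedTimerHeap heap = new PreallocatedTimerHeap(4);
        heap.offer(1, 300);
        heap.offer(2, 100);
        heap.offer(3, 200);
        heap.offer(4, 50);
        for (int id = heap.poll(); id != -1; id = heap.poll()) {
            System.out.println("wake strand " + id);
        }
    }
}
```

The trade-off is a fixed upper bound on concurrently sleeping strands and a single structure that needs external synchronization, which is why this is only a starting point for discussion.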

jonatino commented 7 years ago

@pron has this issue been brought up before? Our software cycles within 0-1ms so it's crucial to have zero garbage created to prevent any additional latency from the GC cleaning up (The only garbage we have are the aforementioned two allocations from quasar). Now whether or not it's worth putting in the work to fix this, that's totally up to you. IMO, if you can think of a way to significantly reduce the allocations without a performance impact, I don't see why not. 😄