Potential memory leak with each call inside runBlocking

marcoferrer / kroto-plus

gRPC Kotlin Coroutines, Protobuf DSL, Scripting for Protoc

Apache License 2.0

493 stars 28 forks source link

Potential memory leak with each call inside runBlocking #114

Open blachris opened 4 years ago

blachris commented 4 years ago

It looks like there is memory leak in kroto+ when many unary calls are made inside a runBlocking-block, reproduced here: https://github.com/blachris/kroto-plus/commit/223417b1a05294a0387d13f421c0fa8aec4c477d

When you take a heap dump before exiting the runBlocking-block, you will see a number of instances that scale exactly with the number of calls made, for example here after 10000 completed, sequential calls: 10000calls_heapdump

It seems there is an unbroken chain of references with InvokeOnCompletion through every single call made.

After the runBlocking-block, the entire clutter is cleaned up so as a workaround, one can regularly exit and enter runBlocking. However I often see this pattern in main-functions with runBlocking which will reliably encounter this problem over time:

fun main(args: Array<String>) = runBlocking {
  while (true) {
    // make unary calls with kroto+ here and run out of memory eventually
    delay(50)
  }
}

Could you please check if you can avoid this issue in kroto+ or if this is caused by kotlin's runBlocking?

blachris commented 4 years ago

After some trial and error I found out that the problem is probably linked to the invokeOnCompletion handler in bindScopeCancellationToCall in CallExts.kt because the memory leak disappears if it is commented out:

internal fun CoroutineScope.bindScopeCancellationToCall(call: ClientCall<*, *>) {
    val job = coroutineContext[Job]
        ?: error("Unable to bind cancellation to call because scope does not have a job: $this")
    //job.invokeOnCompletion { error ->
    //    if (job.isCancelled) {
    //        call.cancel(error?.message, error)
    //    }
    //}
}

blachris commented 4 years ago

According to the documentation of invokeOnCompletion:

The resulting DisposableHandle can be used to dispose the registration of this handler and release its memory if its invocation is no longer needed. There is no need to dispose the handler after completion of this job. The references to all the handlers are released when this job completes.

Since the job can last a lot longer than a kroto+ call, I think the invokeOnCompletion-Handler must be disposed by kroto+ after the gRPC call is completed to avoid this memory leak. This handler is used in all gRPC call types in kroto+ and none are cleaned up. @marcoferrer This means this leak could affect all kroto+ call types in any long running coroutine job.

marcoferrer commented 4 years ago

First I want to give a big thanks for opening this issue and providing so much detail.

After looking everything over I think the scope of impact is limited to clients executing many calls in a single long-lived coroutine. Server implementations are affected since each server rpc request creates a new job. The source of this issue, as you stated, is the fact that we do not have a check after call completion to dispose of the invokeOnCompletion callback. We've always relied on the job completion to handle this. Testing locally I tried wrapping each client call in a launch or async block and confirmed that I wasn't seeing the increase in memory utilization. I think this confirms that once the coroutine job associated with the call is allowed to complete we see everything behave as expected.

It looks like the solution for this will involve updating the client exts to add a call completion callback to that will check if the scopes job has completed yet and dispose of the invokeOnCompletionwhen appropriate.

Thoughts?

blachris commented 4 years ago

Thank you @marcoferrer for confirming the issue. My ideas so far:

bindScopeCancellationToCall returns the DisposableHandle and we dispose that when the call is finished.
Do unary calls need the invokeOnCompletion-handler? Because we have the cont.invokeOnCancellation-handler, maybe we can get away with checking once if the job is already cancelled instead of bindScopeCancellationToCall.
Pass the DisposableHandle into ResponseObserverChannelAdapter, ClientStreamingCallChannelImpl and ClientBidiCallChannelImpl and use onCompleted and onError as signals that the call is over and clean up the handle. Problem, the channel impls are created before the handle so it's a bit clumsy.