Open blachris opened 4 years ago
After some trial and error I found out that the problem is probably linked to the invokeOnCompletion handler in bindScopeCancellationToCall in CallExts.kt because the memory leak disappears if it is commented out:
internal fun CoroutineScope.bindScopeCancellationToCall(call: ClientCall<*, *>) {
    val job = coroutineContext[Job]
        ?: error("Unable to bind cancellation to call because scope does not have a job: $this")
    //job.invokeOnCompletion { error ->
    //    if (job.isCancelled) {
    //        call.cancel(error?.message, error)
    //    }
    //}
}
According to the documentation of invokeOnCompletion:
The resulting DisposableHandle can be used to dispose the registration of this handler and release its memory if its invocation is no longer needed. There is no need to dispose the handler after completion of this job. The references to all the handlers are released when this job completes.
Since the job can last much longer than a kroto+ call, I think the invokeOnCompletion handler must be disposed by kroto+ after the gRPC call has completed to avoid this memory leak. This handler is used in all gRPC call types in kroto+ and none of them are cleaned up. @marcoferrer, this means the leak could affect all kroto+ call types in any long-running coroutine job.
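To illustrate the mechanics with plain kotlinx.coroutines (a minimal standalone sketch, not kroto+ code): every handler registered on a long-lived job stays reachable from that job until the job completes, unless its DisposableHandle is disposed first.

```kotlin
import kotlinx.coroutines.DisposableHandle
import kotlinx.coroutines.Job
import kotlinx.coroutines.runBlocking

fun main() = runBlocking {
    val job = coroutineContext[Job]!!  // long-lived job, like the one backing a main runBlocking

    // Leaky shape: one handler per call, never disposed. All 10_000 handler
    // nodes stay reachable from this job until runBlocking finally completes.
    repeat(10_000) {
        job.invokeOnCompletion { /* would cancel the gRPC call here */ }
    }

    // Non-leaky shape: keep the DisposableHandle and release it once the call
    // is done, so nothing accumulates on the long-lived job.
    repeat(10_000) {
        val handle: DisposableHandle = job.invokeOnCompletion { /* would cancel the gRPC call here */ }
        try {
            // ... await the rpc response here ...
        } finally {
            handle.dispose()
        }
    }
}
```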
First I want to give a big thanks for opening this issue and providing so much detail.
After looking everything over, I think the scope of impact is limited to clients executing many calls in a single long-lived coroutine. Server implementations are not affected, since each server rpc request creates a new job. The source of this issue, as you stated, is that we do not have a check after call completion to dispose of the invokeOnCompletion callback; we've always relied on job completion to handle this. Testing locally, I tried wrapping each client call in a launch or async block and confirmed that I wasn't seeing the increase in memory utilization. I think this confirms that once the coroutine job associated with the call is allowed to complete, everything behaves as expected.
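For reference, a minimal sketch of that workaround shape; the inOwnJob helper and the stub/request names in the usage comment are made up for illustration and are not part of kroto+.

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.coroutineScope

// Hypothetical helper: run a single rpc inside its own short-lived job so the
// handler registered by bindScopeCancellationToCall is released as soon as
// that job completes, instead of accumulating on the caller's long-lived job.
suspend fun <T> inOwnJob(rpc: suspend () -> T): T = coroutineScope {
    async { rpc() }.await()
}

// Usage (stub and request are placeholders):
// val reply = inOwnJob { stub.sayHello(request) }
```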
It looks like the solution for this will involve updating the client exts to add a call completion callback that will check whether the scope's job has completed yet and dispose of the invokeOnCompletion handler when appropriate.
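Perhaps something along these lines (a sketch of the ext change only, based on the snippet above, not a tested patch):

```kotlin
import io.grpc.ClientCall
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.DisposableHandle
import kotlinx.coroutines.Job

// Sketch: hand the DisposableHandle back to the call site so it can be
// disposed once the rpc completes, even if the scope's job lives on.
internal fun CoroutineScope.bindScopeCancellationToCall(call: ClientCall<*, *>): DisposableHandle {
    val job = coroutineContext[Job]
        ?: error("Unable to bind cancellation to call because scope does not have a job: $this")
    return job.invokeOnCompletion { error ->
        if (job.isCancelled) {
            call.cancel(error?.message, error)
        }
    }
}
```

Each call ext would then keep the returned handle and dispose of it from its own completion path, even when the surrounding scope's job keeps running.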
Thoughts?
Thank you @marcoferrer for confirming the issue. My ideas so far:

1. bindScopeCancellationToCall returns the DisposableHandle and we dispose of it when the call is finished.
2. Do we need the invokeOnCompletion handler at all? Because we have the cont.invokeOnCancellation handler, maybe we can get away with checking once whether the job is already cancelled instead of bindScopeCancellationToCall.
3. Pass the DisposableHandle into ResponseObserverChannelAdapter, ClientStreamingCallChannelImpl and ClientBidiCallChannelImpl and use onCompleted and onError as signals that the call is over to clean up the handle; a rough sketch follows below. Problem: the channel impls are created before the handle, so it's a bit clumsy.
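For idea 3, the creation-order problem could maybe be sidestepped by attaching the handle to the adapter after the fact. A rough sketch of just the disposal wiring, using a hypothetical wrapper and field name rather than the real ResponseObserverChannelAdapter shape:

```kotlin
import io.grpc.stub.StreamObserver
import kotlinx.coroutines.DisposableHandle

// Sketch only: a response observer that owns the scope-cancellation handle
// and releases it when the call terminates. The handle is assigned after
// construction because it does not exist yet when the adapter is created.
class DisposingResponseObserver<T>(
    private val delegate: StreamObserver<T>
) : StreamObserver<T> by delegate {

    var scopeCancellationHandle: DisposableHandle? = null  // hypothetical field

    override fun onError(t: Throwable) {
        scopeCancellationHandle?.dispose()
        delegate.onError(t)
    }

    override fun onCompleted() {
        scopeCancellationHandle?.dispose()
        delegate.onCompleted()
    }
}
```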
It looks like there is a memory leak in kroto+ when many unary calls are made inside a runBlocking block, reproduced here: https://github.com/blachris/kroto-plus/commit/223417b1a05294a0387d13f421c0fa8aec4c477d
When you take a heap dump before exiting the runBlocking block, you will see a number of instances that scales exactly with the number of calls made, for example after 10000 completed, sequential calls.
It seems there is an unbroken chain of references with an InvokeOnCompletion instance for every single call made. After the runBlocking block, the entire clutter is cleaned up, so as a workaround one can regularly exit and re-enter runBlocking. However, I often see this pattern in main functions with runBlocking, which will reliably run into this problem over time:
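A minimal sketch of that shape (the rpc is stubbed out here as a plain suspend function; in the real case it would be a kroto+ generated client call):

```kotlin
import kotlinx.coroutines.runBlocking

// Placeholder for e.g. stub.sayHello(request) on a kroto+ generated client.
suspend fun callSomeRpc() { }

fun main() = runBlocking {
    // One long-lived runBlocking job for the whole application: every call
    // made in here registers an invokeOnCompletion handler on this job, and
    // none of them are released until main returns.
    while (true) {
        callSomeRpc()
    }
}
```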
Could you please check if you can avoid this issue in kroto+ or if it is caused by Kotlin's runBlocking?