Buffer offline messages configuration in RemoteMappingKernelManager

jupyter-server / enterprise_gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.

Description

RemoteMappingKernelManager has a buffer_offline_messages option, which was introduced in v2.1.0 of EG. I wanted to know how these messages are buffered, and whether buffering would increase EG's memory consumption if the user's frontend is disconnected and the kernel produces a lot of output. I could not find the code that implements this functionality. Could you point me to it?

Also, was this feature present in the previous versions of EG? (1.x, 2.0)

The buffer_offline_messages option itself is not new. Its appearance on RemoteMappingKernelManager is a side effect of removing the dependency on JKG, after which configuration is properly exposed. In prior releases the option is available via --MappingKernelManager.buffer_offline_messages.

Personally, I don't think the message buffering works correctly anyway. It's implemented in Notebook's MappingKernelManager, from which RemoteMappingKernelManager is derived.

Looks like the kernel buffers are just a parallel map, indexed by kernel_id, so the messages are just stored in memory (which is another issue).

If necessary, we could extend RemoteMappingKernelManager to override these methods and roll our own. If it works nicely, we could contribute it back or leave it as an "enterprise" capability - but there's probably a fair amount of work here.

I also ran into issues with how the buffering works in the first place because a new connection ends up with a different identifier - IIRC - so "finding" the buffered messages for replay didn't seem to work. In addition, EG being "remote" from the notebook server adds complexity - although I found the same issues exist within a standard notebook server environment.

Ah, now that I recall, I think the "buffering" was meant to address intermittent "glitches" and not explicit disconnection scenarios, which is what folks really want - and it's the latter that automatically triggers a different identifier (or something like that).

Message buffering seems to work in some cases (maybe in all cases where the session_key remains unchanged). I tried executing long-running code with continuous output, disconnecting from the remote kernel for a while, and then connecting back to it.

import time
for i in range(100):
    print(i)
    time.sleep(1)

The buffered messages were relayed.

When I repeated the same experiment but connected back to the kernel using Reconnect to Kernel from the command palette, the buffered messages were not relayed. My guess is that in the second case the session_key changes.
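
To make that guess concrete, here is a minimal sketch of a per-kernel buffer that is keyed by kernel_id but only replayed when the reconnecting client presents the same session_key. It is a simplified model of the behavior described in this thread, not the actual Notebook code; the class name KernelBufferSketch and its methods are hypothetical.

```python
class KernelBufferSketch:
    """Hypothetical, simplified model of per-kernel message buffering."""

    def __init__(self):
        # One entry per kernel_id. Each entry remembers the session_key that
        # was active when buffering started, plus an unbounded message list.
        self._buffers = {}

    def start_buffering(self, kernel_id, session_key):
        self._buffers[kernel_id] = {"session_key": session_key, "messages": []}

    def buffer_message(self, kernel_id, msg):
        # Messages are only kept while buffering is active for this kernel.
        if kernel_id in self._buffers:
            self._buffers[kernel_id]["messages"].append(msg)

    def get_buffer(self, kernel_id, session_key):
        """Return messages for replay, or [] if the session_key differs."""
        entry = self._buffers.pop(kernel_id, None)
        if entry is None or entry["session_key"] != session_key:
            # A reconnect with a new session_key finds nothing to replay,
            # and the old buffer is discarded.
            return []
        return entry["messages"]


buffers = KernelBufferSketch()

# Case 1: the client comes back with the same session_key.
buffers.start_buffering("kernel-1", "session-a")
buffers.buffer_message("kernel-1", "0")
buffers.buffer_message("kernel-1", "1")
same_session = buffers.get_buffer("kernel-1", "session-a")

# Case 2: the client comes back with a different session_key.
buffers.start_buffering("kernel-1", "session-a")
buffers.buffer_message("kernel-1", "2")
new_session = buffers.get_buffer("kernel-1", "session-b")
```

In this model the first reconnect gets its messages back and the second gets an empty list, which would match the two outcomes of the experiment if the session_key does change on an explicit reconnect.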

The buffered messages are stored in memory with no size limit, which is a concern. I have seen some EG instances bloat in memory, which I had attributed to a memory leak. Those EG instances had to be restarted.

So for EG, where we would have long-running code and client disconnects, I feel this feature is desirable, but there should be a limit on the size of the buffer. I would also like to understand why we need the session_key as an identifier for the buffer, when kernel_id could be sufficient.
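
As a rough illustration of what such a limit could look like, the sketch below caps the number of buffered messages per kernel with collections.deque(maxlen=...), so the oldest messages are dropped once the cap is reached. Nothing like this exists in EG or Notebook as far as this thread shows; BoundedKernelBuffers and MAX_BUFFERED_MESSAGES are hypothetical names, and the cap counts messages rather than bytes.

```python
from collections import deque

# Hypothetical limit, kept very small here for illustration.
MAX_BUFFERED_MESSAGES = 3


class BoundedKernelBuffers:
    """Hypothetical per-kernel buffer that keeps only the newest messages."""

    def __init__(self, max_messages=MAX_BUFFERED_MESSAGES):
        self._max_messages = max_messages
        self._buffers = {}  # kernel_id -> deque of messages

    def buffer_message(self, kernel_id, msg):
        # deque(maxlen=N) silently discards the oldest item when full,
        # so memory per kernel is bounded by N messages.
        buf = self._buffers.setdefault(
            kernel_id, deque(maxlen=self._max_messages)
        )
        buf.append(msg)

    def drain(self, kernel_id):
        """Remove and return the buffered messages for replay."""
        return list(self._buffers.pop(kernel_id, ()))


bounded = BoundedKernelBuffers()
for i in range(10):
    bounded.buffer_message("kernel-1", str(i))
replayed = bounded.drain("kernel-1")
```

The trade-off is that a reconnecting client sees only the tail of the output. A byte-based cap would bound memory more precisely, since individual messages can be large.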

Buffer offline messages configuration in RemoteMappingKernelManager #783
