stainlessai / micronaut-jupyter

A Micronaut configuration that integrates your app with an existing Jupyter installation.
https://stainlessai.github.io/micronaut-jupyter
Apache License 2.0

MJ-17 ⁃ Logging is being output to cell results stdout, instead of to stdout of application #6

Closed stainlessbot closed 4 years ago

stainlessbot commented 4 years ago

It makes sense why this is happening, but it didn't happen before. Figure out why it used to work and then make corrections so that it works again.

┆Issue Type: Bug ┆Priority: High ┆Status: Resolved ┆Fix Versions: v0.2.1 ┆Issue Number: MJ-17 ┆Epic: Kernel

stainlessbot commented 4 years ago

➤ Joshua Carter commented:

This issue only happens when using a native SLF4J implementation, not when using any of the bindings to other logging frameworks:

"ch.qos.logback:logback-classic:1.2.3"Doesn't work"org.slf4j:slf4j-log4j12:1.7.29"Works"org.slf4j:slf4j-jdk14:1.7.29"Works"org.slf4j:slf4j-simple:1.7.29"Doesn't work"org.slf4j:slf4j-jcl:1.7.29"Works

stainlessbot commented 4 years ago

➤ Joshua Carter commented:

❎ Going to wait for help from either OCI or [~accountid:557058:b2dfc1bc-0918-4d5d-9d58-c159e17a5d18]

stainlessbot commented 4 years ago

➤ Joshua Carter commented:

(+) Going to de-prioritize this issue. Println isn't behaving exactly the way I would expect either, so it looks like I don't fully understand how System.setOut() works. I need to decide what exactly the desired functionality is here (see the debug_logging branch).

stainlessbot commented 4 years ago

➤ Joshua Carter commented:

The behavior as I currently understand it is this:

Any output that comes before the execution of the cell in the call stack goes to the application's stdout, but any output in or after the execution of the cell in the call stack goes to the script's stdout.

I don't know how this is happening; it is possible that there is custom functionality in BeakerX's configured output stream that accomplishes this.

I have decided that this is the functionality that should exist:

Output should behave the way it is now with the following exception: Any output that comes from a detected logger should always go to the application's stdout, never the script's.

I'm not sure how to achieve this yet; however, it is possible that the same methods that BeakerX is using to achieve the current behavior can also be used to achieve this (i.e. figure out where in the call stack the output is coming from).

stainlessbot commented 4 years ago

➤ Joshua Carter commented:

The way that BeakerX accomplishes the current functionality is actually by thread, not by the call stack. Before the script is executed, the current thread group (which was just created for the execution) is stored in the BeakerX in/out stream. When the stream receives something to output, it checks whether the current thread group was stored (i.e. whether the current thread group is the script's thread group). If the call to err/out comes from within the script's thread group, the output goes to Jupyter; otherwise it goes to the original System.out.
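
Roughly, the mechanism described above looks like the following sketch (hypothetical class and method names, not BeakerX's actual code):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of thread-group based routing: output written from a
// registered (script) thread group goes to the Jupyter cell, everything else
// falls through to the original System.out captured before System.setOut().
class ThreadGroupRoutingStream extends OutputStream {

    private final PrintStream originalOut;   // the real System.out, captured up front
    private final OutputStream jupyterOut;   // stream that publishes to the notebook cell
    private final Set<ThreadGroup> scriptGroups = ConcurrentHashMap.newKeySet();

    ThreadGroupRoutingStream(PrintStream originalOut, OutputStream jupyterOut) {
        this.originalOut = originalOut;
        this.jupyterOut = jupyterOut;
    }

    // Called just before a cell executes, with the thread group created for that execution.
    void registerScriptGroup(ThreadGroup group) {
        scriptGroups.add(group);
    }

    @Override
    public void write(int b) throws IOException {
        if (scriptGroups.contains(Thread.currentThread().getThreadGroup())) {
            jupyterOut.write(b);   // output originated inside the cell's thread group
        } else {
            originalOut.write(b);  // output from the application itself
        }
    }
}
```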

stainlessbot commented 4 years ago

➤ Joshua Carter commented:

We've come up with two strategies for accomplishing the desired behavior.

The first is to interrogate the call stack and search for a logging library that might have made the call to System.out or System.err. This has the advantage of being easily scaled to any number of different logging frameworks, and it shouldn't be much work to implement. It has the drawback of a potential performance impact: the fastest method of interrogating the call stack (SecurityManager) was last clocked at 565 ms per 1,000,000 calls: https://stackoverflow.com/questions/421280/how-do-i-find-the-caller-of-a-method-using-stacktrace-or-reflection.

The second is to write customizations for each framework we want to target. For example, if log4j provides a way to intercept all calls made through log4j, then an interceptor could be written that calls a special "logging" method on the custom output stream. This has the drawback of only working for logging frameworks that provide some sort of way to intercept all logging calls made through that framework. Additionally, it will require a significant amount of effort to detect each framework (not easily scalable). This has the advantage of likely having less of a performance impact than the first strategy.

I'm going to go with the first strategy. I will invite users to submit issues with alternative methods in the code documentation. Additionally, I will make this behavior configurable so that it can be switched off via config (if the performance hit is too great for some users).
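
For reference, a minimal sketch of the first strategy (assumed names and an illustrative list of logger packages, not the project's actual detector): walk up the current call stack and check whether any of the first N frames belong to a known logging framework.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the stack-interrogation strategy: if a logging
// framework shows up within the first MAX_FRAMES frames of the call stack,
// the write is treated as logging and should go to the application's stdout.
final class LoggerCallDetector {

    // Package prefixes of logging frameworks to detect (illustrative, not exhaustive).
    private static final List<String> LOGGER_PACKAGES = Arrays.asList(
            "ch.qos.logback.", "org.slf4j.", "org.apache.log4j.", "java.util.logging.");

    private static final int MAX_FRAMES = 50;   // bound the search to limit the cost

    static boolean calledFromLogger() {
        StackTraceElement[] stack = Thread.currentThread().getStackTrace();
        int limit = Math.min(stack.length, MAX_FRAMES);
        for (int i = 0; i < limit; i++) {
            String className = stack[i].getClassName();
            for (String prefix : LOGGER_PACKAGES) {
                if (className.startsWith(prefix)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```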

stainlessbot commented 4 years ago

➤ Joshua Carter commented:

Create a custom implementation of BeakerStdInOutErrHandler and use it in the KernelManager. Extend GroovyCodeRunner with a MicronautCodeRunner. Override the call() method with one that creates a new stdin object (see ConfigurationFactory), gets the stdout and stderr objects from the seo, and sets all three of them on the custom in/out/err handler instance. After that, it should call the parent's call() method. Both of these steps should be wrapped in a try/finally block that clears the output handlers on the custom in/out/err handler instance (see GroovyCodeRunner).
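
Sketched out with hypothetical stand-in types (the real BeakerX GroovyCodeRunner, evaluation object, and handler APIs differ), the shape of that call() override would be something like:

```java
import java.util.concurrent.Callable;

// Hypothetical stand-ins; not the actual BeakerX interfaces.
interface OutputHandler { void write(String text); }

class CustomStreamHandler {
    void setOutputHandlers(OutputHandler in, OutputHandler out, OutputHandler err) { /* register for this execution */ }
    void clearOutputHandlers() { /* unregister */ }
}

class MicronautCodeRunner implements Callable<Object> {

    private final CustomStreamHandler handler;
    private final OutputHandler stdin, stdout, stderr;
    private final Callable<Object> parentCall;   // stands in for super.call() on GroovyCodeRunner

    MicronautCodeRunner(CustomStreamHandler handler, OutputHandler stdin, OutputHandler stdout,
                        OutputHandler stderr, Callable<Object> parentCall) {
        this.handler = handler;
        this.stdin = stdin;
        this.stdout = stdout;
        this.stderr = stderr;
        this.parentCall = parentCall;
    }

    @Override
    public Object call() throws Exception {
        // Register the per-cell streams before the cell runs...
        handler.setOutputHandlers(stdin, stdout, stderr);
        try {
            return parentCall.call();        // ...delegate to the parent runner...
        } finally {
            handler.clearOutputHandlers();   // ...and always clear them afterwards.
        }
    }
}
```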

stainlessbot commented 4 years ago

➤ Joshua Carter commented:

(-) When a large number of kernels are started at once, some of them won't correctly direct the logging output to the original system log.

I suspect this is caused by the fact that a separate StandardStreamHandler instance is created for every single kernel, while the handler uses System.setOut(), which is global rather than thread-specific. This means that when a second kernel thread is created, the 2nd stream handler stores the 1st handler as the original in/out/err streams (only the 1st handler actually has the original system in/out/err streams stored). So if, say, 20 kernels were created, the 20th kernel would have to pipe its stdout through 20 different handlers before it eventually reached the original system streams.

In the case of redirecting logging, only the most recent 5 notebooks will successfully redirect their logging to the original system out. This is likely because the logging detector only searches up the stacktrace 50 items, and the original logging call gets pushed up past 50 by all of the duplicate stream handlers that it has to get piped through.

Logging issues aside, given that we were concerned about even the small performance impact of searching the stacktrace, we clearly need to eliminate the duplicate handlers. Since each thread group gets its own set of in/out/err streams, it should work to have a single StandardStreamHandler instance that serves all of the kernel threads.
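
A rough illustration of that fix (hypothetical names): a single shared handler is installed once, captures the real System.out exactly once, and simply looks up the current thread group's stream on every write, so no chain of wrapped handlers can build up no matter how many kernels are started.

```java
import java.io.PrintStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: one handler for all kernels instead of one per kernel.
final class SharedStreamHandler {

    private static volatile SharedStreamHandler instance;

    private final PrintStream originalOut;   // captured exactly once, before any redirection
    private final Map<ThreadGroup, PrintStream> perGroupOut = new ConcurrentHashMap<>();

    private SharedStreamHandler(PrintStream originalOut) {
        this.originalOut = originalOut;
    }

    static synchronized SharedStreamHandler install() {
        if (instance == null) {
            instance = new SharedStreamHandler(System.out);
            // System.setOut(...) would be called here once with a stream that
            // delegates through instance.resolve(); omitted to keep the sketch short.
        }
        return instance;
    }

    void registerKernel(ThreadGroup group, PrintStream cellOut) {
        perGroupOut.put(group, cellOut);
    }

    void unregisterKernel(ThreadGroup group) {
        perGroupOut.remove(group);
    }

    PrintStream resolve() {
        // One map lookup; the original streams are never re-wrapped.
        return perGroupOut.getOrDefault(Thread.currentThread().getThreadGroup(), originalOut);
    }
}
```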