Stream output problem when publishing with `jupyter-book`

kylebarron / stata_kernel

A Jupyter kernel for Stata. Works with Windows, macOS, and Linux.

https://kylebarron.dev/stata_kernel/

GNU General Public License v3.0

266 stars 57 forks

Issue #393

Open roblem opened 3 years ago

roblem commented 3 years ago

There is a jupyter-book issue that causes problems when publishing with stata_kernel. Basically, the streaming output for a command is split by jupyter-book into multiple output cells per code block, which isn't pretty, as seen in the screenshot of the compiled document (image: stata_kernel_issues).

According to this response, it is a matter of switching from stream output to coalescing into text/plain output.
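As a rough illustration of what "coalescing" could mean here, the sketch below merges adjacent stream outputs in a cell's output list, using the notebook JSON layout (`output_type`, `name`, `text`). It is a hypothetical post-processing step, not something jupyter-book or stata_kernel is known to do; the function names and sample data are made up for the example.

```python
def _as_text(text):
    # The notebook format allows "text" to be a string or a list of strings.
    return "".join(text) if isinstance(text, list) else text


def coalesce_stream_outputs(outputs):
    """Merge adjacent stream outputs that share a stream name (hypothetical helper)."""
    merged = []
    for out in outputs:
        is_stream = out.get("output_type") == "stream"
        prev = merged[-1] if merged else None
        if (
            is_stream
            and prev is not None
            and prev.get("output_type") == "stream"
            and prev.get("name") == out.get("name")
        ):
            # Same stream as the previous chunk: append to it.
            prev["text"] += _as_text(out["text"])
        elif is_stream:
            # Copy so the caller's dict is not modified.
            merged.append({**out, "text": _as_text(out["text"])})
        else:
            # Non-stream outputs (images, results) pass through untouched.
            merged.append(out)
    return merged


# Illustrative data: two chunks that one Stata cell might have streamed.
cell_outputs = [
    {"output_type": "stream", "name": "stdout", "text": "summary table\n"},
    {"output_type": "stream", "name": "stdout", "text": ["regression ", "table\n"]},
]
result = coalesce_stream_outputs(cell_outputs)
```

Run over every code cell of an executed notebook before the build, something like this would leave one stdout block per cell, assuming the publishing tool renders each stream output as its own block.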

As a side note, Stata 17 with the `%%stata` magic has similar issues, but the undocumented `config.set_streaming_output_mode('off')` fixes it so it behaves like Python. The trade-off is that output appears to hang: no intermediate output is printed while the command runs; instead it is collected until the command finishes. This fixes the problem seen in the picture above.

kylebarron commented 3 years ago

> fixes it so it behaves like Python

Python must do something special when it hooks into Jupyter Book. As mentioned in the specific comment you linked to, 99% of the time you don't want to coalesce into a single output block.

With stdout/stderr streams, Jupyter dumps the output periodically, e.g. if you had:
from time import sleep
print(1)
sleep(10)  # pause 10 seconds between the two prints
print(2)
Then you don't want to have to wait 10 seconds to see 1; the output will update with 1 and then later add 2.

That's a feature, not a bug, of stata_kernel, because you don't want to wait for the entire Stata block to finish before you get output. I don't see any way to fix this without a significant overhaul to optionally not return a stream.

roblem commented 3 years ago

Totally agree. The only time you want the non-stream output is with certain automated publishing pipelines like jupyter-book, although in my particular use case the models I am running are computationally trivial so any hangs are not noticeable.

Taking a second look at Emacs Org mode and how it handles Stata output with stata_kernel: it wraps the output in an example block, which I haven't seen for any other language, so it is coalescing the stream into a block of text. I verified this by inserting and running (in Emacs Org mode):

 sysuse auto
 sum
 sleep 10000
 reg price mpg

and no results are returned until all the code in the block has completed, at which point both sets of results are included in the block. So this seems to be more of a client-side issue, where clients can "clean up" the stream if need be, as Emacs does by constructing an example block of results.

kylebarron commented 3 years ago

I can't remember if the jupyter protocol ends a stream response with a custom message. If so, seems easy to collect a stream into a single block (as long as you assume a continuous stream is intended to go into a single block). Otherwise seems impossible to know how long to wait on the stream.
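For reference, the Jupyter messaging protocol has the kernel publish a `status` message with `execution_state` set to `idle` on the IOPub channel once it has finished handling a request, and each message carries the originating request's id in `parent_header`. A minimal sketch of collecting a stream on that basis is below. It works on simplified message dicts rather than a live kernel; `collect_stream`, `fake_msg`, and the sample messages are hypothetical.

```python
def collect_stream(messages, request_id):
    """Join stream text for one request until the kernel reports idle."""
    chunks = []
    for msg in messages:
        if msg["parent_header"].get("msg_id") != request_id:
            continue  # output that belongs to some other request
        if msg["msg_type"] == "stream":
            chunks.append(msg["content"]["text"])
        elif (
            msg["msg_type"] == "status"
            and msg["content"]["execution_state"] == "idle"
        ):
            break  # the kernel has finished this request
    return "".join(chunks)


def fake_msg(msg_type, content, parent="req-1"):
    # Hypothetical helper: builds a minimal IOPub-shaped message for the demo.
    return {
        "msg_type": msg_type,
        "content": content,
        "parent_header": {"msg_id": parent},
    }


# Simulated IOPub traffic for one execute request ("req-1").
iopub = [
    fake_msg("status", {"execution_state": "busy"}),
    fake_msg("stream", {"name": "stdout", "text": "sum output\n"}),
    fake_msg("stream", {"name": "stdout", "text": "other\n"}, parent="req-2"),
    fake_msg("stream", {"name": "stdout", "text": "reg output\n"}),
    fake_msg("status", {"execution_state": "idle"}),
    fake_msg("stream", {"name": "stdout", "text": "late\n"}),
]
collected = collect_stream(iopub, "req-1")
```

With a real kernel the same loop would read messages from the client's IOPub channel instead of a list, so the wait ends at the idle status rather than after a guessed timeout.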

roblem commented 3 years ago

I played around a bit more and noticed that in Emacs, the emacs-jupyter package (the middleware that lets Emacs Org mode communicate with the Jupyter kernel) coalesces Python streams the same way, so that all output for a code block is placed in a single output block (unless you use the :async option, in which case it dumps results out as they come in). I'll poke around there and see if I can find some clues as to how it knows when execution is done.