pawamoy / markdown-exec

Utilities to execute code blocks in Markdown files.
https://pawamoy.github.io/markdown-exec
ISC License
111 stars 8 forks source link

stdout from "external" functions not shown #21

Open cbyrohl opened 1 year ago

cbyrohl commented 1 year ago

Describe the bug

Stdout created in functions not defined in the code block itself will not be rendered (but shows up in the stdout of the mkdocs run)

To Reproduce

I expect the same rendered output in all three code blocks below:

print("hello")
def hello():
    print("hello")
hello()
from somepackage import hello   # same definition of hello as before
hello()

The last block executes, and its output shows up in the mkdocs stdout, but the "result" block created by markdown-exec remains empty.

Expected behavior

All three blocks should give the same result.

System (please complete the following information): mkdocs 1.4.3 markdown-exec 1.6.0 Platform: linux OS: posix Python: 3.9.16

Boost priority

Fund with Polar

pawamoy commented 1 year ago

Thanks for the report.

Explanation: when running Python code, we patch the print function in the execution context's globals. Obviously this only affects the code Markdown-Exec sees, not code from other modules.

Workaround: capture stdout of these functions to re-print it from the code executed by Markdown-Exec. Ideally, if you're the one maintaining these functions that print text, you should change them so that they return text instead of printing it.

Permanent fix: we might need to change our strategy, for example by using failprint which will take care of patching everything we need, everywhere so that any output (using print, or sys.stdout, or sys.stdout.buffer, or subprocesses, etc.) will be captured correctly. This is a bit heavier than what we have now, but I don't see another solution using the current approach.

cbyrohl commented 1 year ago

Thanks a lot for your fast response and clarification. I am curious whether there's a quick workaround that does not involve changing the markdown code blocks.

As I understand now, we are talking about this line:

    buffer = StringIO()
    exec_globals["print"] = partial(_buffer_print, buffer)

    try:
        compiled = compile(code, filename=code_block_id, mode="exec")
        exec(compiled, exec_globals)  # noqa: S102

Do you think something like even more hacky like

    buffer = StringIO()
    builtin_inject = "__builtins__['print'] = partial(_buffer_print, buffer)\n"
    exec_globals["buffer"] = buffer
    exec_globals["_buffer_print"] = _buffer_print
    code = builtin_inject + code

    try:
        compiled = compile(code, filename=code_block_id, mode="exec")
        exec(compiled, exec_globals)  # noqa: S102

could work (have not tried)?

pawamoy commented 1 year ago

Ha, interesting! I didn't think mutating __builtins__ would affect all code and not just the current module. So yes, it's a bit more hacky but is an actual solution (I tried it and it works) :slightly_smiling_face: Thanks! I was going to offer that you contribute the change with a PR, but we need careful thinking: prefixing the user code will change the line numbers, and possibly other things like tracebacks in case of error, etc.. We must make sure the change doesn't negatively impact the current UX.

cbyrohl commented 1 year ago

This approach also does not seem to work nicely with other packages that modify/query print via __builtins__ (e.g. numba)

pawamoy commented 1 year ago

I might have found a lower-level solution that could work for many different scenarios: https://stackoverflow.com/a/22434262/3451029. If we go this route, I'd like to implement that in failprint, and then depend on failprint here.

cbyrohl commented 1 year ago

Looks good, just gave redirect_stdout a try by redirecting to "buffer" in above code snippet. It is not conserving linebreaks right now, but I guess that's a minor aspect. Will you give it a try or should I do a PR (without failprint for now...)?

pawamoy commented 1 year ago

It is not conserving linebreaks right now

Ouch, I'd say this is a no-go and we must fix this. It's very important that user output is 100% untouched.

But, just to make sure (and sorry, this wasn't clear in my previous comment): I was pointing at the low-level solution using file descriptors, called stdout_redirected in the SO answer. Copy-paste for reference:

import os
import sys
from contextlib import contextmanager

def fileno(file_or_fd):
    fd = getattr(file_or_fd, 'fileno', lambda: file_or_fd)()
    if not isinstance(fd, int):
        raise ValueError("Expected a file (`.fileno()`) or a file descriptor")
    return fd

@contextmanager
def stdout_redirected(to=os.devnull, stdout=None):
    if stdout is None:
       stdout = sys.stdout

    stdout_fd = fileno(stdout)
    # copy stdout_fd before it is overwritten
    #NOTE: `copied` is inheritable on Windows when duplicating a standard stream
    with os.fdopen(os.dup(stdout_fd), 'wb') as copied: 
        stdout.flush()  # flush library buffers that dup2 knows nothing about
        try:
            os.dup2(fileno(to), stdout_fd)  # $ exec >&to
        except ValueError:  # filename
            with open(to, 'wb') as to_file:
                os.dup2(to_file.fileno(), stdout_fd)  # $ exec > to
        try:
            yield stdout # allow code to be run with the redirected stdout
        finally:
            # restore stdout to its previous value
            #NOTE: dup2 makes stdout_fd inheritable unconditionally
            stdout.flush()
            os.dup2(copied.fileno(), stdout_fd)  # $ exec >&copied

I'll give it a try myself, no PR for now please :slightly_smiling_face:

pawamoy commented 1 year ago

Hey @cbyrohl, this is moving forward. failprint now captures thing with file descriptors manipulation. So it's now possible to switch to using failprint in markdown-exec. Such a feature will probably land in the $1000/month goal of my insiders program.