VSCode debugger support: 'debug cell'

rfejgin commented 6 months ago

Description

I am using marimo in VSCode via the extension. With Jupyter Notebooks, I have an option to 'debug' a cell rather than just run it. This will hit VSCode breakpoints in the called code. I like this a lot since it lets me fluidly mix notebook-style execution and IDE-style debugging.

Does that way of working fit the marimo model at all? Or is the idea that if I want to debug I run the whole file directly from the VSCode debugger (without marimo)? If it's the former, it would be nice to have a 'debug cell' option.

Alternative

Document the intended approach to interactive debugging when one needs to examine code called by a marimo notebook. Searching the docs for 'debug' didn't come up with anything.

Additional context

No response

mscolnick commented 6 months ago

Can you use import pdb; pdb.set_trace() within the cell?

rfejgin commented 6 months ago

Yes. But this breaks into the pdb interface, not the VSCode visual debugger (where you can e.g. click to set a breakpoint). For folks used to using the IDE, switching to pdb is a pretty different way of working (e.g. examining variables also would need to be done via pdb).

Within the notebook itself using pdb isn't too bad, but my use case is one where the notebook calls other code - like my model implementation - and it's in that implementation where I want to set breakpoints (visually, ideally).

mscolnick commented 6 months ago

Got it, yea, the vscode debugger is much better than basic pdb.

It's not super trivial, but I think we would need to be able to run marimo with a debug/inspect flag that will launch debugpy in order to communicate with vscode.
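A minimal sketch of what such an opt-in hook might look like, assuming debugpy is installed whenever the hook is switched on. The function name and the EXAMPLE_DEBUGPY / EXAMPLE_DEBUGPY_PORT environment variables are hypothetical and are not part of marimo; only debugpy.listen and debugpy.wait_for_client are real debugpy calls.

```python
import os


def maybe_start_debugger(environ=os.environ, default_port=5678):
    """Start a debugpy listener only when EXAMPLE_DEBUGPY=1 (hypothetical variable).

    Returns the port being listened on, or None when debugging is disabled.
    """
    if environ.get("EXAMPLE_DEBUGPY") != "1":
        return None

    # Imported lazily so debugpy stays an optional dependency.
    import debugpy

    port = int(environ.get("EXAMPLE_DEBUGPY_PORT", default_port))
    debugpy.listen(("localhost", port))  # open the debug adapter endpoint
    debugpy.wait_for_client()            # block until VSCode attaches
    return port


# With the variable unset, nothing happens and the program continues normally.
print(maybe_start_debugger({}))
```

Calling a hook like this before any cell runs would let VSCode attach to the already-running process rather than launching it.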

I'm not sure when we will get to it - maybe a contributor can pick it up, or if there are enough 👍 we can try to prioritize it sooner.

rfejgin commented 6 months ago

Cool, thanks for considering!

rfejgin commented 6 months ago

By the way: I tried what I thought might be the alternative approach, which was to directly launch the script (notebook) from VSCode. But it seems that marimo creates a copy of the script somewhere in /tmp, so breakpoints set in VSCode in the original script do not get hit.

mscolnick commented 6 months ago

Are you running it as a script (python notebook.py) or as an app (marimo edit notebook.py)? Either way, it's possible that vscode installs debugpy for you and maybe there is a way for us to declare that mapping.

rfejgin commented 6 months ago

I'm running as a script, using VSCode's Run->Start Debugging. From examining the command line that generates, it does indeed appear to be calling debugpy.

mscolnick commented 6 months ago

Got it: 1) I wonder if we can run our scripts without copying files to /tmp (or at least an option to) 2) In the meantime, you could try marimo export notebook.py -o notebook.script.py to export the file to a flat script without marimo's cell decorators, which could be helpful while debugging. marimo export also supports --watch

akshayka commented 6 months ago

I wonder if we can run our scripts without copying files to /tmp (or at least an option to)

Using python notebook.py doesn't copy anything to /tmp. I wonder if VSCode is doing that.

rfejgin commented 6 months ago

Using python notebook.py doesn't copy anything to /tmp. I wonder if VSCode is doing that.

Hmm, maybe it's not being copied after all. It's just that when I hit an error (e.g. if I imported something that can't be found), the exception says that it's in a file called e.g. /tmp/marimo_2545903/__marimo__cell_Hbol_.py. But when I examine /tmp I don't see that file.

Regardless, I can't seem to get normal VSCode breakpoints to get hit when running the script. I thought marimo was copying the file to /tmp but I guess not? In any case, I see marimo calling exec on the cell (in marimo._ast.cell.py.execute_cell()), maybe that's confusing the debugger?

By the way, this script comes from a Jupyter Notebook which I then converted to a script using marimo convert. I then run the resulting script in the VSCode debugger as I would for any script.

akshayka commented 6 months ago

Oh interesting, thanks for that context. Each cell is compiled and given a unique filename (which happens to be under tmp). Maybe that confuses vscode, or maybe it's the exec like you say.

It's surprising because this works at the command line (insert a breakpoint in your file with pdb.set_trace(), then run with python nb.py).
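The effect described here can be reproduced with the standard library alone. The sketch below is not marimo's code: it compiles a source string under a made-up filename (modeled on the path quoted earlier in the thread) and shows that the traceback reports that filename even though no such file exists on disk, which is one plausible reason an IDE cannot map breakpoints onto it.

```python
import os
import sys
import traceback

# Stand-in for a cell body; the second line raises on purpose.
CELL_SOURCE = "x = 1\nraise ValueError('boom')\n"
# Hypothetical filename; nothing is ever written to this path.
CELL_FILENAME = "/tmp/marimo_demo/__marimo__cell_demo_.py"


def run_compiled_cell():
    """Compile and exec the cell source, returning where the error was reported."""
    code = compile(CELL_SOURCE, CELL_FILENAME, "exec")
    try:
        exec(code, {})
    except ValueError:
        frames = traceback.extract_tb(sys.exc_info()[2])
        innermost = frames[-1]  # the frame that belongs to the compiled cell
        return innermost.filename, innermost.lineno
    return None


filename, lineno = run_compiled_cell()
print(filename, lineno, os.path.exists(filename))
```

The debugger sees frames attributed to the synthetic filename, while breakpoints set in the editor are keyed to the path of the real notebook file.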

sorig commented 6 months ago

+1 for interactive debugging support (not just pdb text interface). This is the main feature that's blocking me from moving our team from jupyter to marimo.

githubpsyche commented 6 months ago

Here's an approach for debugging inside VSCode I've found. The gist of the strategy is to separate the decoration of cells with @app.cell from the declaration of functions that use them.

import marimo

__generated_with = "0.4.12"
app = marimo.App()

def defines_x():
    x = 2
    return x

def defines_y(x):
    y = x + 1
    y
    return y

def computes_z(x, y):
    z = x + y + 1
    z
    return z

def test_computes_z():
    assert computes_z(5, 3) == 9

app.cell(computes_z)
app.cell(defines_y)
app.cell(defines_x)

if __name__ == "__main__":
    test_computes_z()
    app.run()

Since you still call app.cell(computes_z), the cell will work normally when you use marimo run or execute the Python file in your terminal.

But at the same time, with this design, functions like computes_z can be used like any other function, and separately from marimo. You can set breakpoints inside of it and debug via test_computes_z either with VSCode's Tests extension or by using VSCode's Python Debugger (in command palette, you can start typing "Debug Python" to find this).

You can even treat the notebook as a module, importing specific functions like you'd import Python functions normally OR using the recipe described in the docs' Cell API reference, which seems to take advantage of the DAG specified in your notebook in a way that the base function could not.

Downsides:

While you can open the notebook in editor mode, once you save the notebook from the editor UI, it'll be back to its usual format.
You don't get to debug the DAG specified by your notebook this way. Each cell function is treated as relatively standalone. You'll have to pass arguments explicitly, even if you retrieve them from other cells.
If you've allowed other global variables in your script file (e.g., a global numpy import that you then use inside a cell), you might get different outputs from marimo run than in your test code.

I think this approach would work a lot more smoothly if marimo already separated calls to app.cell from function specification by default and saved marimo notebooks this way when changes are made in the web UI's editor mode. I don't think this would hinder the script file's readability much at all, even as it would give notebooks more features!

With these changes, users would be defining ordinary, pure Python functions as a side effect of notebook development inside marimo. They could then re-use these functions anywhere they'd like as if they weren't even specified inside notebooks in the first place. And at the same time, they'd still maintain access to the Cell instantiation of these functions -- either inside or outside the notebook.

Still, being able to debug within the DAG context would be better, and that would not be addressed by these changes.

rfejgin commented 6 months ago

+1 for interactive debugging support (not just pdb text interface). This is the main feature that's blocking me from moving our team from jupyter to marimo.

Same here - this is the main thing stopping me from switching to marimo.

akshayka commented 6 months ago

Thanks everyone for the thoughtful feedback.

I believe if we just ran the cells as functions, instead of exec-ing them, the debugger would work (as @githubpsyche suggests).

But app.run() also returns the visual outputs (last expression of each cell), which is why we exec/eval them.

We don't have a solution yet ... but just wanted to acknowledge that we hear you and hope to find one.
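To illustrate why capturing the last expression pushes toward exec/eval, here is a rough sketch of one common way to do it with the ast module. It is an assumption about the general technique, not marimo's actual implementation, and run_cell_source is a hypothetical helper name.

```python
import ast


def run_cell_source(source, namespace):
    """Exec a cell's statements, then eval its trailing expression (if any).

    Returns the value of the last expression, or None when the cell
    does not end in an expression.
    """
    tree = ast.parse(source)
    last_expr = None
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Peel off the trailing expression so it can be evaluated separately.
        last_expr = ast.Expression(tree.body.pop().value)

    # Run every remaining statement for its side effects on the namespace.
    exec(compile(tree, "<cell>", "exec"), namespace)

    if last_expr is None:
        return None
    return eval(compile(last_expr, "<cell>", "eval"), namespace)


ns = {"x": 2}
output = run_cell_source("y = x + 1\ny", ns)
print(output, ns["y"])
```

A plain function call would give back only the return value and discard the bare y expression, which is the visual output a notebook needs.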

alefminus commented 6 months ago

Am I mistaken that a possible solution would be to return a tuple of the last expression and the return value, discarding the last expression when used as a notebook (via marimo module machinery) and using it as usual? Or am I missing something here?

Edit: I naively assumed you produce a python file anyway and instead of execing the code for a cell we could call the function (dynamically, per the DAG), but it seems that does not happen.

githubpsyche commented 4 months ago

https://code.visualstudio.com/docs/python/debugging#_local-script-debugging

Seems to provide some clue for getting this to work.

You add a remote attach configuration to your launch.json (VSCode helps set this up with an "Add Configuration" button at bottom left of the file when it is open in the editor):

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python Debugger: Remote Attach",
            "type": "debugpy",
            "request": "attach",
            "connect": {
                "host": "localhost",
                "port": 5678
            },
            "pathMappings": [
                {
                    "localRoot": "${workspaceFolder}",
                    "remoteRoot": "."
                }
            ]
        }
    ]
}

Then this is the gist of the pattern you use to start a debugging session:

import debugpy

# 5678 is the default attach port in the VS Code debug configurations. Unless a host and port are specified, host defaults to 127.0.0.1
debugpy.listen(5678)
print("Waiting for debugger attach")
debugpy.wait_for_client()
breakpoint()
print('break on this line')

I've found that if I put the debugging setup in its own cell (everything before breakpoint()), then calling breakpoint() does start a debugging session in the marimo cell's context in the way it's supposed to, without the hack I suggested earlier in this thread.

Maybe significantly, this works even if one is primarily developing with marimo's native UI, since it works over a (local) network connection.

Full example of a script with a breakpoint in it:

import marimo

__generated_with = "0.7.0"
app = marimo.App()

@app.cell
def __():
    import debugpy; debugpy.listen(5678); debugpy.wait_for_client()
    return debugpy,

@app.cell
def computes_z(x, y):
    z = x + y + 1
    z
    return z,

@app.cell
def defines_y(x):
    y = x + 1
    breakpoint()
    y
    return y,

@app.cell
def defines_x():
    x = 2
    return x,

if __name__ == "__main__":
    app.run()

Would be nice to find some way to smooth this workflow out further.

mscolnick commented 4 months ago

@githubpsyche - I really appreciate the detailed write-up.

It would be great to get this workflow smoother. are there any obvious things we could do?

for example:

our vscode-marimo extension could add a "contribution point" for a debugger as long as it can be generalized https://code.visualstudio.com/api/references/contribution-points#contributes.debuggers
we could add a --debugpy CLI arg that sets up import debugpy; debugpy.listen(debugpy_port or 5678); debugpy.wait_for_client() for you, assuming order matters and that needs to go first.

I am not too familiar with debugpy or the vscode's debugpy integration so would appreciate any help
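As a rough sketch of the second idea, the argument handling could look something like the following. The --debugpy flag is the one proposed above; the --debugpy-port option, the parser name, and parse_debug_args are hypothetical and do not exist in marimo's CLI today.

```python
import argparse


def parse_debug_args(argv):
    """Parse the proposed debug flags and return (enabled, port)."""
    parser = argparse.ArgumentParser(prog="notebook-runner")  # hypothetical program name
    parser.add_argument("--debugpy", action="store_true",
                        help="wait for a debugpy client before running any cell")
    parser.add_argument("--debugpy-port", type=int, default=None,
                        help="port for the debugpy listener")
    args = parser.parse_args(argv)

    # Mirror the `debugpy_port or 5678` fallback from the proposal.
    port = args.debugpy_port or 5678
    return args.debugpy, port


print(parse_debug_args(["--debugpy"]))
```

When the flag is set, the runner would call debugpy.listen(port) and debugpy.wait_for_client() before executing the first cell, so the listener is in place ahead of any breakpoint() call.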
