marimo-team / marimo

A reactive notebook for Python — run reproducible experiments, execute as a script, deploy as an app, and version with git.
https://marimo.io
Apache License 2.0
6.8k stars 239 forks source link

app.function: top-level functions in notebook files #2293

Open akshayka opened 1 month ago

akshayka commented 1 month ago

This issue is for making it possible to import functions directly from marimo notebooks:

from notebook import my_function

Only "pure" functions will be importable, ie functions that have no refs besides imported modules. Some care will need to be taken to handle modules.

A cell that includes a single function def, and nothing more, will be saved in the notebook file as:

@app.function
def my_function():
  ...

Discussed on:

patrick-kidger commented 1 month ago

I worry that privileging just single-cell functions is going to be a huge UX footgun. There are many other things I might reasonably want the same behaviour from. (Classes are the obvious example.) Moreover I will almost always want my function to depend upon other values:

SOME_CONSTANT = 42

def the_answer_to_life_the_universe_and_everything() -> int:
    return SOME_CONSTANT

Or -- I may simply write code that depends upon being able to import a function like this... and now find that I am unable to adjust this function's behaviour without breaking that import resolution.


Can I make an alternate suggestion? Marimo supports adding meaningful names for cells, so it could be arranged for the syntax import my_notebook; my_notebook.my_cell.my_function to work.

In particular this would:

akshayka commented 1 month ago

Or -- I may simply write code that depends upon being able to import a function like this... and now find that I am unable to adjust this function's behaviour without breaking that import resolution.

Oh, that's a very good point; that kind of action at a distance would be sad. I hadn't considered this.

Regarding

so it could be arranged for the syntax import my_notebook; my_notebook.my_cell.my_function to work

It's definitely an interesting idea, and would be readily done by abstracting over the low-level API we have for programmatically running cells: https://docs.marimo.io/api/cell.html. AFAIK right now I'm the only consumer of that API, it's quite low level.

I've considered what you've suggested before, but the downside is that you won't get code completion, type information, etc. when typing out

my_notebook.my_cell.my_function

whereas top-level functions would play well with an LSP and other static analysis tools.

Given these tradeoffs, would you still prefer a cell-based API? We'd have to make a breaking change to the cell API to support this (what if a user had a function called run?), or perhaps we could expose it under a namespace, such as my_cell.functions.my_function ...

patrick-kidger commented 1 month ago

Okay, refining this idea a little bit: it should be possible to define a module-level __getattr__ that runs the appropriate cells up to the point of providing that name. That would enable the simpler my_module.my_value based lookup, without having to name an individual cell.

This would then offer your originally suggested API, whilst supporting all kinds of Python types. (And neatly sidesteps questions about the Cell API.)

Autocompletion could be done via an if TYPE_CHECKING: block somewhere in the file.

dmadisetti commented 1 month ago

Relevant: https://github.com/marimo-team/marimo/issues/1954

So in previous discussions @app.function, would be primarily used for serializing and creating cells that with a name, and denoting that it is a pure function. Importing a function from a cell is disingenuous when it can rely on those additional, cell specific references that are not constants. For instance:

result = expensive_computation()
mutable_arr_in_scope = []

# Not clear how exposed_fn should be exposed,
# or what the behavior should be on state change.
# Even if convention decided, should be intuitive API
# Not worth complexity IMO
def exposed_fn() -> int:
    mutable_arr_in_scope.append(state_in_diff_cell())
    return result, len(mutable_arr_in_scope)

One fix might be to move constants (all caps and are 'primitive') to the top level script. There has been discussion about doing this for import only cells (https://github.com/marimo-team/marimo/discussions/1379), it might be feasible to do this for constant only cells

import marimo

# cell 2 <-  (Auto generated comment? Use name if given)
import numpy as np

# cell 3
SCALE = 1

__generated_with = "0.0.0"
app = marimo.app()

@app.function
def lengthify(v, scale=SCALE, axis=-1):
   return scale * v / np.linalg.norm(v, axis=axis)

The previous discussion of @app.function made the UI very simple.

  1. If a cell with only a function definition hits the scope requirements, then it is automatically turned into an @app.function, and cell name shows that (similar to a sql cell showing the df name).
  2. If a cell with only a function definition does not hit the requirements (cell name conflicts with function name, scope refs are not modules or constants), then a "?" icon appears where cell name should be, explaining why it is not turned into a @app.function
dmadisetti commented 1 month ago

Another implementation realization. @app.function should be able to reference other @app.functions

EPSILON = 1e-12

@app.function
def probs_ortho(v, u):
    return np.abs(np.dot(lengthify(v), lengthify(u))) <= EPSILON