marimo-team / marimo

A reactive notebook for Python — run reproducible experiments, execute as a script, deploy as an app, and version with git.
https://marimo.io
Apache License 2.0

state example insufficiently explained #2738

Open gitgithan opened 1 month ago

gitgithan commented 1 month ago

Documentation is

Explain in Detail

I can't understand the dataflow of the example below, taken from https://docs.marimo.io/guides/state.html and the marimo UI tutorial.

I have many questions in bold below.

def add_task():
    if task_entry_box.value:
        set_tasks(lambda v: v + [Task(task_entry_box.value)])
        set_task_added(True)

def clear_tasks():
    set_tasks(lambda v: [task for task in v if not task.done])

add_task_button = mo.ui.button(
    label="add task",
    on_change=lambda _: add_task(),
)

clear_tasks_button = mo.ui.button(
    label="clear completed tasks",
    on_change=lambda _: clear_tasks()
)
task_list = mo.ui.array(
    [mo.ui.checkbox(value=task.done, label=task.name) for task in get_tasks()],
    label="tasks",
    on_change=lambda v: set_tasks(
        lambda tasks: [Task(task.name, done=v[i]) for i, task in enumerate(tasks)]
    ),
)

It was hard to understand because some examples of on_change use lambda v: while others use lambda _:. When do we use lambda v:, and when do we use lambda _:?

What does v in on_change=lambda v: set_tasks represent? Is it a list of booleans from the array of checkboxes?
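
A minimal plain-Python sketch of the two callback shapes, with no marimo import. It assumes, as the question suggests, that the array hands its handler a list of booleans, one per checkbox. `Task` is a stand-in dataclass and `fire` is a hypothetical helper that plays the role of the UI element invoking `on_change`; neither is marimo API.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    done: bool = False


def fire(on_change, new_value):
    """Hypothetical stand-in for a UI element calling its on_change handler."""
    return on_change(new_value)


tasks = [Task("write docs"), Task("fix bug")]

# `lambda v:` uses the value it is handed (assumed: one bool per checkbox).
apply_checks = lambda v: [Task(t.name, done=v[i]) for i, t in enumerate(tasks)]
updated = fire(apply_checks, [True, False])

# `lambda _:` is handed a value too, but ignores it; `_` only signals that.
clicks = []
fire(lambda _: clicks.append("clicked"), 1)

print(updated)
print(clicks)
```

Under this reading, the choice between `v` and `_` is a naming convention: the handler always receives one argument, and `_` marks it as unused.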

To understand what i is doing, I tried enumerating from 1 instead of the default 0, using Task(task.name, done=v[i]) for i, task in enumerate(tasks, 1). I expected it to break, and it did, presumably because the valid indices of v are limited by the number of checkboxes in the array.

However, I don't understand why it only broke when I clicked a checkbox, while nothing happened when I clicked clear. (I thought clear should break it too, since clear is somehow part of this system.)
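
A plain-Python sketch of one likely explanation, assuming that clicking clear never invokes the array's on_change handler. The body of a lambda only runs when the lambda is called, so the off-by-one index stays hidden until something calls that particular handler. `Task`, `broken_handler` and the simplified `clear_tasks` are illustrative stand-ins, not the notebook's actual objects.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    done: bool = False


tasks = [Task("task1")]

# Defining the handler does not execute its body, so no error is raised here.
broken_handler = lambda v: [
    Task(t.name, done=v[i]) for i, t in enumerate(tasks, 1)
]


def clear_tasks(current):
    # Mirrors clear_tasks() from the example: it never touches broken_handler.
    return [t for t in current if not t.done]


remaining = clear_tasks(tasks)  # works: the buggy handler is not involved

try:
    broken_handler([True])  # what a checkbox toggle would end up calling
    error = None
except IndexError as exc:
    error = exc  # v has one element, but i starts at 1

print(remaining, type(error).__name__)
```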

Please check whether my understanding of the dataflow is correct. I'm assuming add task goes through the same flow, so I'll describe the more confusing clear completed tasks. I'm looking for something like this in the docs:

  1. Clicking clear completed tasks triggers set_tasks(lambda v: [task for task in v if not task.done]) of def clear_tasks()
  2. Calling set_tasks causes all cells (other than the same cell with set_tasks) with get_tasks of mo.state([]) to run
  3. mo.ui.checkbox(value=task.done, label=task.name) for task in get_tasks() contains get_tasks, so it ran the cell to recreate mo.ui.array and all checkboxes in it.
  4. Because UI got updated, it caused on_change=lambda v: set_tasks to run (So this is the 2nd time set_tasks is running in this chain)
  5. get_tasks in step 3 is not run again because it appears in the same cell as on_change=lambda v: set_tasks, so the loop ends
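
The hypothesis in steps 1 to 3 and step 5 can be written down as a toy model in plain Python. This is only a sketch of the rule as described above (a setter call re-runs every cell that reads the getter, except the cell the call came from); `ToyState`, `readers` and `calling_cell` are invented names, not marimo's implementation. It also assumes that a handler counts as running in the cell that created its UI element, and it does not model step 4.

```python
log = []


class ToyState:
    """Toy getter/setter pair; a model of the steps above, not marimo's code."""

    def __init__(self, value):
        self.value = value
        self.readers = {}  # cell name -> function that re-runs that cell

    def get(self):
        return self.value

    def set(self, update, calling_cell=None):
        # Accept either a new value or a function of the old value.
        self.value = update(self.value) if callable(update) else update
        for name, rerun in self.readers.items():
            if name != calling_cell:  # assumed: the calling cell is skipped
                rerun()


tasks = ToyState([("task1", True), ("task2", False)])


def run_task_list_cell():
    # Stands in for the cell that rebuilds the checkbox array from the getter.
    log.append(("task_list cell ran", list(tasks.get())))


tasks.readers["task_list"] = run_task_list_cell

# Steps 1-3: the button cell's handler drops completed tasks,
# which re-runs the reader cell once.
tasks.set(lambda v: [t for t in v if not t[1]], calling_cell="buttons")

# Step 5: a setter call that originates in the reader cell does not re-run it.
tasks.set(lambda v: [(n, True) for n, _ in v], calling_cell="task_list")

print(log)
print(tasks.get())
```

In this model the reader cell runs exactly once, after the button click, and the second setter call changes the stored value without rebuilding anything.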

For checkbox interactions, does it follow the same dataflow as steps 4 and 5 of the previous 5 step sequence? I suspect both buttons do not trigger on_change, which means in the 5 step flow above, it should stop after step 3, and step 4 onwards is reserved for checkbox toggling actions?

I tried debugging what triggers on_change by adding a print to the array handler, adding a "task1" to the list, then repeatedly toggling the checkbox. (Please suggest better introspection methods, if any.)

def debug_handler(v):
    print(v)
    set_tasks(
        lambda tasks: [
            Task(task.name, done=v[i]) for i, task in enumerate(tasks)
        ]
    )

task_list = mo.ui.array(
    [
        mo.ui.checkbox(value=task.done, label=task.name)
        for task in get_tasks()
    ],
    label="tasks",
    on_change=debug_handler
)

Output

[True]
[True]
[False]
[True]
[True]
[False]
[False]
[True]
[True]
[False]
[True]
[True]
[False]
[False]
[True]
[False]

Why is the output randomly adding 1 or 2 lines of True and False between toggles? I expected each toggle to generate only 1 on_change call, so only 1 line (True or False) should be printed per toggle.
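
One possible introspection aid, sketched in plain Python: wrap the handler so that every call is recorded with a counter and a timestamp, which makes it easier to see whether two identical lines came from one toggle or two. `traced` is a hypothetical helper, not part of marimo, and the calls below are simulated; in a notebook the wrapped function would be passed as on_change.

```python
import time


def traced(handler, log):
    """Wrap a handler so each call is recorded as (count, timestamp, value)."""

    def wrapper(value):
        log.append((len(log) + 1, time.monotonic(), value))
        return handler(value)

    return wrapper


calls = []
handler = traced(lambda v: v, calls)

# Simulated callback invocations, including one repeated value.
for value in ([True], [True], [False]):
    handler(value)

# Count consecutive calls that carried the same value.
repeats = sum(1 for a, b in zip(calls, calls[1:]) if a[2] == b[2])
print(len(calls), repeats)
```

Comparing the timestamps of repeated entries can hint at whether they belong to the same interaction.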

I can replicate this behaviour on a much simpler example. In 1 cell:

complex = mo.ui.array(mo.ui.checkbox() for _ in range(2))
mo.hstack([complex])

In 2nd cell: print(complex.value), then toggle checkboxes and observe how it prints 1 or 2 rows randomly.

Alternatively, a single cell with on_change can demonstrate it (not sure if same principles underlie the 2 cell problem reproduction):

complex = mo.ui.array((mo.ui.checkbox() for _ in range(2)), 
                      on_change=lambda v: print(v))
mo.hstack([complex])

To really stretch this example, I tried nesting composite elements, and saw randomly 1-5 rows of prints appended each time I toggled a checkbox. Is this a bug?

complex = mo.ui.array((mo.ui.array(mo.ui.checkbox() for _ in range(2)) for _ in range(2)), on_change=lambda v:print(v))
mo.hstack([complex])

What is considered on_change in the mo.ui.array? Why does adding tasks through the add task button not trigger the prints in on_change=debug_handler? Are only interactions with the nested UI elements in the array (e.g. toggling boxes) considered a change?

How did the mo.ui.array show updated checkbox states while I was toggling the checkboxes? I'm not sure whether there are multiple ways mo.ui.array can update, and whether they look the same visually, so that a user can't tell them apart just by looking and needs to add prints in on_change, as I'm doing, to see the difference.

Lastly, why is task_list_mutated, set_task_list_mutated = mo.state(False) and set_task_list_mutated(True) appearing in the handlers of the 2 buttons if task_list_mutated was never used? What's the point of setting if nothing is reading this state?

I deleted task_list_mutated, set_task_list_mutated = mo.state(False), the lone task_list_mutated, and the 2 set_task_list_mutated(True) calls, and can't see any difference in behavior for either button in this whole task management UI.

Your Suggestion for Changes

akshayka commented 4 weeks ago

Regarding your simple examples such as

To really stretch this example, I tried nesting composite elements, and saw randomly 1-5 rows of prints appended each time I toggled a checkbox.

I could not reproduce multiple rows being printed. For me only one row is printed on each interaction. This is my code: https://marimo.app/l/3fh9l9.

What operating system, marimo version, etc are you using? You can use marimo env at the command line to get this information.

What is considered on_change in the mo.ui.array?

on_change is an "optional callback to run when this element’s value changes". So any time the value of the array changes, on_change should be run.

akshayka commented 4 weeks ago

In general we strongly discourage users from using mo.state(), which is difficult to reason about, and instead encourage them to rely on marimo's static analysis based reactivity. We plan to remove the state section from the UI tutorial.

gitgithan commented 4 weeks ago

I'm using wsl2 on windows. marimo env shows

{
  "marimo": "0.9.14",
  "OS": "Linux",
  "OS Version": "5.15.153.1-microsoft-standard-WSL2",
  "Processor": "x86_64",
  "Python Version": "3.10.15",
  "Binaries": {
    "Browser": "--",
    "Node": "v20.15.1"
  },
  "Dependencies": {
    "click": "8.1.7",
    "docutils": "0.16",
    "itsdangerous": "2.2.0",
    "jedi": "0.19.1",
    "markdown": "3.6",
    "narwhals": "1.11.0",
    "packaging": "24.1",
    "psutil": "5.9.8",
    "pygments": "2.18.0",
    "pymdown-extensions": "10.11.2",
    "pyyaml": "6.0.1",
    "ruff": "0.7.1",
    "starlette": "0.27.0",
    "tomlkit": "0.12.5",
    "typing-extensions": "4.12.0",
    "uvicorn": "0.32.0",
    "websockets": "12.0"
  },
  "Optional Dependencies": {
    "altair": "5.4.1",
    "duckdb": "1.1.2",
    "pandas": "2.0.0",
    "polars": "1.7.0",
    "pyarrow": "18.0.0"
  }
}

I tried your example with only a single layer of mo.ui.array, instead of my original example with nested mo.ui.array((mo.ui.array..., and toggled the 1st checkbox. It still gave repeated output: 2, 2, 2, 1 and 1 rows over five toggles.

[True, False]
[True, False]
[False, False]
[False, False]
[True, False]
[True, False]
[False, False]
[True, False]

any time the value of the array changes, on_change should be run

This implies on_change should be triggered not just by toggling a checkbox, but also by changes in the length of the task list from clicking the 2 buttons. Thinking again, if clicking the buttons causes the entire cell to be rerun and mo.ui.array to be recreated (hypothesized in step 3 of the 5-step dataflow above), then even if on_change should respond to changes in the length of the task list, the UI element that on_change was part of may have already been overwritten by the cell rerun, and the old reference to task_list is lost. So I can't tell from the screen whether on_change is also triggered by changes in the number of tasks in the list.

A failed experiment to prove that on_change reacts to list length changes:

To prevent the whole mo.ui.array from being recreated and only change its value while keeping the same mo.ui.array instance, I tried to only update the value of the array UI element doing complex.value = [mo.ui.checkbox() for _ in get_tasks_simple()], but this caused RuntimeError: Setting the value of a UIElement is not allowed. If you need to imperatively set the value of a UIElement, consider using mo.state().

get_tasks_simple, set_tasks_simple = mo.state([])
complex = mo.ui.array(
    (mo.ui.checkbox() for _ in range(2)), on_change=lambda v: print(v)
)
complex.value = [mo.ui.checkbox() for _ in get_tasks_simple()]  # ERROR
def add_task_simple():
    set_tasks_simple(lambda v: v + ['new hardcoded task'])

add_task_button_simple = mo.ui.button(
    label="add task",
    on_change=lambda _: add_task_simple(),
)
mo.hstack([complex, add_task_button_simple])

liquidcarbon commented 6 days ago

I'm also a bit confused about when (not) to use state.

Building on the recipe about buttons, I'm trying to have a form-like UI without using a form, where input, button, and output are grouped together.

It makes sense that TEXT is reset to "initial" whenever cell 2 re-runs, because the button value changed. But I'd like the button to trigger the action, and to be able to modify the input without erasing the previous output.

Is this achievable without State?


# cell 1
text, button, TEXT

# cell 2
TEXT="initial"
if button.value:
    TEXT = text.value.upper()

# cell 3
button = mo.ui.run_button(label="CAPITALIZE")
text = mo.ui.text()

akshayka commented 6 days ago

@liquidcarbon Yes, this is possible without state, by making use of the fact that marimo does not track mutations to objects. I personally find state confusing because it causes cells to re-run in a way that is not captured by the graph, and prefer mutating regular Python objects instead -- that way I can still reason about when cells will re-run by looking at the dependency graph.


import marimo

__generated_with = "0.9.20"
app = marimo.App(width="medium")

@app.cell
def __():
    import marimo as mo
    return (mo,)

@app.cell
def __(TEXT, button, text):
    if button.value:
        TEXT[0] = text.value.upper()

    text, button, TEXT[0]
    return

@app.cell
def __():
    # create a container that holds the capitalized text
    # this cell won't ever re-run
    TEXT = ["initial"]
    return (TEXT,)

@app.cell
def __(mo):
    # cell 3
    button = mo.ui.run_button(label="CAPITALIZE")
    text = mo.ui.text()
    return button, text

if __name__ == "__main__":
    app.run()
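
The container trick can also be sketched in plain Python, without marimo. Each call to `capitalize_cell` below stands for one re-run of the cell that reads the button and the text box; `define_container` and `capitalize_cell` are hypothetical names used only for this illustration. The point is that the list is mutated in place and the name TEXT is never rebound, so the previous output survives re-runs that were not caused by a click.

```python
def define_container():
    # Runs once, like the cell that defines TEXT.
    return ["initial"]


def capitalize_cell(TEXT, button_value, text_value):
    # Mutates the list in place; the name TEXT is never reassigned.
    if button_value:
        TEXT[0] = text_value.upper()
    return TEXT[0]


TEXT = define_container()
container_id = id(TEXT)

shown = [
    capitalize_cell(TEXT, False, "hel"),      # typing, no click: old output kept
    capitalize_cell(TEXT, True, "hello"),     # click: output updated
    capitalize_cell(TEXT, False, "hello w"),  # typing again: output kept
]

print(shown)
print(id(TEXT) == container_id)
```

By contrast, a cell that starts with TEXT = "initial" rebinds the name on every run, which is why the earlier version lost its output.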

liquidcarbon commented 6 days ago

This is great, why don't I add this to the recipes?

akshayka commented 6 days ago

Please do! :)