marimo-team / marimo

A reactive notebook for Python — run reproducible experiments, execute as a script, deploy as an app, and version with git.
https://marimo.io
Apache License 2.0
6.8k stars 239 forks source link

plt.hist shows a list of BarContainers #2628

Open dmadisetti opened 5 days ago

dmadisetti commented 5 days ago

Describe the bug

On


import matplotlib.pyplot as plt
import numpy as np

plt.hist(np.random.rand(1000), bins=100)

marimo iterates over the bar container to produce many duplicate plots:

Image

Expected type nd.array, nd.array, barcontainer: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html

Outputting the bar container by itself works as expected:

Image

but wrapping it in an iterable produces bad news:

Image

Environment

head, see: https://marimo.app/l/uudxmj

Code to reproduce

import marimo

__generated_with = "0.9.10-dev8"
app = marimo.App()

@app.cell
def __():
    import matplotlib.pyplot as plt
    import numpy as np

    A, B, C = plt.hist(np.random.rand(1000), bins=100)
    return A, B, C, np, plt

@app.cell
def __(C):
    [C]
    return

@app.cell
def __(C):
    C
    return

if __name__ == "__main__":
    app.run()
akshayka commented 5 days ago

This is because BarContainer is a subclass of tuple, and our structures formatter flattens it. We do have a formatter for BarContainer.

Relevant code:

https://github.com/marimo-team/marimo/blob/main/marimo/_utils/flatten.py#L129-L140

We could either:

  1. Flatten only lists, tuples, dicts and not subclasses of them (not sure if that has bad second-order effects)
  2. Hack: special case bar container in _flatten (ugly but no second-order effects)
mscolnick commented 5 days ago

Hack: special case bar container in _flatten (ugly but no second-order effects)

could we add a formatter for BarContainer in matplotlib_formatters.py? and then it is not a special case in _flatten

akshayka commented 5 days ago

We already have a formatter for it, which is why emitting a bare BarContainer works.

The issue arises when a BarContainer is in a structure, which is when flatten kicks in and the isinstance(item, tuple) check takes priority leading it to being unpacked instead of formatted.

And can't just naively check for the existence of a formatter first, since tuple etc have formatters (ie flatten).

dmadisetti commented 5 days ago

What about checking to see if the item belongs in formatters? And if it does, not continuing to unpack

        if isinstance(obj, FORMATTERS.keys()):
            # get formatter, format and kick out

That way not barcontainer specific, and behavior is handled only for defined formatters

akshayka commented 5 days ago

Something like that could work. But tuple, list, and dict are all in FORMATTERS.keys(), and using their formatters would lead to unbounded recursion.