marimo-team / marimo

A reactive notebook for Python — run reproducible experiments, execute as a script, deploy as an app, and version with git.
https://marimo.io
Apache License 2.0

Suggestion: "marimo install x" command that mirrors "pip install x" #542

Open simonw opened 5 months ago

simonw commented 5 months ago

Description

I installed marimo using pipx install marimo because I like to keep apps like this in their own environment.

Then I saw in https://docs.marimo.io/guides/plotting.html that I would need to run pip install altair to get charts.

Having used pipx, I have to go and remind myself how to install into that environment - it's pipx inject marimo altair, but I always have to look that up.

Suggested solution

Add a marimo install x command which runs pip install x in the same Python environment that marimo is installed in.

I have this for several of my own tools, to solve the same problem:

Alternative

Another solution for this could be to imitate Jupyter's magic cell commands. In Jupyter I install packages like this:

%pip install altair

Because then I know Jupyter will put them in the correct Python environment.
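For readers following along, the idea of "the correct Python environment" can be sketched in plain Python: invoke pip through the interpreter that is currently running, so the package lands next to whatever imported this code. This is a minimal sketch, not marimo's or Jupyter's implementation; the helper names pip_install_command and pip_install are made up for illustration.

```python
import subprocess
import sys


def pip_install_command(packages, upgrade=False):
    """Build a pip command bound to the interpreter running this code."""
    # sys.executable is the running interpreter, so `-m pip` targets its environment.
    cmd = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        cmd.append("--upgrade")
    cmd.extend(packages)
    return cmd


def pip_install(packages, upgrade=False):
    """Run pip in a child process and return its exit code (hypothetical helper)."""
    return subprocess.run(pip_install_command(packages, upgrade), check=False).returncode
```

Running pip in a child process keeps the calling process alive even if pip fails; pip_install(["altair"]) would be the equivalent of the %pip line above.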

akshayka commented 5 months ago

Thanks for the request and the links. marimo install is very interesting. In your own tools, do your install commands support multiple package managers? Or do you just default to pip? Some of our users use conda, for example.

Another solution for this could be to imitate Jupyter's magic cell commands. In Jupyter I install packages like this:

Yes, we should support this at the least. I'd prefer to use Python instead of a magic: maybe mo.pip("install altair"), mo.conda("install altair")? Or just mo.shell("pip install altair"), similar to Jupyter's ! escape hatch. Thoughts?

smacke commented 5 months ago

One of the cooler things about Pluto.jl is that every notebook includes a manifest of the installed libraries, making them little containerized reproducible environments in and of themselves. +1 to the idea of a marimo install command; at some point down the line it could be used to persist the version metadata of the installed library to the notebook.

From that perspective, I think it would make the most sense for the API to be something like mo.install(...) rather than going directly to pip or conda in the shell, switching between pip or conda depending on the environment one is in, and then (one day) persisting the version metadata to the notebook.
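A rough sketch of what "switching between pip or conda depending on the environment" could look like is below. It assumes a conda environment can be recognized by its conda-meta directory or by the CONDA_PREFIX variable; the function names choose_installer and install_command are hypothetical, and mo.install itself is only a proposal in this thread, not an existing marimo API.

```python
import os
import sys
from pathlib import Path


def choose_installer(prefix=None, environ=None):
    """Guess which package manager owns the environment rooted at `prefix`."""
    prefix = Path(prefix if prefix is not None else sys.prefix)
    environ = os.environ if environ is None else environ
    # conda writes a conda-meta/ directory into each environment it creates.
    if (prefix / "conda-meta").is_dir():
        return "conda"
    # CONDA_PREFIX is set while a conda environment is activated.
    conda_prefix = environ.get("CONDA_PREFIX")
    if conda_prefix and Path(conda_prefix) == prefix:
        return "conda"
    return "pip"


def install_command(packages, prefix=None, environ=None):
    """Build (but do not run) an install command for the detected manager."""
    prefix_str = str(prefix if prefix is not None else sys.prefix)
    if choose_installer(prefix, environ) == "conda":
        return ["conda", "install", "--yes", "--prefix", prefix_str, *packages]
    return [sys.executable, "-m", "pip", "install", *packages]
```

This heuristic only distinguishes conda from everything else; tools such as poetry or pdm would need their own detection.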

simonw commented 5 months ago

The catch with a Python function is that calling pip install from within Python exits the process!

Here's the most recent implementation of my install command:

https://github.com/simonw/llm/blob/95597cc8f5c1612428fa4ab321afe2f173ac897f/llm/cli.py#L997-L1030

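# Imports and the `cli` group live near the top of llm/cli.py;
# minimal stand-ins are shown here so the snippet runs on its own.
import sys
from runpy import run_module

import click


@click.group()
def cli():
    """Stand-in for the click group defined in llm/cli.py"""


@cli.command()
@click.argument("packages", nargs=-1, required=False)
@click.option(
    "-U", "--upgrade", is_flag=True, help="Upgrade packages to latest version"
)
@click.option(
    "-e",
    "--editable",
    help="Install a project in editable mode from this path",
)
@click.option(
    "--force-reinstall",
    is_flag=True,
    help="Reinstall all packages even if they are already up-to-date",
)
@click.option(
    "--no-cache-dir",
    is_flag=True,
    help="Disable the cache",
)
def install(packages, upgrade, editable, force_reinstall, no_cache_dir):
    """Install packages from PyPI into the same environment as LLM"""
    # Translate the click options into the equivalent pip command line.
    args = ["pip", "install"]
    if upgrade:
        args += ["--upgrade"]
    if editable:
        args += ["--editable", editable]
    if force_reinstall:
        args += ["--force-reinstall"]
    if no_cache_dir:
        args += ["--no-cache-dir"]
    args += list(packages)
    # pip reads its arguments from sys.argv, so swap them in before running it.
    sys.argv = args
    # Equivalent to `python -m pip ...` inside the current interpreter.
    run_module("pip", run_name="__main__")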

The key bit there is the run_module("pip", ...) call. I used to do something different, but it broke in weird ways; here are my notes on why I used that instead: https://til.simonwillison.net/python/call-pip-programatically
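For a Python function (as opposed to a CLI command), the "exits the process" catch could be worked around by trapping the exit. The sketch below assumes the exit comes from a SystemExit raised by the module's entry point, which is how command-line modules commonly finish; run_cli_module is a hypothetical helper, not part of llm or marimo.

```python
import sys
from runpy import run_module


def run_cli_module(module_name, args):
    """Run `python -m module_name args...` in-process and return its exit code."""
    saved_argv = sys.argv
    sys.argv = [module_name, *args]
    try:
        run_module(module_name, run_name="__main__")
    except SystemExit as exc:
        # Entry points often end with sys.exit(); trap it so the caller survives.
        if exc.code is None:
            return 0
        return exc.code if isinstance(exc.code, int) else 1
    finally:
        # Restore the caller's arguments whatever happened.
        sys.argv = saved_argv
    return 0
```

With this, run_cli_module("pip", ["install", "altair"]) would return an exit code instead of ending the session. Packages installed this way into an already running interpreter may still need a restart before they import cleanly, which is one reason a subprocess can be the safer choice.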

ForceBru commented 5 months ago

One of the cooler things about Pluto.jl is that every notebook includes a manifest of the installed libraries, making them little containerized reproducible environments in and of themselves.

IMHO, this is one of the least cool things about Pluto because the resulting .jl file becomes huge and now contains the entire list of packages. As a result, my actual code, which is the entire point of the notebook, is buried deep after the packaging stuff (EDIT: apparently now the notebook code is before the packages). As another result, each and every Pluto notebook becomes like this, packed to the brim with package information. The ratio of package info to my code is about 100:1 in each notebook, not because I don't write any code, but because there's too much package info. ...which is honestly useless because I can simply instantiate an environment in the directory with my notebooks and commit Project.toml and Manifest.toml alongside the Pluto notebooks.

I think marimo shouldn't be a package manager or wrap a package manager or do any package management. Do one thing - be a notebook - and do it well. Package management can be done by standalone package managers.

akshayka commented 5 months ago

Thanks everyone for the feedback. I know that every individual and every team has their own preferred workflow for package management. I also know that package management is an especially gnarly topic for Python, which is why we haven't imposed an opinionated solution.

Perhaps we can start by supporting the alternative suggestion: convenience methods to install packages via pip / conda (essentially Jupyter's %pip and %conda magics, which just call out to the current interpreter), so users don't have to jump back to the terminal to install additional packages into their environment.

akshayka commented 5 months ago

I installed marimo using pipx install marimo because I like to keep apps like this in their own environment.

@simonw, what if you were working on two different projects that both used marimo, but that had conflicting package requirements? Would that work with a global install of marimo with pipx, or would you need to resort to pip installing marimo in both project environments?

@baggiponte, this discussion reminds me of your suggestion that marimo be installable as an environment-agnostic tool, with an optional flag for specifying the interpreter to use: something like marimo edit notebook.py --interpreter=/path/to/interpreter. If the flag is not specified, marimo would use the currently activated environment. I guess installing marimo via pipx gets you part of the way there, but perhaps not quite all the way?
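To make the proposal concrete, here is a small sketch of how such a flag could be parsed, with a fallback to whichever python is first on PATH (a stand-in for "the currently activated environment"). The --interpreter flag is only a suggestion in this thread; this is not marimo's actual CLI, and parse_edit_args is a hypothetical name.

```python
import argparse
import shutil
import sys


def parse_edit_args(argv):
    """Parse a hypothetical `edit NOTEBOOK [--interpreter PATH]` command line."""
    parser = argparse.ArgumentParser(prog="marimo")
    subcommands = parser.add_subparsers(dest="command", required=True)
    edit = subcommands.add_parser("edit")
    edit.add_argument("notebook")
    edit.add_argument("--interpreter", default=None)
    args = parser.parse_args(argv)
    if args.interpreter is None:
        # An activated environment puts its python first on PATH;
        # fall back to the interpreter running this tool if none is found.
        args.interpreter = (
            shutil.which("python3") or shutil.which("python") or sys.executable
        )
    return args
```

The fallback matters for a pipx install: sys.executable would point at pipx's isolated environment, while the PATH lookup picks up the project environment the user activated.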

baggiponte commented 5 months ago

Hey there, thanks for looping me in. I stand by the point that @akshayka reported. I think the real feature request is making marimo environment agnostic. Upon running, marimo would prompt the user to choose an interpreter path and store it in the [tool.marimo] table of the project's pyproject.toml. The interpreter could be chosen/switched from the UI as well.

@ForceBru summarised it very efficiently: marimo is a notebook. We are used to pip install-ing Jupyter in any venv because it only works like this, but I argue you would never put VS Code/PyCharm/vim/Emacs in your project's optional dependencies (or inside your env). However, marimo is also a Python library to design applications, which can be imported to provide UI elements. Maybe the solution is to split the two features. Nevertheless, the solution won't be trivial.

About the other features that came up (marimo.install/marimo install and a "manifest of the installed libraries"): I think they would imply spending a lot of the maintainers' time on "reinventing the wheel" and testing and fixing bugs for a pretty big deal of glue code. marimo.install would have to support every Python package manager, such as pdm, poetry, hatch, and conda or mamba or pixi, or whatever new tool comes up. Keep in mind that pdm and poetry don't require virtual environments to be active, so you cannot just look for the right virtual env variable and go from there. I also think that a library installed with pip inside a project managed with poetry or pdm might not be found. (The command should definitely not install libraries in an environment other than the project's: if marimo install put stuff inside the isolated env that pipx created for it, it would be a problem.)

Technically, if there is no pyproject.toml file in the project dir, then a marimo notebook could include the dependencies necessary to run it as per PEP 723, which was recently accepted. But I am skeptical about marimo run generating a venv and installing dependencies.
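For context, PEP 723 metadata is a specially marked comment block at the top of a script. The sketch below shows an example block and a small reader modeled on the reference implementation in the PEP; the example notebook content and the name read_script_metadata are illustrative assumptions, and nothing here reflects how marimo handles dependencies.

```python
import re

# Pattern modeled on the reference implementation in PEP 723.
REGEX = r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"

EXAMPLE_NOTEBOOK = '''\
# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "altair",
# ]
# ///

import altair
'''


def read_script_metadata(source):
    """Return the TOML text of the `script` metadata block, or None if absent."""
    matches = [m for m in re.finditer(REGEX, source) if m.group("type") == "script"]
    if len(matches) > 1:
        raise ValueError("Multiple script blocks found")
    if not matches:
        return None
    # Drop the leading "# " (or bare "#") from every comment line.
    return "".join(
        line[2:] if line.startswith("# ") else line[1:]
        for line in matches[0].group("content").splitlines(keepends=True)
    )
```

The returned text is ordinary TOML, so on Python 3.11+ it could be handed to tomllib.loads to get the dependencies list.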

simonw commented 5 months ago

@simonw, what if you were working on two different projects that both used marimo, but that had conflicting package requirements? Would that work with a global install of marimo with pipx, or would you need to resort to pip installing marimo in both project environments?

Yeah, in that case I would install marimo multiple times in the different project environments. That's what I do with Jupyter.

My "pipx" installed marimo is the one I'd use for general research and tinkering.

akshayka commented 5 months ago

@baggiponte, thanks for the thoughtful remarks.

Maybe the solution is to split the two features. Nevertheless, the solution won't be trivial.

We considered this. Ultimately we preferred a "batteries-included" design where installing the marimo notebook also got you the marimo library features. But perhaps in the long run we could make the marimo package contain two separate packages: the notebook/editor and the library. Users could then install them separately if they so chose.

About the other features that came up (marimo.install/marimo install and a "manifest of the installed libraries"): I think they would imply spending a lot of the maintainers' time on "reinventing the wheel" and testing and fixing bugs for a pretty big deal of glue code.

It certainly feels like a dauntingly large task.

Technically, if there is no pyproject.toml file in the project dir, then a marimo notebook could include the dependencies necessary to run it as per PEP 723, which was recently accepted.

That's quite interesting ... though I understand your skepticism.

akshayka commented 5 months ago

@simonw, what if you were working on two different projects that both used marimo, but that had conflicting package requirements? Would that work with a global install of marimo with pipx, or would you need to resort to pip installing marimo in both project environments?

Yeah, in that case I would install marimo multiple times in the different project environments. That's what I do with Jupyter.

My "pipx" installed marimo is the one I'd use for general research and tinkering.

That makes sense. I have a very similar workflow. Thanks for sharing (and thanks for sending the code snippet for your install command).

baggiponte commented 5 months ago

Technically, if there is no pyproject.toml file in the project dir, then a marimo notebook could include the dependencies necessary to run it as per PEP 723, which was recently accepted.

That's quite interesting ... though I understand your skepticism.

Technically, in this case marimo run could be an alias to pipx run: since v1.4, pipx runner is compatible with PEP723.

I am not a fan of the idea behind it: package everything in a single file. Feels like poor engineering. I am biased and potentially wrong. But the PEP723 syntax is basically a bit of a pyproject.toml, so why not just write one, since we have this standard? As per Chris Warrick:

Is it super useful? I don’t think so; setting up a project with pyproject.toml would easily allow things to grow. If you’re sending something via a GitHub gist, just make a repo. If you’re sending something by e-mail, just tar the folder. That approach promotes messy programming without source control

My biggest concern is: how will you end up writing a marimo notebook without having installed dependencies first? What is the use case behind this feature? Is it so that we can share the notebook outside of GitHub repos?

On the other hand, marimo could analyse the import statements and prompt you to include missing dependencies from pyproject.
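The import-analysis idea can be sketched with the standard library alone: parse the notebook source, collect the top-level module names it imports, and check which ones cannot be found in the current environment. This is only an illustration with made-up function names; note that an import name does not always match the name of the package that provides it, so a real tool would need a mapping step before suggesting dependencies.

```python
import ast
import importlib.util


def imported_top_level_modules(source):
    """Collect the top-level module names imported anywhere in `source`."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports, which are not dependencies.
            names.add(node.module.split(".")[0])
    return names


def missing_modules(source):
    """Return the sorted imported modules that the current environment lacks."""
    return sorted(
        name
        for name in imported_top_level_modules(source)
        if importlib.util.find_spec(name) is None
    )
```

A notebook front end could call missing_modules on each cell and prompt the user before anything is installed.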

(Last comment about this feature; it feels a bit off-topic and might be an issue in itself.)