Open simonw opened 5 months ago
Thanks for the request and the links. marimo install is very interesting. In your own tools, do your install commands support multiple package managers? Or do you just default to pip? Some of our users use conda, for example.
Another solution for this could be to imitate Jupyter's magic cell commands. In Jupyter I install packages like this:
Yes, we should support this at the very least. I'd prefer to use Python instead of a magic: maybe mo.pip("install altair"), mo.conda("install altair")? Or just mo.shell("pip install altair"), similar to Jupyter's ! escape hatch. Thoughts?
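A minimal sketch of what such a helper might look like. The names `build_pip_command` and `pip` are hypothetical and are not an existing marimo API. The one design choice worth noting is that pip is invoked through `sys.executable`, so packages land in the environment of the interpreter running the notebook rather than whichever `pip` happens to be first on PATH.

```python
import shlex
import subprocess
import sys


def build_pip_command(args: str) -> list[str]:
    """Turn a string such as "install altair" into an argv list.

    `sys.executable -m pip` targets the running interpreter's environment.
    """
    return [sys.executable, "-m", "pip", *shlex.split(args)]


def pip(args: str) -> int:
    """Hypothetical helper: run pip in a subprocess and return its exit code."""
    return subprocess.run(build_pip_command(args), check=False).returncode
```

Under these assumptions, `pip("install altair")` would behave much like running `python -m pip install altair` with the notebook's own interpreter.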
One of the cooler things about Pluto.jl is that every notebook includes a manifest of the installed libraries, making them little containerized reproducible environments in and of themselves. +1 to the idea of a marimo install command; at some point down the line it could be used to persist the version metadata of the installed library to the notebook.
From that perspective, I think it would make the most sense for the API to be something like mo.install(...) rather than going directly to pip or conda in the shell. It could switch between pip and conda depending on the environment one is in, and then (one day) persist the version metadata to the notebook.
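One way the pip/conda switch could be sketched, purely as an illustration: treat the environment as a conda one when `CONDA_PREFIX` is set and the interpreter prefix contains a `conda-meta` directory, and fall back to pip otherwise. The function names here are hypothetical, and the heuristic is an assumption rather than anything marimo does.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional, Sequence


def in_conda_env(prefix: str, environ: Mapping[str, str]) -> bool:
    """Heuristic: conda environments keep package records in `conda-meta`."""
    return "CONDA_PREFIX" in environ and (Path(prefix) / "conda-meta").is_dir()


def install_command(
    packages: Sequence[str],
    prefix: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """Build (but do not run) the install command for the current environment."""
    prefix = sys.prefix if prefix is None else prefix
    environ = os.environ if environ is None else environ
    if in_conda_env(prefix, environ):
        # Install into the environment the interpreter lives in.
        return ["conda", "install", "--yes", "--prefix", prefix, *packages]
    return [sys.executable, "-m", "pip", "install", *packages]
```

Returning the command instead of running it keeps the decision logic easy to test; a real implementation would still have to decide what to do when a conda package name differs from the PyPI one.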
The catch with a Python function is that calling pip install from within Python exits the process!
Here's the most recent implementation of my install command:
https://github.com/simonw/llm/blob/95597cc8f5c1612428fa4ab321afe2f173ac897f/llm/cli.py#L997-L1030
@cli.command()
@click.argument("packages", nargs=-1, required=False)
@click.option(
    "-U", "--upgrade", is_flag=True, help="Upgrade packages to latest version"
)
@click.option(
    "-e",
    "--editable",
    help="Install a project in editable mode from this path",
)
@click.option(
    "--force-reinstall",
    is_flag=True,
    help="Reinstall all packages even if they are already up-to-date",
)
@click.option(
    "--no-cache-dir",
    is_flag=True,
    help="Disable the cache",
)
def install(packages, upgrade, editable, force_reinstall, no_cache_dir):
    """Install packages from PyPI into the same environment as LLM"""
    args = ["pip", "install"]
    if upgrade:
        args += ["--upgrade"]
    if editable:
        args += ["--editable", editable]
    if force_reinstall:
        args += ["--force-reinstall"]
    if no_cache_dir:
        args += ["--no-cache-dir"]
    args += list(packages)
    sys.argv = args
    run_module("pip", run_name="__main__")
The key bit there is the run_module("pip", ..) call. I used to do something different, but it broke in weird ways. Here are my notes on why I used that instead: https://til.simonwillison.net/python/call-pip-programatically
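For a long-lived process such as a notebook kernel, the in-process approach needs one extra step, sketched here under the assumption that the exit comes from the module calling `sys.exit()`, which raises `SystemExit`. The helper name `run_cli_module` is hypothetical. It catches the exception, converts it to an exit code, and restores `sys.argv` afterwards.

```python
import sys
from runpy import run_module


def run_cli_module(module: str, args: list[str]) -> int:
    """Run `python -m module args...` in-process and return its exit code."""
    saved_argv = sys.argv
    sys.argv = [module, *args]
    try:
        run_module(module, run_name="__main__")
    except SystemExit as exc:
        # sys.exit() raises SystemExit; a code of None means success.
        if exc.code is None:
            return 0
        return exc.code if isinstance(exc.code, int) else 1
    finally:
        sys.argv = saved_argv
    return 0
```

With this sketch, `run_cli_module("pip", ["install", "altair"])` would return instead of ending the interpreter. Note that pip's own documentation recommends running it in a subprocess rather than in-process, so this is best read as an illustration of the mechanism.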
One of the cooler things about Pluto.jl is that every notebook includes a manifest of the installed libraries, making them little containerized reproducible environments in and of themselves.
IMHO, this is one of the least cool things about Pluto because the resulting .jl file becomes huge and now contains the entire list of packages. As a result, my actual code, which is the entire point of the notebook, is buried deep after the packaging stuff (EDIT: apparently now the notebook code is before the packages). As another result, each and every Pluto notebook becomes like this, packed to the brim with package information. The ratio of package info to my code is about 100:1 in each notebook, not because I don't write any code, but because there's too much package info. That info is honestly useless, because I can simply instantiate an environment in the directory with my notebooks and commit Project.toml and Manifest.toml alongside the Pluto notebooks.
I think marimo shouldn't be a package manager or wrap a package manager or do any package management. Do one thing - be a notebook - and do it well. Package management can be done by standalone package managers.
Thanks everyone for the feedback. I know that every individual and every team has their own preferred workflow for package management. I also know that package management is an especially gnarly topic for Python, which is why we haven't imposed an opinionated solution.
Perhaps we can start by supporting the alternative suggestion: convenience methods to install packages via pip / conda (essentially Jupyter's %pip and %conda magics, which just call out to the current interpreter), so users don't have to jump back to the terminal to install additional packages into their environment.
I installed marimo using pipx install marimo because I like to keep apps like this in their own environment.
@simonw, what if you were working on two different projects that both used marimo, but that had conflicting package requirements? Would that work with a global install of marimo with pipx, or would you need to resort to pip installing marimo in both project environments?
@baggiponte, this discussion reminds me of your suggestion that marimo be installable as an environment-agnostic tool, with an optional flag for specifying the interpreter to use: something like marimo edit notebook.py --interpreter=/path/to/interpreter. And if not specified, marimo would use the currently activated environment. I guess installing marimo via pipx gets you part of the way there, but perhaps not quite all the way?
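The fallback order described here could be sketched roughly as follows. The `--interpreter` flag is the one proposed in this thread, not an existing marimo option, and `resolve_interpreter` is a hypothetical name. The sketch assumes an activated virtual environment is detectable through the `VIRTUAL_ENV` variable, which will not cover every tool.

```python
import argparse
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def resolve_interpreter(
    explicit: Optional[str], environ: Optional[Mapping[str, str]] = None
) -> str:
    """Pick an interpreter: explicit flag, then activated venv, then our own."""
    environ = os.environ if environ is None else environ
    if explicit:
        return str(Path(explicit).expanduser())
    venv = environ.get("VIRTUAL_ENV")
    if venv:
        bin_dir = "Scripts" if os.name == "nt" else "bin"
        exe = "python.exe" if os.name == "nt" else "python"
        return str(Path(venv) / bin_dir / exe)
    return sys.executable


# Hypothetical command line mirroring the proposal above.
parser = argparse.ArgumentParser(prog="marimo-sketch")
parser.add_argument("notebook")
parser.add_argument("--interpreter", default=None)
args = parser.parse_args(["notebook.py", "--interpreter=/path/to/interpreter"])
interpreter = resolve_interpreter(args.interpreter)
```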
Hey there, thanks for looping me in. I stand by the point that @akshayka reported. I think the real feature request is making marimo environment agnostic. On startup, marimo would prompt the user to choose an interpreter path and store it in the [tool.marimo] table of the project's pyproject.toml. The interpreter could be chosen or switched from the UI as well.
@ForceBru summarised it very efficiently: marimo is a notebook. We are used to pip install-ing Jupyter in any venv because it only works like this, but I argue you would never put VS Code/PyCharm/vim/Emacs in your project's optional dependencies (or inside your env). However, marimo is also a Python library to design applications, which can be imported to provide UI elements. Maybe the solution is to split the two features. Nevertheless, the solution won't be trivial.
About the other features that came up (marimo.install/marimo install and a "manifest of the installed libraries"): I think they would imply spending a lot of the maintainers' time on "reinventing the wheel" and testing and fixing bugs for a pretty big deal of glue code. marimo.install would have to support every Python package manager, such as pdm, poetry, hatch, and conda or mamba or pixi or whatever comes next. Keep in mind that pdm and poetry don't require virtual environments to be active, so you cannot just look for the right virtual env variable and go from there. I also think that a library installed with pip inside a project managed with poetry or pdm might not be found. (The command should definitely not install libraries in an environment other than the project's: if marimo install put stuff inside the isolated env that pipx created for it, it would be a problem.)
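To make the detection problem concrete, here is a rough sketch of a file-based heuristic that does not rely on an activated environment: look for each tool's marker file in the project directory. The function name and the marker table are assumptions for illustration, and the list is deliberately incomplete, which is part of the maintenance burden described above.

```python
from pathlib import Path

# Marker files checked in order; the first match wins.
MARKERS = [
    ("poetry.lock", "poetry"),
    ("pdm.lock", "pdm"),
    ("pixi.toml", "pixi"),
    ("environment.yml", "conda"),
]


def guess_package_manager(project_dir: str) -> str:
    """Guess the project's package manager from files in its directory."""
    root = Path(project_dir)
    for filename, manager in MARKERS:
        if (root / filename).is_file():
            return manager
    # No marker found: fall back to plain pip.
    return "pip"
```

A project with no lock file yet, or with several, would defeat this guess, so a real implementation would likely need an explicit user setting as well.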
Technically, if there is no pyproject.toml file in the project dir, then a marimo notebook could include the dependencies necessary to run it as per PEP 723, which was recently accepted. But I am skeptical about marimo run generating a venv and installing dependencies.
@simonw, what if you were working on two different projects that both used marimo, but that had conflicting package requirements? Would that work with a global install of marimo with pipx, or would you need to resort to pip installing marimo in both project environments?
Yeah, in that case I would install marimo multiple times in the different project environments. That's what I do with Jupyter.
My "pipx" installed marimo is the one I'd use for general research and tinkering.
@baggiponte, thanks for the thoughtful remarks.
Maybe the solution is to split the two features. Nevertheless, the solution won't be trivial.
We considered this. Ultimately we preferred a "batteries-included" design where installing the marimo notebook also got you the marimo library features. But perhaps in the long-run we could make the marimo package contain two separate packages — the notebook/editor and the library, and users could if they so chose install them separately.
About the other features that came up (marimo.install/marimo install and a "manifest of the installed libraries"): I think they would imply spending a lot of the maintainers' time on "reinventing the wheel" and testing and fixing bugs for a pretty big deal of glue code.
It certainly feels like a dauntingly large task.
Technically, if there is no pyproject.toml file in the project dir, then a marimo notebook could include the dependencies necessary to run it as per PEP 723, which was recently accepted.
That's quite interesting ... though I understand your skepticism.
@simonw, what if you were working on two different projects that both used marimo, but that had conflicting package requirements? Would that work with a global install of marimo with pipx, or would you need to resort to pip installing marimo in both project environments?
Yeah, in that case I would install marimo multiple times in the different project environments. That's what I do with Jupyter.
My "pipx" installed marimo is the one I'd use for general research and tinkering.
That makes sense. I have a very similar workflow. Thanks for sharing (and thanks for sending the code snippet for your install command).
Technically, if there is no pyproject.toml file in the project dir, then a marimo notebook could include the dependencies necessary to run it as per PEP 723, which was recently accepted.
That's quite interesting ... though I understand your skepticism.
Technically, in this case marimo run could be an alias for pipx run: since v1.4, the pipx runner is compatible with PEP 723.
I am not a fan of the idea behind it: package everything in a single file. It feels like poor engineering. I am biased and potentially wrong. But the PEP 723 syntax is basically a bit of a pyproject.toml, so why not just write one, since we have this standard? As per Chris Warrick:
Is it super useful? I don’t think so; setting up a project with pyproject.toml would easily allow things to grow. If you’re sending something via a GitHub gist, just make a repo. If you’re sending something by e-mail, just tar the folder. That approach promotes messy programming without source control
My biggest concern is: how will you end up writing a marimo notebook without having installed dependencies first? What is the use case behind this feature? Is it so we can share the notebook outside of GitHub repos?
On the other hand, marimo could analyse the import statements and prompt you to include missing dependencies from pyproject.
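A rough sketch of the import-analysis idea, using only the standard library: parse the notebook source with `ast`, collect the top-level module names, and report those that cannot be found in the current environment. The function name is hypothetical, and the result is a list of import names, which do not always match the distribution name to install (for example, `sklearn` is provided by `scikit-learn`).

```python
import ast
import importlib.util


def missing_imports(source: str) -> list[str]:
    """Return top-level imported module names that are not importable."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as `from . import x`.
            names.add(node.module.split(".")[0])
    return sorted(n for n in names if importlib.util.find_spec(n) is None)
```

A tool could then compare this list against the project's declared dependencies and prompt the user before changing anything.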
(Last comment about this feature; it feels a bit off-topic and might be an issue in itself.)
Description
I installed marimo using pipx install marimo because I like to keep apps like this in their own environment.
Then I saw in https://docs.marimo.io/guides/plotting.html that I would need to run pip install altair to get charts.
Having used pipx I have to go and remind myself how to install into that environment - it's pipx inject marimo altair, but I always have to look that up.
Suggested solution
Add a marimo install x command which runs pip install x in the same Python environment that marimo is installed in.
I have this for several of my own tools, to solve the same problem:
Alternative
Another solution for this could be to imitate Jupyter's magic cell commands. In Jupyter I install packages like this:
Because then I know Jupyter will put them in the correct Python environment.