marimo-team / marimo

A reactive notebook for Python — run reproducible experiments, execute as a script, deploy as an app, and version with git.
https://marimo.io
Apache License 2.0
6.15k stars 200 forks source link

Can Marimo use unique default names for cells to serialize to a nicer Python file? #1954

Open danielhfrank opened 1 month ago

danielhfrank commented 1 month ago

Description

Right now, when I work on a Marimo notebook, I'm delighted that the serialized format is a plain python file. However, by default, each cell is serialized as a function with name "__". This makes for rather funny-looking python files, and if you have any sort of linter in your codebase, it will certainly not be happy.

I can see that if I create a new name for my cell, then it will be used in the serialized python file. I think it would be great if each cell had a unique name, so that the rendered python file inherits those names for the autogenerated functions.

Suggested solution

When we create a new cell, we should assign a unique name. I've not been able to look into the codebase to see if it would be practical to do something like cell_0, cell_1, etc, or if a more stateless identifier should be used. Since this is pretty transparent to the user, I would not be too concerned about readability of the name, but if it can be achieved, that would be nice!

I think this behavior is triggered by the default here: https://github.com/marimo-team/marimo/blob/b99ca956cada7aea1e874d4a35d82a7054926b21/marimo/_ast/app.py#L468

I'm not quite sure though!

Alternative

No response

Additional context

I would be happy to contribute this change myself! I could use some pointers on whether the functionality I'm talking about is coming from where I linked above. If so, I can read up on contributing guidelines, and try to put it together!

akshayka commented 1 month ago

The issue is that we want to introduce as few names into the file as possible. For example, in the future we might make it so that functions defined in a notebook that don't depend on any other variables are serialized directly into the notebook, instead of being wrapped in an app.cell decorator. If cells had unique names, especially human-readable ones, there would be a chance that these names would conflict with user-defined function names.

I understand the concern about linting, though. I am open to considering making the cell names unique to appease linters, though the names likely won't be very readable.

dmadisetti commented 1 month ago

I think there was the suggestion on naming based on execution graph position.

But to appease the linters, users can manually give names- which is much more readable than than anything that could be automated. Related, this code cleanliness check could be incorporated in a lint command: https://github.com/marimo-team/marimo/issues/1543

akshayka commented 1 month ago

I think there was the suggestion on naming based on execution graph position.

Right. But this will add additional constraints to function names, thinking ahead to when we add top-level functions (@app.function). Perhaps that's okay, because most users likely wouldn't create functions named _cell_0, _cell_1, ...?

But to appease the linters, users can manually give names- which is much more readable than than anything that could be automated.

Yes definitely. I suppose @danielhfrank's concern is that this is too much work?

dmadisetti commented 1 month ago

marimo lint --suggest-names could pass cells into a GPT and run a prompt to rename cells?

I think renaming to _cell_x gives the false security that default cell can be reliably utilized in an import. Or maybe prefix _ are hidden in the app API?

danielhfrank commented 1 month ago

Glad to see that this has sparked some conversation, I appreciate the attention!

But to appease the linters, users can manually give names- which is much more readable than than anything that could be automated.

Yes definitely. I suppose @danielhfrank's concern is that this is too much work?

Yes, indeed, I see that I can manually rename cells, but it's a lot of work - I think there could be a much smoother user experience if this were done automatically.

Having thought about it a bit, from my own perspective, I wouldn't really care if the function names were very human readable, or even if they reflected any sort of ordering. I think that a name like _cell_33c443 (some random hex identifier) would work well for my and others' use cases, and I expect would not interfere with any user code.

marimo lint --suggest-names could pass cells into a GPT and run a prompt to rename cells?

I think that a marimo lint command could help here if this didn't sound appealing as default behavior. Of course, my personal preference would be to change the defaults to unique names.

Thanks again for the consideration, and again, I would be happy to help contribute if you all think this is a constructive change!

ggggggggg commented 1 month ago

Right. But this will add additional constraints to function names, thinking ahead to when we add top-level functions (@app.function). Perhaps that's okay, because most users likely wouldn't create functions named _cell_0, _cell_1, ...?

I think it's pretty reasonable to define some cell naming scheme and enforce that users are not allowed to name cells that conflict with that scheme. As long as it has a clear error message when someone does conflict with a clear action item to fix it, I don't think you'll get any pushback. Think of it like a keyword in a language, you just can't use them as variables, end of story.