marimo-team / marimo

A reactive notebook for Python — run reproducible experiments, execute as a script, deploy as an app, and version with git.
https://marimo.io
Apache License 2.0
8.07k stars 291 forks source link

Make generated Python files `mypy` error-free #2325

Open shunichironomura opened 2 months ago

shunichironomura commented 2 months ago

Description

When running mypy on marimo notebooks, mypy reports error: Name "__" already defined on line X [no-redef] errors because cells in marimo notebooks are all defined as functions with the same name: __. This makes it difficult to properly type-check marimo notebooks.

Suggested solution

There are at least three approaches to resolve this:

  1. Define cells as functions with unique names. This approach eradicates the root cause, but possibly has side effects causing other issues.
  2. Add # type: ignore[no-redef] comments to lines containing @app.cell to suppress errors. This approach has probably the fewest side effects but makes the files a bit noisy.
  3. Add a top-level #mypy: disable-error-code="no-redef" comment. This approach is probably the least noisy but possibly suppresses other no-redef errors in the file.

Alternative

No response

Additional context

As far as I know, there is no way to configure mypy to ignore a certain error in specific files (like per-file-ignores config in Ruff) via configuration files such as pyproject.toml. So it's impossible to suppress the no-redef errors in marimo notebooks without modifying the notebook files.

So my current workaround is to take approach 3 and manually adding the top-level #mypy: disable-error-code="no-redef" comment to every marimo notebook.

dmadisetti commented 2 months ago

Related is the idea of a marimo specific lint: https://github.com/marimo-team/marimo/issues/1543

Let us know if you come across any other type checking/ static analysis issues particular to mypy and marimo

mscolnick commented 2 months ago

@shunichironomura - I dont think this is an alternative or solution, but you can rename cells (in the cell dropdown menu), which is the name given in the generated python file

shunichironomura commented 2 months ago

Related is the idea of a marimo specific lint: #1543

Let us know if you come across any other type checking/ static analysis issues particular to mypy and marimo

Another mypy/marimo issue that I encountered is that mypy cannot infer the module imported in a different cell. Here's a minimal example:

import marimo

__generated_with = "0.8.15"
app = marimo.App(width="medium")

@app.cell
def __():
    import pathlib
    return pathlib,

@app.cell
def __(pathlib):
    def read_file(file: pathlib.Path) -> str:
        return file.read_text()
    return read_file,

if __name__ == "__main__":
    app.run()

Running mypy against this file gives:

notebooks/mypy_issue.py:13: error: Name "__" already defined on line 7  [no-redef]
notebooks/mypy_issue.py:15: error: Name "pathlib.Path" is not defined  [name-defined]
Found 2 errors in 1 file (checked 1 source file)

mypy doesn't report error if it's imported via from pathlib import Path instead, but in that case the type of file is inferred as Any:

import marimo

__generated_with = "0.8.15"
app = marimo.App(width="medium")

@app.cell
def __():
    from pathlib import Path
    return Path,

@app.cell
def __(Path):
    def read_file(file: Path) -> str:
        reveal_type(file) # Any
        return file.read_text()
    return read_file,

if __name__ == "__main__":
    app.run()

Still, there are cases where I want to import the module (e.g., numpy, networkx).

My current workaround is to modify the cell like this:

from typing import TYPE_CHEKCING

if TYPE_CHECKING:
    import pathlib as _pathlib

def read_file(file: _pathlib.Path) -> str:
    return file.read_text()

Maybe this is a known issue, I haven't searched the issues, but let me know if you want to open a new issue for this.

shunichironomura commented 2 months ago

@shunichironomura - I dont think this is an alternative or solution, but you can rename cells (in the cell dropdown menu), which is the name given in the generated python file

Oh I didn't know about that, thank you. I agree that it cannot be the solution, but it's good to know.

shunichironomura commented 2 months ago

Another option is to make a mypy plugin for marimo? I don't know much about what's possible with mypy plugins, but it may solve these issues without changing the generated Python files.

patrick-kidger commented 2 months ago

Whatever is done here, do bear in mind that there are other static type-checkers beyond mypy; most notably pyright. (Which personally I much prefer, as it seems to have far fewer edge cases.)

If I had to give my +1 to something, it would be to give each cell a unique name. This would also avoid trigger spurious Ruff lints like N807.

wasimsandhu commented 2 weeks ago

Wanted to bump this issue. This issue is causing our CI/CD to fail (with both mpy and pyright). Renaming cells is a solution, but manual and cumbersome when you have many cells in a notebook.

Is there a reason cells can't simply be serialized in the file as cell_1, cell_2, etc.?

nambrosini17 commented 1 week ago

Thank you for creating marimo and congratulations for the seeding round! The experience has been pleasant. As one of the selling points of marimo is better git tracking, i think an important part of it is that it inserts itself seemlessly into git workflows. Many of python devs (target audience of marimo) already use mypy/other linter to track files. So removing this friction will likely satisfy a large part of the target users. I've read the 2 issues in which this behaviour is discussed and i think assigning names with a __ prefix should be enough to avoid naming collisions. In fact it seems you've already decided that when assigning __generated_with = "0.x.y" to notebook version so this new cell naming convention would be consistent with that.