python / mypy

Optional static typing for Python
https://www.mypy-lang.org/

Errors not always reported for a file in incremental mode (reopening #4043) #12354

Open mrolle45 opened 2 years ago

mrolle45 commented 2 years ago

Bug Report

Although #4043 has been fixed by #4045, the fix has an adverse impact on performance: it causes multiple modules to be rechecked unnecessarily.

To repeat the example from #4043:

# a.py
import b
b.f()

# b.py
def f() -> None: pass

# b.py (2)
def f(x) -> None: pass

In my own project, I have 27 modules in a single SCC, and mypy takes 26 seconds to analyze them if any of them has reported errors. It takes just as long if I run mypy again without changing any of the modules. mypy is deleting all 27 cache files!

By the way, if I turn off error reporting on all these modules, then mypy does not delete the cache files. If I then enable error reporting on any one of them that has errors, mypy correctly reports the errors, using the existing cache files, but still deletes the cache files afterward.
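For anyone who wants to confirm the cache deletion on their own project, here is a small sketch that compares the set of cache metadata files before and after a mypy run. It assumes the cache directory holds one *.meta.json file per module (the layout I see under .mypy_cache); the helper names snapshot_meta_files and deleted_between are made up for this example and are not part of mypy.

```python
from pathlib import Path
from typing import Set


def snapshot_meta_files(cache_dir: str) -> Set[str]:
    """Return the relative paths of all *.meta.json files under cache_dir.

    Assumption: each cached module has a file ending in .meta.json.
    """
    root = Path(cache_dir)
    return {p.relative_to(root).as_posix() for p in root.rglob("*.meta.json")}


def deleted_between(before: Set[str], after: Set[str]) -> Set[str]:
    """Return the metadata files present in `before` but missing from `after`."""
    return before - after
```

Take one snapshot before running mypy and one after, then pass both to deleted_between. In the situation described above, I would expect the result to list the metadata files of every module in the SCC.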

I assert that #4045 is the wrong solution and that the cache files should not be deleted in this case. You can remove the following code from write_cache() in build.py:

        is_errors = self.transitive_error
        if is_errors:
            delete_cache(self.id, self.path, self.manager)
            self.meta = None
            self.mark_interface_stale(on_errors=True)
            return

This is the only use of transitive_error, so you can also remove all other references to it in the code.

Experiment

I removed the above-mentioned code and added the following line, which treats module a as stale:

        if 'a' in scc: fresh = False

I repeated the calls to mypy cited in #4043, and this time got the expected results every time.

Proposed Fix

Let Sa be the State for module a, and Sb be the State for module b (if it exists). Sa.is_fresh() needs to recognize when Sa is out of date with respect to Sb. Since both modules could be in the same SCC, one meta file will always be later than the other, so comparing mtimes is not the answer. Instead, the meta file for each Sa should keep some data about each Sb as it was at the time Sa was stored, so that Sa can detect whether this differs from the current Sb in the filesystem. This data could be a hash of Sb's metadata, excluding the information that Sb itself stores about its own dependencies.
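To make the idea concrete, here is a minimal sketch of the freshness check described above. It is not mypy's code: Meta, own_hash, record_deps and is_fresh are hypothetical names for illustration, and the interface_hash field is just a stand-in for whatever per-module metadata would actually be hashed.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Meta:
    """Hypothetical stand-in for one module's cache metadata."""
    id: str
    interface_hash: str  # placeholder for the module's own metadata
    # Dependency id -> hash of that dependency's metadata when this Meta was stored.
    dep_hashes: Dict[str, str] = field(default_factory=dict)


def own_hash(meta: Meta) -> str:
    """Hash a module's metadata, excluding what it records about its dependencies.

    Leaving out dep_hashes avoids circularity when modules in one SCC
    depend on each other.
    """
    payload = json.dumps(
        {"id": meta.id, "interface_hash": meta.interface_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def record_deps(meta: Meta, current: Dict[str, Meta], deps: List[str]) -> None:
    """Store the current hash of each dependency alongside this module's metadata."""
    meta.dep_hashes = {dep: own_hash(current[dep]) for dep in deps}


def is_fresh(meta: Meta, current: Dict[str, Meta]) -> bool:
    """Return False if any dependency is missing or has changed since `meta` was stored."""
    for dep_id, stored in meta.dep_hashes.items():
        dep = current.get(dep_id)
        if dep is None or own_hash(dep) != stored:
            return False
    return True
```

With a and b in the same SCC, each records the other's hash. If b's metadata later changes, is_fresh returns False for a without relying on file modification times.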

henzef commented 2 months ago

This still seems to be an issue, and it slows down mypy a lot on my codebase.