python / mypy

Optional static typing for Python
https://www.mypy-lang.org/
Other
18.48k stars 2.83k forks source link

[Feature Request] Option to Recurse into All Subdirectories with Namespace Packages #6385

Closed jtschoonhoven closed 3 years ago

jtschoonhoven commented 5 years ago

Feature Request An option to recursively type-check all .py files under a given directory when using namespace_packages (without __init__.py).

Use Case Given the following directory structure:

my-py3-repo/
├── hello/
│   ├── services/
│   │   └── hello_service.py
│   └── hello.py
├── scripts/
│   ├── db/
│   │   └── migrate.py
│   └── manage.py
└── tests/
    └── hello/
        ├── services/
        │   └── hello_service_test.py
        └── hello_test.py

Assuming that hello.py imports hello_service.py, everything under the hello namespace will be type checked as expected with mypy ./hello.

However test discovery with pytest, nose, django et al works differently and hello_test.py would not usually import hello_service_test.py. There is currently no way for Mypy to discover hello_service_test.py with mypy ./tests (if not using __init__.py).

Similarly, everything under the scripts directory would suffer the same problem.

If Mypy supported a --recursive -r option (or similar) that would cause it to automatically recurse into subdirectories, this would solve these common use cases.

Why not just use __init__.py? To quote iScrE4m's comment from https://github.com/python/mypy/issues/1645#issuecomment-302948133_,

In future, there will be python programmers who never heard of pre PEP420 era of __init__.py and that's good. What's not good is mypy forcing them to create dummy files.

Configuration

# Pipfile
[dev-packages]
mypy = "==0.670"
[requires]
python_version = "3.7"
# setup.cfg
[mypy]
python_version = 3.7
ignore_missing_imports = True
namespace_packages = True
JohnHBrock commented 5 years ago

Which of the following would be a better solution: (1) changing the behavior of --namespace-packages so that it recurses into all subdirectories, or (2) adding a new flag (such as --recursive) that has the same effect as --namespace-packages, except it recurses into all subdirectories?

My vote is for option 1 just to keep things simple by having as few command line args as necessary, and because I find the current behavior of --namespace-packages unintuitive (I was surprised to discover I have a repo where some .py files weren't getting checked by mypy simply because they were too deep in a directory hierarchy without __init__.py files). I'm guessing there are other projects out there right now where people are running mypy foo/, not getting any errors, and therefore assuming everything is fine with their type annotations, when in reality they have .py files in the hierarchy under foo/ that aren't getting checked.

Of course, the flip side is that changing the current behavior of --namespace-packages will break some people's mypy/CI scripts when they update mypy (e.g., in @jtschoonhoven's example above, someone might be running mypy ./tests and truly want it to skip hello_service_test.py).

gvanrossum commented 5 years ago

Like everything with Python import, stuff here is more complicated than it seems.

Mypy has two different ways of finding files: one for the initial set off target files specified on the command line, and a separate set of rules for following imports.

The --namespace-packages flag applies to following imports only. But here we’re concerned with the file targets specified on the command line.

I think it’s actually a bug that when you specify a directory on the command line it doesn’t really recurse into that directory, but does a shallow directory listing.

I propose to fix that bug and replace the shallow directory listing with a proper walking of the directory hierarchy starting at that point.

Note that the -p or --packages flag is still different: its argument is not a directory relative to the current directory, but specifies a search in the initial $MYPYPATH, so that’s unrelated.

JohnHBrock commented 5 years ago

I see, thanks for the clarification. I'll work on a PR that properly walks the directory hierarchy for targets specified on the command line.

Coder-Sharon commented 5 years ago

+1

I've been hoping for mypy to recursively traverse the given directory for some time. I look forward to this fix!

JohnHBrock commented 5 years ago

I'm investigating what it will take to make this change and found a problem:

SourceFinder.expand_dir() uses crawl_up_dir() to figure out the top module and base directory for the package (link to code). crawl_up_dir() works by recursively checking the parent directory until it finds one without an __init__.py file.

How should mypy determine the base directory and top module for namespace packages since there are no __init__.py files for crawl_up_dir() to rely upon?

gvanrossum commented 5 years ago

Hm, looking more at that code it seems that expand_dir() does recurse down subdirectories, but only if they have __init__.py files (there's a recursive invocation on L87 that will return [] immediately if there isn't one, see L67-L69). That's a bit inconsistent since it doesn't insist on a __init__.py file for the toplevel directory.

This puts us in a difficult position. The current code seems to be treating the toplevel directory as a member of $MYPYPATH, and every module it finds is rooted there. For example, if we have this directory:

foo/
 +---- bar.py
 +---- baz/
        +---- __init__.py
        +---- yo.py

and we run mypy foo, then the global module namespace contains bar, baz and baz.yo.

And if we wanted to simply extend this to namespace packages (probably only when using --namespace-packages), we would expect to obtain the same hierarchy if we left out foo/baz/__init__.py.

But an alternative interpretation might be that foo itself would be a namespace package, and then we'd expect the global module namespace to have foo, foo.bar, foo.baz, and foo.baz.yo. (Same as if there was also a file foo/__init__.py.)

So which should it be? I'm not sure.

It might be easier to expand the meaning of -p when combined with --namespace-packages to have the second interpretation, so that mypy --namespace-packages -p foo would use the second interpretation above, regardless of whether foo/__init__.py exists. (BTW: It actually does this when foo/__init__.py is absent, but when that file exists, it seems to do something different. Worth investigating.)

But this doesn't provide exactly the same functionality: the argument to -p must be a valid module name, whereas when running mypy <directory> the directory could contain slashes.

Oh wait: another difference is that with mypy -p <package> only the package's __init__.py and anything it imports is checked, while with mypy <directory> all files in the directory are checked. But with --namespace-packages, -p seems to also dive into subdirectory. I don't have time to investigate all this -- clearly I've let the complexity of this code get away from me, and it will take some time to sort it out.

(In the meantime, a useful diversion might be to implement #7672 first.)

aldanor commented 4 years ago

@JohnHBrock wondering if there any news or any help needed?

In the current implementation, mypy pretty much discourages users from going with pep420; I personally had to go and add a bunch of those pesky empty __init__.py files to existing projects just so they could be mypy'd.

One other use case to note, for namespace packages it's not uncommon to place the source code under src/ folder, so that setuptools.find_namespace_packages() is easier to use without having to provide explicit globs (by setting package_dir={'': 'src'}).

This way, you could have something like

src/
+-- foo/
   +-- bar.py
   +-- baz/
      +-- __init__.py
      +-- yo.py

and, ideally, one would expect things to "just work" by running

env MYPYPATH=src mypy --namespace-packages -p foo 

which would then traverse foo, foo.baz and foo.baz.yo.

JohnHBrock commented 4 years ago

@aldanor No updates on my end. It's not clear to me how to proceed until the questions raised by Guido above are resolved.

DBCerigo commented 4 years ago

We are currently suffering from silently* not type-checking project files when someone forgets** to put in the required __init__.py file in a dir/module they have created.

A recursive option would solve this issue for us.

*"Almost silently" might be more exact; a very observed dev could see that the number of files checked hadn't increased since before they added new modules/files - or just assumed that they must be being ignored because mypy hadn't raised any errors and they are unlikely to code perfectly first time... **Or more usually, they choose not add it, trying to follow pep420.

sfdye commented 4 years ago

Mypy also doesn't seem to type check __init__.py when directory is passed from command line.

$ mypy github
Success: no issues found in 95 source files

VS

$ mypy github/*.py
github/__init__.py:70: error: Type of __all__ must be "Sequence[str]", not "List[object]"
jamesbraza commented 4 years ago

A workaround (adapted from here):

find . -type f -name "*.py" | xargs mypy 
ehossack commented 4 years ago

@jamesbraza unfortunately this doesn't work universally. If you have two files with the same name (e.g. Django project):

path/to/urls.py: error: Duplicate module named 'urls' (also at './path/elsewhere/urls.py')
stephan9mertel commented 4 years ago

Hi, I just stumbled over the same issue and found this request. Since it's open since more than 1 year I guess I shouldn't expect it to be included any time soon, should I?

gvanrossum commented 4 years ago

There’s another issue about this that promises to fix this. Can you find it for me?

stephan9mertel commented 4 years ago

I looked through the open pull requests but couldn't spot anything that seemed to be a fix for this. In the open issues I could only find #8548, however it's not clear to me whether it demands to fix the documentation or the code.

gvanrossum commented 4 years ago

In https://github.com/python/mypy/issues/8548#issuecomment-699569080 I meant to say that we should fix the code to actually recurse looking for .py files.

ehossack commented 3 years ago

Just to clarify, is the other issue actually the same here? Now that #8548 is merged, I think I still do not experience recursing?

❯ mypy --version
mypy 0.800+dev.260ac5fda39c0b0314fe85af2c18c4e25195a155
❯ mypy django-project --namespace-packages
Success: no issues found in 150 source files
❯ mypy django-project/utils.py
django-project/django-project/models/statusresult.py:23: error: No overload variant of "list" matches argument type "Type[StatusResult]"
django-project/django-project/models/statusresult.py:23: note: Possible overload variant:
django-project/django-project/models/statusresult.py:23: note:     def [_T] list(self, iterable: Iterable[_T]) -> List[_T]
django-project/django-project/models/statusresult.py:23: note:     <1 more non-matching overload not shown>
django-project/utils.py:29: error: No overload variant of "filter" matches argument types "str", "List[<nothing>]"
django-project/utils.py:29: note: Possible overload variants:
django-project/utils.py:29: note:     def [_T] filter(None, Iterable[Optional[_T]]) -> Iterator[_T]
django-project/utils.py:29: note:     def [_T] filter(Callable[[_T], Any], Iterable[_T]) -> Iterator[_T]
django-project/utils.py:31: error: Incompatible return value type (got "str", expected "LogEntry")
Found 3 errors in 2 files (checked 1 source file)

(perhaps @hauntsaninja has context given these questions)

hauntsaninja commented 3 years ago

There are several overlapping issues here. Recursing is currently "fixed" on master (in theory :-) ), but it might not yet do what you want for namespace packages (see https://github.com/python/mypy/pull/9632, also mentioned on this thread are issues fixed by https://github.com/python/mypy/pull/9683).

@ehossack To confirm what's going on, could you share the output of mypy django-project --namespace-packages -v 2>&1 | grep Found? That will confirm what build sources mypy finds. Do you have --ignore-missing-imports on? Is your project open source / do you have a standalone repro?

ehossack commented 3 years ago

Sure, see https://gist.github.com/ehossack/5ec1113cc74e03adda3d858e4f7293eb

--ignore-missing-imports shouldn't be on.

[mypy]
mypy_path = .mypy_stubs
namespace_packages = True
disable_error_code = misc
plugins =
    mypy_django_plugin.main

[mypy.plugins.django-stubs]
django_settings_module = "django-project.settings"

[mypy-pytest.*] ; https://github.com/pytest-dev/pytest/issues/3342
ignore_missing_imports = True

Sorry it's not open source, and I can't share the code. On a similar project I'm trying to reproduce this, I keep getting helpful errors such as the following 🤣:

So maybe I don't properly understand what command I'm supposed to use if I want to ask mypy to "please check this and all subdirectories for all python files and ensure their typing is compliant" and thus, I apologize, and am probably commenting on the wrong ticket.

On my reproduction project (see here), I'm running (in order of various commands to try and understand things):

❯ mypy --namespace-packages .
Success: no issues found in 33 source files
❯ mypy --namespace-packages sample_app
sample_app/manage.py: error: Source file found twice under different module names: 'models' and 'sample_app.b2_file_app.models'  [misc]
Found 1 error in 1 file (errors prevented further checking)
❯ mypy -p sample_app --namespace-packages
sample_app/b2_file_app/views.py:23: error: Item "None" of "Optional[ModelWithFiles]" has no attribute "refresh_from_db"  [union-attr]
Found 1 error in 1 file (checked 19 source files)
hauntsaninja commented 3 years ago

The only foolproof way with mypy 0.790 and current master is to add __init__.py in all your packages and subpackages. This is a shortcoming I'm working on and should be substantially improved by 0.800, but I can't promise that things will make sense (or any level of support) for an unreleased mypy. In fact, for namespace packages, things won't make sense until #9632 and #9683 are merged.

From your gist, it looks like #9614 is doing what it is intended to do; it's not intended to handle namespace packages (which is why this issue and #5759 are still open). #9632 is the PR that aims to give you a way to do what you want.

With that said, I'm somewhat confused by what other parts of mypy are doing in your gist. I'd expect mypy's build to raise a Duplicate module named 'utils' error. If you're willing to update the gist to the output of:

rm -rf .mypy_cache
mypy django-project -v --no-incremental
mypy django-project --namespace-packages -v --no-incremental
mypy django-project/utils.py -v --no-incremental
mypy django-project/utils.py --namespace-packages -v --no-incremental

I could try and figure out what's going on there.

ehossack commented 3 years ago

Thanks @hauntsaninja for the explanation! Let's leave the discussion at this for now if we may, and keep the thread focused for others who might find it in the future, rather than going into my mypy project setup.

For the record, I think I suppressed the error according to this thread setting the disable_error_code = misc flag.

atreyasha commented 3 years ago

A workaround (adapted from here):

find . -type f -name "*.py" | xargs mypy 

FWIW, I am using this current workaround for mypy to type-check all python files in a namespace-package scenario without any __init__.py or __init__.pyi files:

Repository structure:

mypy-repo/
├── mypy.ini
└── src
    ├── arg_parser.py
    ├── preprocess_multiclass_nlu.py
    ├── soft_patterns_pp.py
    ├── train.py
    └── utils
        ├── data_utils.py
        ├── logging_utils.py
        ├── model_utils.py
        └── parser_utils.py

mypy.ini:

# Global options:
[mypy]
allow_redefinition = True
namespace_packages = True
no_incremental = True

Workaround command executed at the root of the repository:

find src -type f -name "*.py"  | sed 's|/|.|g; s|\.py||g' | xargs -t -I {} mypy -m {}
kpfleming commented 3 years ago

I'm glad I found this open issue (and all the linked ones), after (as a brand-new user) being completely unable to get mypy to analyze all of my source files, because I too am using the src-layout and namespace packages (code is targeted at Python 3.6+ and is located two levels deep under a 'company' namespace in which I cannot include an __init__.py file). For now, since all of my source is in that third-level directory, I'll just point mypy directly at it.

hauntsaninja commented 3 years ago

Yeah, some of mypy 0.790's behaviour here is pretty bad. But this is fixed in master, specifically by #9742 (and related fixes in other PRs).

For the following layout:

.
├── src
│   └── namespace_pkg
│       ├── asdf.py
│       └── ...
└── tests
    └── ...

Using mypy master, you can get it to check everything with:

MYPYPATH=src mypy . --explicit-package-bases --namespace-packages

Explanation: when passing files (as opposed to packages or modules), mypy will crawl upwards as long as there's an __init__.py to determine fully qualified module names. With --explicit-package-bases, it'll keep crawling till it hits something in MYPYPATH or the current directory.

You could also get it to check namespace_pkg using:

MYPYPATH=src mypy -p namespace_pkg --namespace-packages

Hopefully by the next release we can make --namespace-packages the default (#9636), and things will be a lot friendlier for users, new and old.