pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org
Other
82.1k stars 22.07k forks source link

[typing] Add missing `__all__` to modules. #131765

Open randolf-scholz opened 1 month ago

randolf-scholz commented 1 month ago

🚀 The feature, motivation and pitch

torch ships with the py.typed marker, yet many modules lack a __all__ declaration. This leads to problems with type checkers:

from torch import optim
opt_type = optim.Adam  # pyright: error: "Adam" is not exported from module "torch.optim"

According to the typing spec:

  • Imported symbols are considered private by default
  • A module can expose an __all__ symbol at the module level that provides a list of names that are considered part of the interface.

Therefore, modules that reexport symbols such as https://github.com/pytorch/pytorch/blob/main/torch/optim/__init__.py need to declare __all__.

cc @vincentqb @jbschlosser @albanD @janeyx99 @crcrpar @ezyang @malfet @xuzhao9 @gramster

malfet commented 1 month ago

Thank you for reporting. If you already have a fix, please do not hesitate to propose a PR

albanD commented 1 month ago

Note that our current public API definitely doesn't require __all__ to be defined https://github.com/pytorch/pytorch/wiki/Public-API-definition-and-documentation

randolf-scholz commented 1 month ago

@albanD Well, as I said, the problem with this is it violates assumptions made by 3rd party tools such as pyright, based on the official typing specification.

randolf-scholz commented 1 month ago

@malfet I made a fix #131800. It's rather large as I updated over 150 __all__ definitions.

albanD commented 1 month ago

I understand that pyright does that but the typing specifications you shared above don't actually make that claim? Is pyright more strict than the general CPython typing module rules? If so, do these properly align with other typecheckers? We don't use pyright AFAIK and use mypy/black/flake8/ufmt. Will this conflict with them?

ezyang commented 1 month ago

@albanD pyright is very popular downstream so we should support them anyway even if we don't use it internally

albanD commented 1 month ago

@randolf-scholz can you share how you get this error btw, installing pyright and running it on the code you shared gives me 0 errors, 0 warnings, 0 informations.

randolf-scholz commented 1 month ago

@albanD Not sure why it doesn't show up for you; I created an example repro here: https://github.com/randolf-scholz/torch_131765

And this is how it looks like if you open the project in VSCode with the pylance extension and Python › Analysis: Type Checking Mode set to standard:

Screenshot from 2024-08-01 13-58-23

randolf-scholz commented 1 month ago

I simplified my main PR and split up some smaller edits into separate PRs:

albanD commented 1 month ago

Ho I guess because we already added it in https://github.com/pytorch/pytorch/blob/6c1f1563e1b2fa54673033a5aed8fabd9fb28bc2/torch/optim/__init__.py#L44 ?

Also I can't seem to find the exact definition of what pyright considers public or not. Do you know where that is by any chance? The typing spec you shard above aligns with what we do by saying that __all__ is optional and, when not provided, the other rules apply (they define these extra rules based on static code of import while we define them based on runtime object module attribute).

It would be great to get the exact specs of what we need to comply with so we can properly update our rules and lint to make sure we continue following them!

randolf-scholz commented 1 month ago

@albanD Yes, it seems like @ringohoffman already patched optim/__init__.py in https://github.com/pytorch/pytorch/pull/131959 last week.

pyright follows the typing spec here. However, reading it again, it seems like the wording could be improved. What the spec says is that:

  1. If there is no __all__, then
    1. star-imports from module import * marks all those imports as public / re-exported
    2. other imports are only considered public if they do not start with and underscore and are redundantly renamed. For example import A as A, from package import A as A, from . import A as A and from .submodule import A as A all mark A as public.
    3. Inside __init__.py files, statements like from .submodule import X additionally mark submodule as public (Hence all the del statements in optim/__init__.py. These could be avoided if absolute imports were used instead: from torch.optim.adam, then adam is not considered public by the typing spec. This was patched in #127703 already.)
  2. If __all__ is declared, it defines the public interface.

On the other hand, the torch spec says that all objects with a __module__ attribute that starts with torch. are considered public.

In my example repo, if one replaces

from .adam import Adam

with

from .adam import Adam as Adam

then pyright considers Adam public.

randolf-scholz commented 1 month ago

pylance also adds some visual highlighting clue in this case:

image

Notice how the other entries are greyed out as they are considered unused, but the redundantly renamed one not (despite being unused otherwise), as this marks it as a re-export.

albanD commented 1 month ago

Thanks for the details!

Yes the any module that starts with torch. is to make the transition easier as there are many places where we do re-export in parent namespaces. Also this helps appease sphinx doc coverage (that we use) where it will consider an API to be public if its __module__ is the current module. If we do from .adam import Adam as Adam, does that mean that sphinx will see two public APIs (that must be documented): torch.optim.Adam and torch.optim.adam.Adam ?