pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License
16.27k stars 6.96k forks source link

Type annotations #2025

Closed pmeier closed 3 months ago

pmeier commented 4 years ago

🚀 Feature

Type annotations.

Motivation

Right now, if a project depends on torchvision and you want to static type check it, you have no choice but either ignore it or write your own stubs. torch already has partial support for type annotations.

Pitch

We could add type annotations for torchvision as well.

Additional context

If we want this, I would take that up.

fmassa commented 4 years ago

Yes, please! Python-3 type annotations to torchvision would be great!

pmeier commented 4 years ago

Two questions:

  1. Should I stick to stubs, i.e. create an .pyi file for every .py file, or do we want the annotations inline?
  2. Do you want me to do this in one gigantic PR or should I split it by module or package?
fmassa commented 4 years ago

@pmeier 1 - let's do the annotations inline. I think the reason PyTorch used the .pyi file for the annotations was due to Python2 compatibility (but I might be wrong). @t-vi do you have thoughts on this? 2 - let's split it in smaller chunks which are easier to review. It doesn't even need to be the full module at once, for example transforms is very big, can we can send a few functions at a time.

pmeier commented 4 years ago

Type annotations ToDo

pmeier commented 4 years ago

I don't know if there is already an internal discussion about this, but in the light of inline type annotations I think we really should talk about the code format. With type annotations the signatures get quite big and with in our current style this not only looks ugly but also hinders legibility:

def save_image(tensor: Union[torch.Tensor, Sequence[torch.Tensor]], fp: Union[str, io.FileIO, io.BytesIO],
               nrow: int = 8, padding: int = 2, normalize: bool = False, range: Optional[Tuple[int, int]] = None,
               scale_each: bool = False, pad_value: int = 0, format: Optional[str, io.FileIO] = None) -> None:

The same signature formatted with black looks like this

def save_image(
    tensor: Union[torch.Tensor, Sequence[torch.Tensor]],
    fp: Union[str, io.FileIO, io.BytesIO],
    nrow: int = 8,
    padding: int = 2,
    normalize: bool = False,
    range: Optional[Tuple[int, int]] = None,
    scale_each: bool = False,
    pad_value: int = 0,
    format: Optional[str, io.FileIO] = None,
) -> None:

which (subjectively) looks better and is more legible.


Note that I'm not advocating specifically for black, but rather for a code formatter or more general a unified code format.

t-vi commented 4 years ago

I'm probably not the best person to ask because I never liked the pyi files for Python-implemented bits in the first place. That said, my understanding is that think it's fine to do them inline now.

Regarding the formatting: I do agree that cramming as much as possible in each line probably isn't the best way. You might see if PyTorch has a formatting preference in its formatting checks and what's the generally recommended Python way (I never know).

fmassa commented 4 years ago

Thanks for the input @t-vi!

@pmeier good point about the code getting messy with type annotations. I don't have a good answer now, although indeed the output of black does seem more readable.

Let me add @cpuhrsch @vincentqb and @zhangguanheng66 for discussion as well

vincentqb commented 4 years ago

I second the use of black-style inline type hint formatting.

zhangguanheng66 commented 4 years ago

Yes. black makes sense to me as well.

pmeier commented 4 years ago

This would also make reviewing easier. If you have a look at this review comment is unclear at first glance which parameter is meant since GitHub only allows to review a complete line.

fmassa commented 4 years ago

Although black is a nice formatter, I would be careful with re-formatting the whole codebase at once -- this effectively messes up with blame so that we don't know anymore what was added when and by who.

pmeier commented 4 years ago

The author talked very briefly about integrating black into an existing codebase. I've had no contact with blame so its hard for me to asses if hyper-blame could work for us. Could you have a look?

fmassa commented 4 years ago

The problem is that most of the users (including Github UI) most probably only uses git blame, so asking it to be changed to hyper-blame is a bit hard

pmeier commented 4 years ago

As we will touching every signature while adding annotations we could simply format them with black and leave everything else as is. This way git blame should still work, but we would enhance legibility. If we want to adopt black for the complete codebase we could do so by one commit at a time later.

fmassa commented 4 years ago

@pmeier I think we could adopt the function-signature style of black, but leave the rest of the code-base unchanged? If that's what you proposed, I'm ok with it!

frgfm commented 3 years ago

From what I gathered, the current status is:

oke-aditya commented 3 years ago

Bonjour @frgfm ! I will too try to take this up :smile:

NicolasHug commented 3 years ago

@frgfm @oke-aditya thanks a lot for your recent PRs

I just wanted to point out something: we should only add types to files that we can effectively check types for. Otherwise, we risk having having incorrect type annotation, which is much worse than not having annotations in the first place.

So, when submitting a PR, please make sure that this file is properly covered by mypy.ini (which you may need to edit) so that it can be checked by the CI

pmeier commented 3 years ago

@frgfm @oke-aditya

Thanks a lot for new interest in typing. Unfortunately, I'm out for the next two weeks. @fmassa agreed to chip in for me for the reviews.

I mentioned it on some PRs, but will do it again here for visibility: since torch.jit only supports a subset of types, there will be situations were you cannot make it and mypy happy. For example, currently Union and Any is not supported. In these situations we always side with torch.jit. That means, we need to put # type: ignore comments on the offending lines to silence mypy.

My usual approach is to add annotations until mypy is happy (keep https://github.com/pytorch/vision/issues/2025#issuecomment-889989524 in mind). Afterwards you can run the test suite to see if this introduced any torch.jit failures. If yes, change the offending annotations so that the test suite is happy and silence mypy accordingly.

NicolasHug commented 3 years ago

since torch.jit only supports a subset of types, there will be situations were you cannot make it and mypy happy

I'm sorry to be that guy again :/ ...

But isn't this a major problem? The goal of this issue is to make torchvision type-checkable, but as you noted there are a bunch of places where we have to give priority to torchscript, and keep mypy unhappy.

Typically, that means that if a function accepts Union[int, tuple[int]] we have to type it as just tuple[int]. But that means that when a user pass ints to this function (which is perfectly fine), they won't be able to type check their code because it's not mypy compliant. Or when they do type-check, they'll get an error that they will have to explicitly ignore.

In those cases, and they are fairly frequent in torchvision, the type checking will get in their way instead of helping them writing better code.

I totally get the necessity of type annotations for torchscrip support, but I am wondering whether typing the rest of torchvision is something that will actually be useful, considering these constraints. I'm happy to hear what others think though.

oke-aditya commented 3 years ago

Some words from fmassa https://github.com/pytorch/vision/pull/2854#issuecomment-714321576

Yes.

The situation with torchscript is delicate, and if we need to chose between mypy and torchscript, torchscript wins.

In general, all torchvision models already have type annotation in the forward because they all support torchscript.

The difference with mypy is that torchscript assumes that

if no input type is passed to an argument, it's assumed to be a tensor
return types are inferred

So if you want to play safe, I would say to only add type annotations to the constructor of the models, or the model builder functions.

This way it won't affect torchscript, but mypy won't probably be happy either.

Which means that it might be hard to enable full mypy testing to the models.

pmeier commented 3 years ago

@NicolasHug

I totally get the necessity of type annotations for torchscrip support, but I am wondering whether typing the rest of torchvision is something that will actually be useful, considering these constraints. I'm happy to hear what others think though.

That is a valid concern. I agree, type annotations are probably less helpful for other library developers if they are wrong or at least not complete.

Since the restrictions of torchscript are temporary (for example there is an approved PR for Union support pytorch/pytorch#53180) the problem will vanish in the future though. Thus, we might delay the addition of the py.typed file (and thus flagging torchvision as typed) until these issues are resolved, but we probably will add annotations anyway in the future.

NicolasHug commented 3 years ago

I agree that this might just be temporary. But until torchscript fully supports mypy-compatible types, I would suggest to put the current typing efforts on hold. The current torchscript constraint forces us to write incorrect annotations (which does more harm than good when it comes to type validation), and we currently have no automatic way to mark / detect those, so it would be best not to add more incorrect hints IMO.

datumbox commented 3 years ago

My 2 cents are the following:

NicolasHug commented 3 years ago

Thanks for your comment @datumbox , I will try to reply below. First, let me point out a few things, in order to explain where I'm coming from:

Regarding the improved readability of typed code: I will agree that there is a degree of subjectivity here, but IMHO, I don't find this to be the case. For readability, there's not much that an annotation will bring compared to a decent docstring. And those annotations are intended to us: torchvision developers, not users (for now). I completely agree with you that the signatures in https://github.com/pytorch/vision/pull/4229/files are unclear and need to be improved; but to me this is better solved by clarifying / adding docstrings than by adding annotations.

Taking the example of https://github.com/pytorch/vision/pull/4224 that you mentioned, to be fair, I don't see how this is a net positive for the library. Those annotations are either redundant with the docstrings, or really obvious [2]. In some cases mypy is being extra picky / difficult and requires a fair amount of change for very little added value (https://github.com/pytorch/vision/pull/4171#issuecomment-880672707 is an example). Sometimes, mypy is just plain wrong as well, and it significantly raises the barrier to entry when it comes to contributions [3].

I don't share the point of view that reviewing typing PRs is a low reviewing effort: like documentation PRs, typing PRs require extra vigilance and very thorough checking, as an incorrect annotation is much more harmful than a lack of annotation. On top of that, we'll get a lot of typing PRs, leading to a quite significant effort overall.

I also fully agree about keeping the community engaged. As you know, I've been pushing strongly towards this myself. Personally, I believe there's a lot of potential towards having better docstrings (or having docstrings at all in a lot of cases :p ).


[1] a simple example of this is this code, for which annotations are completely wrong, and yet mypy is perfectly happy:

def _g(a):
    return a + 0.5

def f(a : int) -> int:
    return _g(a)

[2] I want to emphasize that my comments here do not undermine the quality of the contributions from @frgfm or @oke-aditya and the rest of our contributors. I'm only criticizing the mypy / type annotations value, not the contributions or the contributors, which we are very grateful for.

[3] On a PR I submitted to pandas, I got into this:

def return_4() -> int:
    l = [2, 4, 6]
    assert 4 in l   # basically something that you *know* is True for at least one element in l

    def cond(x):
        return x == 4

    for x in l:
        if cond(x):
            return x

mypy complains with

lol.py:1: error: Missing return statement  [return]
    def return_4() -> int:
    ^
Found 1 error in 1 file (checked 1 source file)

Wanna make mypy happy? easy, just add

raise StopIteration("This thing never gets raised anyway")  # or whatever exception

at the very end of the code, without changing anything else. Ironically, we didn't change the return, but mypy is still happy. It's also a net negative as this exception will never be raised and we're decreasing test coverage.

As another workaround, we can write something like

def return_4() -> int:
    l = [2, 4, 6]
    assert 4 in l   # basically something that you *know* is True for at least one element in l

    def cond(x):
        return x == 4

    return next(x for x in l if cond(x))

Which is arguably more pythonic, but also less readable for a beginner. Also, this form doesn’t make it more obvious at all that the condition is always hit for a reader (if we ignore the obvious assert above). This is the kind of stuff that makes me believe more typing = much greater contribution barrier. This stuff really isn't obvious for a non-experienced dev IMHO.

datumbox commented 3 years ago

Thanks for the comments @NicolasHug.

The goal of this issue is for our users to be able to type-check their code. This won't happen until we have a proper py.typed file, and this won't be possible until all of torchvision is properly typed.

Adding types to our code-base will take time and will require significant effort which we can start early. I agree that JIT, ONNX and mypy can be give us hard time sometimes and certainly some places will require too many workaround with dubious benefits initially. My proposal is to merge those PRs which have a net positive on readability such as the ones mentioned earlier.

I will agree that there is a degree of subjectivity here, but IMHO, I don't find this to be the case. For readability, there's not much that an annotation will bring compared to a decent docstring.

I think using typing annotations is a much more natural way to encode this information and additionally it future proofs our code-base. Having said that, I too acknowledge that the notion of readability has some subjective parts. The same applies for what one considers obvious.

I don't share the point of view that reviewing typing PRs is a low reviewing effort: like documentation PRs, typing PRs require extra vigilance and very thorough checking, as an incorrect annotation is much more harmful than a lack of annotation.

I definitely agree on the fact that incorrect/misleading annotations (or docstrings) are harmful. Personally I don't mind reviewing PRs that introduce typing where it makes sense, provided they use good judgement. The maintainers listed here are experienced and given the above clarifications they can apply their judgement on where typing is beneficial and where it will cause more confusion than help.

NicolasHug commented 3 years ago

My proposal is to merge those PRs which have a net positive on readability such as the ones mentioned earlier.

Well, I'm trying to argue above that IMO, these aren't net positives.

I think using typing annotations is a much more natural way to encode this information

I quite firmly disagree here. First, annotations aren't part of the docstring, so in the html docs, they're not rendered next to the parameters description. Second, annotations are meant to be machine-readable, not human-readable. Let's take the colors parameter for example: https://github.com/pytorch/vision/blob/master/torchvision/utils.py#L144:L144

The annotation for it is Optional[Union[List[Union[str, Tuple[int, int, int]]], str, Tuple[int, int, int]]]. Its description is:

List containing the colors or a single color for all of the bounding boxes. The colors can be represented as str or Tuple[int, int, int].

I would find it hard to argue that the annotation is clearer than the human description :)

datumbox commented 3 years ago

@NicolasHug I understand that you prefer docstrings over typing annotations, that you really don't like mypy and that you prefer a less explicit approach for defining types. This is clearly documented here and on your previous comments on other PRs.

The reason why I share my opinion here is because I want to engage with all the contributors and solicit their feedback. The decision of stop adding typing annotations in favour of docstrings is quite major and can't be made without quorum. This is especially true when the matter boils down to code readability and aesthetics where there is a certain degree of subjectivity.

pmeier commented 3 years ago

@NicolasHug and I had a similar offline discussion a while back and I want to just add my two cents to the examples:

def _g(a):
    return a + 0.5

def f(a : int) -> int:
    return _g(a)

The problem stems from the fact that you are mixing typed and untyped definitions here. In these cases, mypy is very lenient by default. Multiple options out here:

  1. Run with --disallow-untyped-calls:

    lol.py:6: error: Call to untyped function "_g" in typed context
    Found 1 error in 1 file (checked 1 source file)
  2. Run with --warn-return-any

    lol.py:6: error: Returning Any from function declared to return "int"
    Found 1 error in 1 file (checked 1 source file)

So if you want this strictness, it is only one flag away.


def return_4() -> int:
    l = [2, 4, 6]
    assert 4 in l   # basically something that you *know* is True for at least one element in l

    def cond(x):
        return x == 4

    for x in l:
        if cond(x):
            return x

This example looks quite non-sensical to me. Could you link the original code so I can have a look at the "real" example?

In general, if you hit such an obscure case this either warrants a refactor of the current implementation or simply a # type: ignore[foo] comment.


Speaking of #type: ignore comments, I'm not sure why you basically insist on mypy being perfect. If the implementation is logically correct and you are happy with it but mypy is not, just silence it. I mean for other static checkers such as flake8 we also flat-out ignore some error codes and this doesn't seem to bother you.

https://github.com/pytorch/vision/blob/8e2bd0e070dbac1011134b0a71e88313a406c2b6/setup.cfg#L12

Plus git grep '# noqa' | wc -l reveals 14 instances where we silence flake8 locally.

NicolasHug commented 3 years ago

The reason why I share my opinion here is because I want to engage with all the contributors and solicit their feedback. The decision of stop adding typing annotations in favour of docstrings is quite major and can't be made without quorum. This is especially true when the matter boils down to code readability and aesthetics where there is a certain degree of subjectivity.

And I appreciate you sharing your thoughts, as I'm the one that started this discussion and invited people to comment. It's not just about readability and aesthetics though: I hope I properly described above how the ongoing work is a) not as useful as we originally thought it would be, b) hard work, c) will raise the barrier to entry for the less experienced contributors. If you disagree with these points I'm happy to discuss further, but at the beginning of your comment in https://github.com/pytorch/vision/issues/2025#issuecomment-899618956 it looked to me like you were simply re-stating what you had said earlier, without taking my replies into account. Apologies if I'm mistaken here.

So if you want this strictness, it is only one flag away.

Well, no. We can't use the flags you mentioned until everything is typed, which brings us back to my original point: we have no ETA w.r.t. torchscript / mypy compatibility, so we don't know when we'll be able to type everything properly.

This example looks quite non-sensical to me. Could you link the original code so I can have a look at the "real" example?

I agree the example may look strange, but it happens in real code: https://github.com/pandas-dev/pandas/pull/29944#discussion_r647372296

Speaking of #type: ignore comments, I'm not sure why you basically insist on mypy being perfect. If the implementation is logically correct and you are happy with it but mypy is not, just silence it

I'm sorry, it's not as simple. It sometimes take quite a while to decipher mypy's error message, and to realize that mypy is actually wrong. And for inexperienced contributors, it can take time to properly silence mypy as well. A recent example is @vmoens who struggled to find a solution for https://github.com/pytorch/vision/pull/4256 - look at what mypy is complaning about, and the amount of "amend" commit. The mypy error message makes little sense. We finally sorted it out after a bit of back and forth on the PR and on a private chat. Again: higher barrier to contribute.

pmeier commented 3 years ago

Well, no. We can't use the flags you mentioned until everything is typed, which brings us back to my original point: we have no ETA w.r.t. torchscript / mypy compatibility, so we don't know when we'll be able to type everything properly.

Not true. You can set these flags on a module level in the configuration file like we are currently doing with the ignores:

https://github.com/pytorch/vision/blob/8e2bd0e070dbac1011134b0a71e88313a406c2b6/mypy.ini#L7-L9

So if we go module by module in adding annotations that should not be an issue.

I agree the example may look strange, but it happens in real code: pandas-dev/pandas#29944 (comment)

From reading just the snippet, I would say it is non-obvious that self.subplots always contains col_idx. If you are sure that is the case at runtime, just ignore the error. This brings us to the last point:

t sometimes take quite a while to decipher mypy's error message, and to realize that mypy is actually wrong.

In the PR you linked, mypy is not wrong, we are using it wrong. It is a static type checker. So if you do shenanigans with dynamic attributes, I wouldn't expect it to be working. Neither nn.Conv2d nor nn.Linear have an stddev attribute and this is where the error stems from. The confusion comes from the fact that an nn.Module is setup to return Union[Tensor, nn.Module] as default type for unknown attributes:

from torch import nn

m = nn.Module()
reveal_type(m.foo)
main.py:4: note: Revealed type is 'Union[torch._tensor.Tensor, torch.nn.modules.module.Module]

A remedy for this, is to simply tell mypy "this dynamic attribute that I'm accessing is actually a float":

diff --git a/torchvision/models/inception.py b/torchvision/models/inception.py
index b5cefefa7..cb67aea08 100644
--- a/torchvision/models/inception.py
+++ b/torchvision/models/inception.py
@@ -4,7 +4,7 @@ import torch
 from torch import nn, Tensor
 import torch.nn.functional as F
 from .._internally_replaced_utils import load_state_dict_from_url
-from typing import Callable, Any, Optional, Tuple, List
+from typing import Callable, Any, Optional, Tuple, List, cast

 __all__ = ['Inception3', 'inception_v3', 'InceptionOutputs', '_InceptionOutputs']
@@ -120,7 +120,7 @@ class Inception3(nn.Module):
         if init_weights:
             for m in self.modules():
                 if isinstance(m, nn.Conv2d) or isinstance(m, nn.Linear):
-                    stddev = float(m.stddev) if hasattr(m, 'stddev') else 0.1  # type: ignore
+                    stddev = cast(float, m.stddev) if hasattr(m, 'stddev') else 0.1
                     torch.nn.init.trunc_normal_(m.weight, mean=0.0, std=stddev, a=-2, b=2)
                 elif isinstance(m, nn.BatchNorm2d):
                     nn.init.constant_(m.weight, 1)

So yes, mypy increases the burden as an inexperienced contributor. But then, given its complexity, PyTorch is not really a beginner project. I'm not saying this discourage new contributors, but I think it is fair in such a case to say "This seems right to me, but I don't know how to make mypy happy. Can someone with more experience please tell me what is going on?".

datumbox commented 3 years ago

If you disagree with these points I'm happy to discuss further, but at the beginning of your comment in #2025 (comment) it looked to me like you were simply re-stating what you had said earlier, without taking my replies into account.

I don't think this remark is fair. Please when it doubt, assume positive intent and seek clarifications.

Coming to the discussion, my original position was that mypy is the defacto static analysis tool used for validating typing and that we should use it as much as possible (except in cases where it makes mistakes where we turn it off). This is pretty much the way we use it right now in TorchVision. By taking into account your remarks along with those of @pmeier and @oke-aditya, it became clear that sometimes it's very hard to please it and this could lead to convoluted workarounds (see #4237). Given that perhaps a more reasonable approach is to try to adjust, configure and use mypy with caution, as @pmeier advocates.

NicolasHug commented 3 years ago

Philip,

That we can set flags on a per-file basis does not change the fact that these annotations won't help users until mypy doesn't conflict with torchscript, thus (IMO) drastically reducing the benefits of these annotations.

Also, yes, there are technical explanations and technical solutions to all of our problems here. Thank you for taking the time and effort to clarify these. But just because we can logically explain something doesn't mean that this thing is easy, or natural, or expected for most contributors. The barrier to contribute is something that I care deeply about, and it's also something that our upper management cares about (it came up various times in internal meetings). We're all trying to foster external contributions in this very issue, so surely, we all care about it here. And indeed, contributors can ask for help, but from my own experience these mypy-related issues popup so frequently that this isn't just something we can ignore, or disregard, or just waive off as a minor hindrance.

Vasilis,

my apologies if my comment came out as unfair. Please rest assured that I am assuming positive intent and seeking clarifications. On my end, I believe I missed your position on a few points that I tried to make, which left me with the impression that my comments had been ignored. With that regard, in order for me to better understand where we all stand, would you mind clarifying your position on the following items:

Do you disagree with my POV that type annotations won't help users much until mypy doesn't conflict with torchscript?

From my understanding, you believe that annotations are still relevant info for us (the developers), and you prefer annotations to docstrings. Which leads me to these next questions: Do you you always find annotations to be better than docstrings, even though docstrings contain more info / descriptions about the parameters? Would you mind providing an example where a docstring is not as clear or not as convenient as a type annotation? Also, do you find the annotation of the colors parameter clearer than its English description? https://github.com/pytorch/vision/issues/2025#issuecomment-899632993

What is your position regarding mypy raising the barrier for contributors?

Thanks a lot for you patience and feedback.

datumbox commented 3 years ago

Do you disagree with my POV that type annotations won't help users much until mypy doesn't conflict with torchscript?

I believe that anyone who reads the code, either they are a user or a developer of the library, can benefit from seeing good typing information. Provided that the info is not wrong or misleading, they improve code readability and make the code more explicit. You did convince me though that in cases where mypy requires lots of workarounds, we should omit them.

you prefer annotations to docstrings

I'm not sure why docstrings are even considered an option as they are meant for documentation not for defining types (perhaps when Python did not support typing it was a workaround but not anymore). Python's effort of introducing typing is still in its infancy and the static analysis tools like mypy are not mature. Still I believe it's a step towards the right direction to make the language more precise and potentially faster. For me the typing annotations is the right tool to encode typing and using them will future-proof our code (covering the code-base will take time and it's not an all-or-nothing thing, will unlock potential speed benefits as the language evolves, will improve readability etc).

Also, do you find the annotation of the colors parameter clearer than its English description?

We can use type aliases if we want to reduce complexity. But I think this example hints that we got a very overloaded API and an implementation that does a lot of things.

What is your position regarding mypy raising the barrier for contributors?

I don't think it's going to be a problem. We are a friendly bunch and we can help getting this fixed during the code reviews. Moreover contributors can still submit code without annotations and we can take care of that either on a follow up (good trade-off to reduce the back-and-forth and frustration on new contributors). I would even say that adding typing info is a good "bootcamp" task for someone who wants to learn the code-base and get involved.

pmeier commented 3 years ago

@NicolasHug

That we can set flags on a per-file basis does not change the fact that these annotations won't help users until mypy doesn't conflict with torchscript, thus (IMO) drastically reducing the benefits of these annotations.

True, not debating that. Same for the legibility debate in the docstrings. There are definitively cases where the annotation is less legible than a short description. Unfortunately, this will usually be the case for heavily overloaded keyword arguments in convenience functions. These functions are probably used by a lot of users so good documentation is key here. We need to find a solution for that. An easy solution would be to let the annotations and the docstring diverge. We are not enforcing equality here.

My point is that in the examples you have shown the nuisance stems from the fact that either mypy is misconfigured or used in a dynamic context which it is not built for. In these cases it is up to us to configure it properly, refactor the code to remove the problematic parts, or simply silence it. mypy is usually (yes, sometimes it is actually a mypy issue) not to blame. In doing so, we are missing the point.

But just because we can logically explain something doesn't mean that this thing is easy, or natural, or expected for most contributors. The barrier to contribute is something that I care deeply about, and it's also something that our upper management cares about (it came up various times in internal meetings).

Note that you are switching your argument here from "we should hold adding type annotations until mypy and torchscript have converged" to "we should not add type annotations". Barrier of entry will only slightly change in the future and all the "weirdness" you gave examples for above will still be there.

And indeed, contributors can ask for help, but from my own experience these mypy-related issues popup so frequently that this isn't just something we can ignore, or disregard, or just waive off as a minor hindrance.

I don't think asking for help is an issue. This is true not only for the contributors, but also for us maintainers. If you see some "weird" mypy behavior feel free to reach out to me. There is no need for you or anyone else to battle this if they don't want to get into the guts of it.

NicolasHug commented 3 years ago

Thank you both for taking the time to reply. I will address a few last points below that I believe are important to clarify, but I think it's probably best to move on. I've made my case and I understand that I didn't convince you yet, so I probably won't be able to convince you with more discussion. On my side, I still fail to foresee the benefits of this current typing effort, but I'll be patient and hopefully the benefits will be appear clearer in the future, should you decide to move forward with it.

Thanks again for your input :)


I'm not sure why docstrings are even considered an option as they are meant for documentation not for defining types (perhaps when Python did not support typing it was a workaround but not anymore)

It seems that we're referring to different things here. I will definitely agree that docstrings aren't a substitute for annotations, and annotations aren't a substitute for docstrings either. They have different purposes (documentation vs type-checking) and a different audience (humans vs computers). My original point here, which was a reply to an argument of yours https://github.com/pytorch/vision/issues/2025#issuecomment-899618956, is that when it comes to code readability, annotations add no value over a docstring. Again: the point of an annotation is not to specify types for human readers, they specify types for the type-checker; types should be documented for humans as well, but this is the docstrings' job.

My point is that in the examples you have shown the nuisance stems from the fact that either mypy is misconfigured or used in a dynamic context which it is not built for. In these cases it is up to us to configure it properly, refactor the code to remove the problematic parts, or simply silence it. mypy is usually (yes, sometimes it is actually a mypy issue) not to blame. In doing so, we are missing the point.

I generally agree with your opinion, but I would like to add a bit of nuance. The first one is that the word "simply" in "simply silence it" looks out of place for me :). Getting to understand and silence mypy was never simple or obvious to me. The second is that, while I agree that we should ideally take the time to properly refactor the code when mypy flags some code smell, the reality is that this is usually not the mode that we operate in. For better or worse, we tend to get sh!t done and silence / merge fast, rather than fully address the underlying issues.

Note that you are switching your argument here from "we should hold adding type annotations until mypy and torchscript have converged" to "we should not add type annotations". Barrier of entry will only slightly change in the future and all the "weirdness" you gave examples for above will still be there.

To clarify: you're right that the barrier of entry will hardly change in the future. However, once mypy and torchscript are compatible, we'll be able to fully type-check torchvision. I can understand that fully type-checking torchvision is a strong enough reason to eventually raise the contribution bar. But until we can properly type check, I believe that annotations have a very limited value, and so I believe that it's not enough of a reason at the moment for raising the bar.

pmeier commented 3 years ago

The first one is that the word "simply" in "simply silence it" looks out of place for me :). Getting to understand and silence mypy was never simple or obvious to me.

Maybe we are finally getting to the source of your aversion. Let's take your example from above:

def return_4() -> int:
    l = [2, 4, 6]
    assert 4 in l   # basically something that you *know* is True for at least one element in l

    def cond(x):
        return x == 4

    for x in l:
        if cond(x):
            return x

Running mypy on this gives you:

main.py:1: error: Missing return statement  [return]
    def return_4() -> int:
    ^
Found 1 error in 1 file (checked 1 source file)

If you just want to silence mypy here, you only need two pieces of information from this:

  1. Where did this happen? main.py:1
  2. What is the error code? [return]

If you now place a # type: ignore[return] on line 1 in the file main.py, mypy is silenced.

def return_4() -> int:  # type: ignore[return]
    ...
Success: no issues found in 1 source file

IMHO, this is as simple as locally silencing other linters such as flake8.

The second is that, while I agree that we should ideally take the time to properly refactor the code when mypy flags some code smell, the reality is that this is usually not the mode that we operate in. For better or worse, we tend to get sh!t done and silence / merge fast, rather than fully address the underlying issues.

I can fully get behind that. On anything that is not user facing, we can just slap a # type: ignore on it and fix it later.

But until we can properly type check, I believe that annotations have a very limited value, and so I believe that it's not enough of a reason at the moment for raising the bar.

Again, I can get behind that. @oke-aditya and @frgfm have put in quite some effort into the PRs so I think it is fair to properly review them. After that I wouldn't push further until torchscript finally gets support for fundamental stuff. Of course, if the development of torchscript is halted we need to revisit this and see if it is worth to still add the annotations.

frgfm commented 3 years ago

Hey everyone :wave:

Just making sure you guys don't trouble yourself too much for @oke-aditya & me:

Again, you all have brought us an extremely useful framework and related resources, I'm far from being the only one willing to help :ok_hand: Happy to discuss this further, we all just want to continue making the PyTorch ecosystem as user/developer-friendly and useful as possible :grinning:

mauvilsa commented 2 years ago

Hello. I would like to add type annotations to file torchvision/transforms/transforms.py. The reason is not just for static type checking. The goal is to make torchvision.transforms.Compose work with pytorch-lightning's LightningCLI which makes classes automatically configurable by inspecting signatures at run time.

I am asking here because I saw the torch.jit types limitation, https://github.com/pytorch/vision/issues/2025#issuecomment-890510609. From a fast look at the code, the type for the transforms parameter of the Compose class seems to be List[Union[torch.nn.Module, Callable]]. Maybe I don't have the correct type yet, but not important. The actual question is, would List and Callable be a problem for torch.jit?

pmeier commented 2 years ago

The actual question is, would List and Callable be a problem for torch.jit?

List no, Callable yes. Unfortunately, torchscript only supports a very limited subset of types.

We are currently in the process of revamping the transforms. With this, we are likely to drop JIT scriptability for the transform classes (although there are options to retain it #6711). Thus, there is no reason to keep wrong or unnecessary strict annotations on these classes. See #5626 for a discussion.

frgfm commented 2 years ago

On this topic, should we add / update the TODO list in the issue description? Or open another perhaps? I can easily this GH issue sticking around forever otherwise :)

pmeier commented 2 years ago

@frgfm Yes, it would be a good idea to get a better summary of what needs to be done and what is already finished. Core team is swamped at the moment so we won't be making progress on that soon. If you are willing, could you post a comment like I did in https://github.com/pytorch/vision/issues/2025#issuecomment-606421677 (probably finer grained) and link open PRs. I freely admit that I have no idea what is still open and blocked by something (I vaguely remember there was a PR from you where the JIT simply didn't respond on core).

After that we can make a decision how to move forward. For datasets and transforms it is probably harder since they are still in prototype mode. We already have #5626 and #6668 that deal with annotations there so you might have a look at them first.

frgfm commented 2 years ago

Sure! Here is where we are at:

I'll update this message with future evolutions

oke-aditya commented 2 years ago

A bunch of PRs actually that kind of be useful https://github.com/pytorch/vision/pull/4630 https://github.com/pytorch/vision/pull/4612 https://github.com/pytorch/vision/pull/4599 https://github.com/pytorch/vision/pull/4323

scm-aiml commented 3 months ago

Hello all, I recently stumbled across this thread as I have been spending the past few days developing stubs for torchvision as PyTorch already had py.typed incorporated to support the code base. I spent a good deal of time reviewing this thread, the CONTRIB readme, and it appears that there is still value in moving forward with contributing to the type hinting while keeping in mind a few key points:

  1. Keep them smaller PR.s (per file ~)
  2. there is still compatibility issues with torchscript

I was going through the dev setup for Mac, but given some recent issues in #8421, I have a dev container for linux up and running.

One small question though: I notice quite a few times mypy throwing errors over import type of Any (which can be suppressed) but seems to be an issue with not have a proper annotation of -> None: on __init__ methods of class definitions. I have seen some examples of code here that do not have that, but many of the updates from @frgfm use the correct annotation as pointed out in pep-484. It's little but I just wanted to clarify before moving forward.

NicolasHug commented 3 months ago

Hi @scm-aiml ,

Thanks for working on this. We're not planning on adding any new type annotation at this point. Those that currently exist in torchvision are often wrong because they are incompatible with torchscript, so type-checking torchvision code has little value: it'll error for irrelevant reasons.

Since here's no plan on adding more annotations, I'll close this issue to avoid misleading contributors. Sorry about that @scm-aiml

NicolasHug commented 3 months ago

one last thing: if there's a community-supported stub package for torchvision that gets regular updates, I'd be more than happy to point users to it (without BC guarantees from torchvision's part though)

scm-aiml commented 3 months ago

Hi @NicolasHug,

I didn't intend to disregard your comments in the thread. My confusion just comes from seeing your comment in August about 3 years ago and having strong feelings opposing typing torchvision, but then there continued to be a large series of contribution(s) that were approved and brought in from PRs.

I guess my primary question would be are PRs not supported/received which make these contributions or are they just not going to be labeled for any priority as a "feature request"?

NicolasHug commented 3 months ago

No worries @scm-aiml

A lot has passed since the old discussion was started. I was never convinced that type annotations made much sense for torchvision, and I haven't changed my mind on that. The discussion above was never truly resolved in the sense that active maintainers at the time just... disagreed to disagree. So, with inertia, the status quo continued: PRs were submitted and those who were in favor of annotating torchvision reviewed and merged this PR. These days, I'm the only maintainer left in charge of torchvision. I have much less bandwidth to review stuff than when there were more of us,, and as a result I can only allocate time to high-priority issues.

Spritan commented 4 days ago

@scm-aiml are you interested in making your stubs for torchvision public as a community project?