python / typing

Python static typing home. Hosts the documentation and a user help forum.
https://typing.readthedocs.io/
Other
1.57k stars 230 forks source link

Assuming `self` or `cls` parameter is positional-only #1355

Open erictraut opened 1 year ago

erictraut commented 1 year ago

I'm interested in guidance from the typing community relating to a question that came up in a recent pyright bug report.

The issue has to do with the self parameter in an instance method and the cls parameter in a class method — whether a type checker should assume that they are implicitly positional-only parameters. In particular, the discussion was about method declarations within a protocol class.

PEP 570 makes a reference to this topic in its introductory section:

One special case of this situation is the self parameter for class methods: it is undesirable that a caller can bind by keyword to the name self when calling the method from the class.

But PEP 570 doesn't provide any specific guidance about self or cls in its specification section. PEP 544 (which introduced protocols) is also silent on the topic.

Pyright currently assumes that self and cls are not positional-only parameters (unless explicitly followed by a / parameter marker or named with double underscores). This assumption affects protocol matching — and more generally, subtype compatibility evaluation for callables.

@AlexWaygood is proposing that the first parameter of instance and class methods should always be considered position-only parameters.

This raises some questions in my mind:

  1. Is this a safe assumption? Does anyone know of a situation where self or cls are used as keyword arguments?
  2. Does this apply only to protocol methods, or does it apply generally to methods in any class?
  3. Should it be considered a type checker error if someone attempts to use a keyword argument to target the first parameter of an instance or class method? It works at runtime.
    
    class Foo:
    def bar(self): pass

Foo.bar(self=Foo())


5. Does the `cls` exemption apply only to explicit class methods, or does it also apply to `__new__` and `__init_subclass__`?
AlexWaygood commented 1 year ago

Thanks @erictraut for considering this issue, and for opening this discussion. For the context of others reading this thread, this originally came up in relation to the following code snippet:

from typing import Iterator

class Meta(type):
    def __iter__(cls) -> Iterator[int]:
        yield from range(10)

class Foo(metaclass=Meta):
    pass

iter(Foo) # succeeds at runtime, but pyright emits an error here

Pyright emits an error on this snippet because it does not consider Foo to conform to the Iterable protocol. This is because the Iterable protocol has a def __iter__(self) -> Iterator: ... method, and the first parameter of Meta.__iter__ in this snippet is named cls, not self. The discrepancy in the parameter names means that pyright considers that Foo is not iterable; if the first parameter of Meta.__iter__ is renamed to self, pyright no longer emits the error.

To give my perspective on your first two questions:

  1. Is this a safe assumption [to consider the first parameter as being implicitly positional-only]? Does anyone know of a situation where self or cls are used as keyword arguments?

    I've never seen self or cls being used as keyword arguments, and I think it's extremely rare (if it happens at all) for it to be done. If I saw it in code review, I would probably flag it and ask for it to be changed.

  2. Does this apply only to protocol methods, or does it apply generally to methods in any class?

    I don't have a strong opinion on this question. It doesn't make any sense to me to take into account whether the names of the first parameter match when you're considering whether a class conforms to a protocol; in my opinion, the class in the snippet I pasted above should clearly be considered iterable by a type checker. I wouldn't mind if this logic were applied more broadly, as I think it would break very few use cases if a type checker considered the first parameter of a (non-static) method as being implicitly positional-only. Equally, however, I don't have a significant use case to put forward for applying the logic more broadly. As you point out in question (3), it would disallow some valid calls that work at runtime; but on the other hand, that seems like pretty unusual code.

  3. Should it be considered a type checker error if someone attempts to use a keyword argument to target the first parameter of an instance or class method? It works at runtime.

    Same answer as to question (2)

  4. Does the cls exemption apply only to explicit class methods, or does it also apply to new and __init_subclass__?

    No strong opinion, but I would lean towards treating __new__ and __init_subclass__ similar to explicit class methods.

hmc-cs-mdrissi commented 1 year ago

On subtype compatibility issue, tensorflow has a fair amount of methods that use decorator that does not preserve name of the first (self) argument. Applying subtype compatibility rule to self/cls would mean those are all violations but I've never seen that actually be an issue and it seems like undocumented internal detail that they happen to have different. Concrete,

class Foo:
  def build(self, shape):
    ...

def convert_shapes(f):
  def _build(instance, shape):
    ...

class Bar(Foo):
  @convert_shapes
  def build(self, shape):
    ...

If you enforce name requirement on self then Bar should have type error error as convert_shapes loses name and ends up as instance. In practice I hadn't noticed this until writing the stubs and stubtest alerting me to this deviation.

So small preference that subtype compatibility also treat self/cls as positional only.

NeilGirdhar commented 1 year ago

What's the benefit of treating the first parameter of a non-static method as implicitly positional-only? Is it just to save some work in updating the typeshed?

I feel like these kinds of special cases for convenience end up being problematic later on. For example, *args: X implicitly meaning *args: tuple[X, ...] was probably one such error made for convenience.

hmc-cs-mdrissi commented 1 year ago

I think main benefit is it's footgun for most stubs and most implementations. For typed codebases I've very rarely seen a codebase mark self/cls as positional only, but I don't think most libraries would accept a bug report of we renamed self parameter (probably indirectly through decorator) and that broke backwards compatibility.

I view type signature as part of documentation/contract. Would you expect calling Foo.build(self=obj) to be something a library would document/promise as part of public api? For same example I gave above, documentation generated for tensorflow api's documents self for several methods even though real name is something else as they don't handle that edge case in there documentation tooling. I suspect a fair number of these names have changed from one minor version to next with release notes never needing to mention it. Similar to how I prefer to avoid documenting private methods I view self/cls aspect to be internal implementation.

There is related question here of should stubtest be strict about these names? If they are positional only then exact name should be allowed to vary with preference for either documentation/runtime still.

edit: As for *args example, I would expect beginner in typing to know what a tuple is and it's easy to teach (footgun there is other way). Should a beginner in typing know about positional only syntax? Should it be recommended practice to type hint protocols/classes with positional only everywhere for self? It'd mainly feel like noise to write everywhere.

NeilGirdhar commented 1 year ago

I think main benefit is it's footgun for most stubs and most implementations.

Right, but it's a footgun that's caught by type checkers. That's exactly how this was discovered 😄

Would you expect calling Foo.build(self=obj) to be something a library would document/promise as part of public api?

They are promising it when they don't mark it as positional.

Similar to how I prefer to avoid documenting private methods I view self/cls aspect to be internal implementation.

I guess I don't agree with this. That parameter is part of the public interface.

edit: As for *args example, I would expect beginner in typing to know what a tuple is and it's easy to teach

Not sure if I should have been more clear, but the args example I was alluding to is how we annotate *args in functions. I think it's pretty common to see

def f(*args: X): ...

as a Python typing design error where it would ideally have been the more flexible:

def f(*kargs: tuple[X, ...]): ...

This seems similar in the sense that it's a convenient implicit notation.

Should it be recommended practice to type hint protocols/classes with positional only everywhere for self?

My feeling is that if you mean for it to be an internal name, then yes. But I don't think that it's so common for people to want to insist that it should be internal. Why not offer users the flexibility to pass whatever parameter they want as keywords unless you have a good reason to block it?

TeamSpen210 commented 1 year ago

In regards to 4, what CPython currently does for __init_subclass__ and __class_getitem__ is literally wrap them in classmethod if they're function objects, so treating them like that would be exactly consistent with runtime semantics. For __new__ it uses staticmethod() instead. The class is then explicitly passed as a positional argument.

AlexWaygood commented 1 year ago

Note that, in contrast to self for instance methods, it does appear to be genuinely impossible to pass cls as a keyword argument to class methods:

>>> class Foo:
...   @classmethod
...   def bar(cls): pass
...   
>>> Foo.bar(cls=Foo)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: bar() got multiple values for argument 'cls'
>>> Foo.__dict__['bar'](cls=Foo)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: 'classmethod' object is not callable
>>> Foo.__dict__['bar'].__get__(instance=Foo)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: wrapper __get__ doesn't take keyword arguments
>>> Foo.__dict__['bar'].__get__(Foo)(cls=Foo)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: bar() got multiple values for argument 'cls'
erictraut commented 1 year ago

Thanks for the input. I've decided to always treat self and cls (for instance and class methods, respective) as positional-only for purposes of protocol matching. I'm not going to enforce this more generally because it doesn't match the behavior of the runtime, at least in the case of instance methods.