python / cpython

The Python programming language
https://www.python.org
Other
63.49k stars 30.4k forks source link

`NamedTuple` can't inherit from another class #116241

Open Conchylicultor opened 8 months ago

Conchylicultor commented 8 months ago

Feature or enhancement

Proposal:

Currently this code is failing:

class A:
  pass

class B(A, typing.NamedTuple):
  x: int

Raising:

TypeError: can only inherit from a NamedTuple type and Generic

But this code is not:

class A:
  pass

class _B(typing.NamedTuple):
  x: int

class B(A, _B):
  pass

B(x=1)

I believe those 2 snippets are mostly equivalent (the final B class behave the same), so it's confusing and force boilerplate that class B(A, typing.NamedTuple): fail.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Linked PRs

AlexWaygood commented 8 months ago

We previously discussed this as something some users might want in https://github.com/python/cpython/issues/88089#issuecomment-1093912596. At that point in time, we decided not to allow multiple inheritance with arbitrary classes, just with typing.Generic, because:

If there's a need for using mixin classes with NamedTuple, I'm open to revisiting this restriction.

AlexWaygood commented 8 months ago

One immediate issue I see is what to do about __slots__. Currently all namedtuple classes have __slots__ set to the empty tuple, which means that instances of namedtuple classes use less memory than instances of classes that don't use __slots__. But setting __slots__ to the empty tuple is useless unless all base classes also have __slots__ set to the empty tuple. What if somebody tries to use a mixin class like A in @Conchylicultor's example, that doesn't set __slots__? Should we raise an exception, on the grounds that this namedtuple class won't come with the memory savings you might expect from using a namedtuple? Or should we assume the user knows what they're doing here, and let it pass?

serhiy-storchaka commented 8 months ago

31781 was proposed with implementation of this feature, but it did not receive good support and more limited version was accepted instead. See also a discussion in #88089.

Conchylicultor commented 8 months ago

Thanks for the answer, I understand.

I don't really have answer on the various points you raised. But I can provide more context on my use-case.

In our codebase, we're defining some protocol, to handle saving various object types (Dataset, Model,...):

class Checkpointable(abc.ABC):

  @abc.abstractmethod
  def __kd_restore__(self, x):
    ...

class Dataset(Checkpointable):
  ...

class State(Checkpointable):
  ...

def restore[T: Checkpointable](path: PathLike, obj: T) -> T:
  ...

To save multiple object at once, I was trying to add some wrapper:

class TopLevel(typing.NamedTuple, Checkpointable):
  state: State
  ds: Dataset

The reason I was trying to use named tuple rather than dataclasses is because it allow a much more compact syntax in this specific case:

state, ds = restore('/tmp/', TopLevel(state, ds))

vs

out = restore('/tmp/', TopLevel(state, ds))
state = out.state
ds = out.ds

My actual use-case is more complicated with more indirections, variables. But that's the main idea

serhiy-storchaka commented 8 months ago

I updated #31781. It is just removing 4 lines of code which forbid multiple inheritance.

serhiy-storchaka commented 6 months ago

The currently working workaround is to use an immediate class:

class TopLevel(collections.namedtuple('TopLevel', ('state', 'ds')), Checkpointable):
  pass

or

class TopLevel(typing.NamedTuple):
  state: State
  ds: Dataset
class TopLevel(TopLevel, Checkpointable):
  pass

Is such complication necessary or we can omit an immediate step?

The issue with __slots__ is solved in the same way as in other cases of using __slots__ or named tuples: you should add __slots__ = () in all base classes if you want to get a benefit from __slots__ in child classes.

serhiy-storchaka commented 1 month ago

So, what should we do with this? There were at least two requests for this feature.

@gvanrossum, @JelleZijlstra, @AlexWaygood?

gvanrossum commented 1 month ago

If this works, why not? The implementation seems simple enough. :-) Could have some more tests for normal cases and other edge cases.

But I'd like the typing council to consider this, since all the type checkers also have to support this, and given (IIRC) the special-casing for NamedTuple, that may be more complicated. @erictraut @JukkaL

AlexWaygood commented 1 month ago

since all the type checkers also have to support this

Do they? Just because you can do something at runtime doesn't mean that type checkers must support it. There are many things that are allowed at runtime (including when it comes to typing-specific features) that type checkers disallow. To some extent, this is the point of a type checker.

I agree that I'd like the typing council to consider whether this is something type checkers should support. But I actually think whether or not static type checkers should support it is a separate question to whether we should allow it at runtime or not. Type checkers are not and should not feel obligated to support every behaviour of the typing module that works without an exception being raised.

gvanrossum commented 1 month ago

I feel that NamedTuple is special to type checkers -- and in fact it was originally introduced as something you could put in a .pyi file to create a properly typed collections.namedtuple. So I think it would be unfortunate if the same type suddenly acquired a feature that type checkers don't know about.

serhiy-storchaka commented 1 month ago

If they support indirect multiple inheritance with subclasses of NamedTuple (see https://github.com/python/cpython/issues/116241#issuecomment-2098210154), they should not have problems with supporting more direct multiple inheritance. Do they?

AlexWaygood commented 1 month ago

I feel that NamedTuple is special to type checkers -- and in fact it was originally introduced as something you could put in a .pyi file to create a properly typed collections.namedtuple.

I could make the same argument about PEP-585 subscriptions, which were a typing-specific feature introduced specifically for typing purposes, but allow many things at runtime that type checkers will complain about:

>>> dict[int, str, bool, bytes]
dict[int, str, bool, bytes]
>>> list[42]
list[42]

If they support indirect multiple inheritance with subclasses of NamedTuple (see #116241 (comment)), they should not have problems with supporting more direct multiple inheritance. Do they?

Both mypy and pyright accept indirect multiple inheritance but reject direct multiple inheritance:

So yes, they would both require at least some modifications in order to support this. I don't know whether that's as simple as just removing a few lines of code that are there to make sure that they complain about the thing that fails at runtime, though, or whether it would require them to rewrite some fundamental assumptions about how NamedTuples work.

erictraut commented 1 month ago

In the case of pyright, this change would be easy to support. Just a few lines of code to make the current check conditional on the python version. I'm guessing this is true of other type checkers as well.

Edit: This would require a small change to the typing spec, which currently states:

NamedTuple does not support multiple inheritance. Type checkers should enforce this limitation.

gvanrossum commented 1 month ago

I could make the same argument about PEP-585 subscriptions, which were a typing-specific feature introduced specifically for typing purposes, but allow many things at runtime that type checkers will complain about: [...]

That seems to be a wrong analogy (maybe just a strawman?). What you want to do with multiple inheritance here is useful at runtime; the examples you quoted are useless, and we don't promise that those incorrect usages won't become errors at runtime too (they're the result of an implementation shortcut).

But given other feedback I think this is fine, and I will approve the PR next.

AlexWaygood commented 1 month ago

What you want to do with multiple inheritance here is useful at runtime; the examples you quoted are useless, and we don't promise that those incorrect usages won't become errors at runtime too (they're the result of an implementation shortcut)

That's a fair point. (Though I'm quite confident there are other examples I could come up with using the typing-module internals of "useful" things that are allowed at runtime but disallowed by the typing spec and type checkers :-)

gvanrossum commented 1 month ago

I cringe every time I see code that uses undocumented parts of typing.py — they are risking breakage but also constraining what we can do.

AlexWaygood commented 1 month ago

I cringe every time I see code that uses undocumented parts of typing.py — they are risking breakage but also constraining what we can do.

So do I; I fully agree. That wasn't what I meant (I should have been clearer; apologies). I was thinking more about the fact, for example, that ClassVars subscripted by type variables are allowed at runtime despite the spec explicitly prohbiting them; or the fact that Literal[int] has no meaning to type checkers currently (though it could plausibly be used to mean "exactly the int class, and not a subclass"), but is allowed. These and other things are things which are allowed at runtime, which could plausibly be useful, and which users might reasonably expect type checkers to support (but which they don't).

>>> from typing import *
>>> T = TypeVar("T")
>>> class Foo(Generic[T]):
...     x: ClassVar[T]
...     
>>> Foo.__annotations__
{'x': typing.ClassVar[~T]}
>>> Literal[str]
typing.Literal[str]
>>> Literal[int]
typing.Literal[int]

Anyway, this is a pretty minor point. I agree that none of these are an exact analogy to NamedTuple, where the runtime is already strict and we are considering loosening it.

JukkaL commented 1 month ago

I don't see this as something that is important to change, though the implementation in type checkers is probably not very hard. My reasoning is that it adds ambiguity about inherited attributes (would they be settable?) and I'm a little worried that there may be additional edge cases that we haven't thought of yet.

Basically I see this as only slightly useful (it hasn't been requested much even though NamedTuple was added years ago), but it makes things more complicated, with a small risk of some tricky or confusing edge cases. If there are some unclear edge cases, we'd probably need to document them as well, and write conformance tests, etc. Then every new user who learns about NamedTuple may have to deal with all these, at least when reading the documentation. I prefer to keep it simple unless there is a compelling reason not to.

CoolCat467 commented 1 month ago

I feel that this is important to change. There are only four lines of code stopping multiple base inheritance at runtime, and while I understand removing the current limitation could make type checkers need to do a bit more work, I think that's better than leading people like myself to having to make a copy implementation of NamedTupleMeta without those four lines or jump through other hoops to achieve the desired result. It encourages bad practices if someone is determined. I understand from reading here that the limitation was imposed at the time NamedTuple was introduced to adhere to exactly what the specs say, but that's just because no one thought of a reason to allow that at the time. Allowing multiple base classes means if you want some sort of shared functionality between multiple objects, that's now possible, just like other built-in python objects. It feels a bit arbitrary that NamedTuple doesn't share this feature.

JukkaL commented 1 month ago

Even if a typing-related change looks small, the effort quickly adds up. There are at least 6 type checkers which could want to implement support for it, since they probably try to follow standards. A small change might take 4 hours to implement (implementation, writing tests, documentation, code review) per type checker. So it could take 24 hours of effort overall, as a very rough approximation. If there are tricky edge cases, it may be much more that this.

gvanrossum commented 1 month ago

Couldn’t we try to phase out NamedTuple in favor of data classes with slots? I still think that NamedTuple should only exist as a quick typed version of collections.namedtuple, not as a feature in itself.