Closed XuehaiPan closed 1 day ago
Is there any reason users would care about this? Is there any real world impact?
The current result seems reasonable to me. The `__new__` method was created virtually, so it is fair to report it as `'namedtuple_MyTuple'`. It has been this way for a very long time and has not been a source of problems.
> Is there any reason users would care about this? Is there any real world impact?
I'm trying to improve namedtuple support in PyTorch Dynamo, an ML compiler backend with a lightweight Python interpreter. Dynamo traces Python frames and generates performant inlined functions from the traced opcodes. The `__module__` attribute is used to reconstruct the Python function during the codegen phase.
The codegen tries to import the module named by `__name__` while creating the `__new__` function for a namedtuple:
    def get_globals_source_and_value(self, name):
        if "__name__" in self.f_globals:
            module_name = self.f_globals["__name__"]
            module_source = self.import_source(module_name)
            if "torch_package" in module_name:
                fglobals_value = torch.package.package_importer._package_imported_modules[module_name]  # type: ignore[assignment]
            else:
                fglobals_value = importlib.import_module(module_name)  # type: ignore[assignment]
            fglobals_vt = VariableTracker.build(self, fglobals_value, module_source)
            global_source = AttrSource(module_source, name)
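To make the failure mode concrete, here is a minimal sketch in plain Python (not Dynamo code) of what that lookup amounts to for a namedtuple's `__new__`. It assumes CPython's current behavior of building `__new__` in a synthetic namespace whose `__name__` is `namedtuple_{typename}`; `MyTuple` and its field names are illustrative only.

```python
import importlib
from collections import namedtuple

MyTuple = namedtuple("MyTuple", ["x", "y"])

# The globals of the generated __new__ are a synthetic namespace,
# not the globals of a real module.
module_name = MyTuple.__new__.__globals__["__name__"]
print(module_name)

# Mirrors the importlib.import_module(module_name) step in the excerpt above.
try:
    importlib.import_module(module_name)
    import_failed = False
except ModuleNotFoundError:
    import_failed = True
print(import_failed)
```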
For PyTorch is this a show stopper or is a workaround available?
How would `MyTuple.__new__.__module__` returning `__main__` actually be useful for reconstructing a `__new__` method? That isn't clear to me from your post. That attribute doesn't affect functionality. Technically, a namedtuple doesn't have to live in a module at all. It can be exec'ed dynamically, created and used within a function's locals, stored in a class definition, or stored in a data structure rather than a module's globals. In these circumstances, returning `__main__` isn't really correct.
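A small sketch of the point about namedtuples that never become module globals. The helper `make_local_type` and the type name `LocalPoint` are made up for illustration.

```python
import sys
from collections import namedtuple


def make_local_type():
    # The class only ever exists as a local / return value here.
    return namedtuple("LocalPoint", ["x", "y"])


cls = make_local_type()

# __module__ is inferred from the calling frame, so it names the defining module...
owner = sys.modules.get(cls.__module__)

# ...but that module has no global called "LocalPoint" to look the class up by.
reachable = getattr(owner, "LocalPoint", None) is cls
print(cls.__module__, reachable)
```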
> For PyTorch is this a show stopper or is a workaround available?
PyTorch Dynamo handles namedtuples via explicit special handling rather than as a normal user-defined class, so this is not a show stopper for namedtuple types for now. Also, Dynamo is not a full-featured interpreter; we incrementally improve its coverage of the code used in the ML community. It is fine for it to raise `unimplemented` for unhandled cases.
> How would `MyTuple.__new__.__module__` returning `__main__` actually be useful for reconstructing a `__new__` method? That isn't clear to me from your post. That attribute doesn't affect functionality.
The problem only shows up when the user tries to subclass a namedtuple type. The subclass needs the `__module__` attribute to reconstruct its `MyTupleSubType.__new__`, which is `MyTuple.__new__`. See the example at the end of this post: the `MyTupleSubType.__new__` method is used to create the instance of `MyTupleSubType` in the polyfill `instantiate_user_defined_class_object`.
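A minimal sketch of that situation in plain Python, outside Dynamo. `MyTuple` and `MyTupleSubType` follow the names used in this thread; the field names are placeholders.

```python
from collections import namedtuple

MyTuple = namedtuple("MyTuple", ["x", "y"])


class MyTupleSubType(MyTuple):
    pass


# The subclass does not define __new__, so it inherits the generated one.
same_new = MyTupleSubType.__new__ is MyTuple.__new__
print(same_new)

# The class and its inherited __new__ report different modules.
print(MyTupleSubType.__module__)
print(MyTupleSubType.__new__.__module__)
```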
> Technically, a namedtuple doesn't have to live in a module at all. It can be exec'ed dynamically, created and used within a function's locals, stored in a class definition, or stored in a data structure rather than a module's globals. In these circumstances, returning `__main__` isn't really correct.
The problem only shows up when the user tries to subclass a namedtuple type. Dynamo only supports classes defined in the global scope for now. Dynamo works fine with namedtuple types (defined locally or globally) that are directly created by `collections.namedtuple` or `typing.NamedTuple`.
Let me give more context. cc @jansel, correct me if I am wrong.
In Dynamo, each specialized type has its own variable tracker type.
list object -> ListVariable
tuple object -> TupleVariable
namedtuple object -> NamedTupleVariable
...
generic types -> UserDefinedClassVariable
instance of generic types -> UserDefinedVariable
While creating an instance of a given type, there is a handcrafted dispatch path for namedtuple type here: https://github.com/pytorch/pytorch/blob/51b6126f5432b4cc446a0bb3882aa9d98124caa8/torch/_dynamo/variables/user_defined.py#L454-L485
    # Python code: MyTuple(*values)
    # Dynamo call stack:
    UserDefinedClassVariable(MyTuple).call_function(tx, values, {})
      -> NamedTupleVariable(MyTuple, values)
This handcrafted path bypasses the call to `MyTuple.__new__`, so no errors have shown up in the past.
For other types that do not have specialization, the route goes here: https://github.com/pytorch/pytorch/blob/51b6126f5432b4cc446a0bb3882aa9d98124caa8/torch/_dynamo/variables/user_defined.py#L587-L600
    # Python code: SomeType(*args, **kwargs)
    # Dynamo call stack:
    UserDefinedClassVariable(SomeType).call_function(tx, args, kwargs)
      -> UserFunctionVariable(instantiate_user_defined_class_object).call_function(tx, [SomeType, *args], **kwargs)
      -> UserDefinedVariable(obj)
    # where:
    def instantiate_user_defined_class_object(cls, /, *args, **kwargs):
        obj = cls.__new__(cls, *args, **kwargs)
        # Only call __init__ if the object is an instance of the class
        # Reference: https://github.com/python/cpython/blob/3.12/Objects/typeobject.c#L1670-L1673
        if isinstance(obj, cls):
            obj.__init__(*args, **kwargs)
        return obj
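As a sanity check, the polyfill above can be run directly under the regular CPython interpreter against a namedtuple subclass, where it works without trouble. In this sketch the polyfill body is copied from the snippet above; the type and field names are placeholders.

```python
from collections import namedtuple

MyTuple = namedtuple("MyTuple", ["x", "y"])


class MyTupleSubType(MyTuple):
    pass


def instantiate_user_defined_class_object(cls, /, *args, **kwargs):
    # Resolves to MyTuple.__new__ for MyTupleSubType, since it is inherited.
    obj = cls.__new__(cls, *args, **kwargs)
    # Only call __init__ if the object is an instance of the class.
    if isinstance(obj, cls):
        obj.__init__(*args, **kwargs)
    return obj


obj = instantiate_user_defined_class_object(MyTupleSubType, 1, y=2)
print(type(obj).__name__, obj)
```

So the call itself is valid Python; the difficulty discussed in this thread is on the Dynamo side, where the inherited `__new__` has to be reconstructed from its reported module.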
Subclasses of a namedtuple type cannot go through the `NamedTupleVariable` path. That results in a Dynamo internal error in `instantiate_user_defined_class_object`: "can not reconstruct `MyTupleSubType.__new__`".
The CPython behavior here looks reasonable to me. I think we should find a workaround in PyTorch.
> The CPython behavior here looks reasonable to me. I think we should find a workaround in PyTorch.
Thanks for the insight. I'll mark this as closed.
Bug report
Bug description:
The `__module__` attribute for the `NamedTupleType.__new__` method is `namedtuple_{typename}`. That module does not exist. The `__module__` attribute for methods should respect the `module` argument passed to `collections.namedtuple`.
CPython versions tested on:
3.12
Operating systems tested on:
Linux
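A short reproduction sketch of the report. The type name `MyTuple`, the field names, and the `module` value `my_pkg.types` are illustrative placeholders, not taken from the original report.

```python
from collections import namedtuple

MyTuple = namedtuple("MyTuple", ["x", "y"], module="my_pkg.types")

# The class itself respects the `module` argument...
print(MyTuple.__module__)

# ...but the generated __new__ reports the synthetic namespace name instead.
print(MyTuple.__new__.__module__)
```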