python / cpython

The Python programming language
https://www.python.org

Unexpected `__module__` attribute for `NamedTupleType.__new__` method: `namedtuple_{typename}` #127187

Closed XuehaiPan closed 1 day ago

XuehaiPan commented 6 days ago

Bug report

Bug description:

The __module__ attribute of the NamedTupleType.__new__ method is namedtuple_{typename}, which names a module that does not exist. The __module__ attribute of methods should respect the module argument passed to collections.namedtuple.

In [1]: import collections

In [2]: MyTuple = collections.namedtuple('MyTuple', ['x', 'y', 'z'])

In [3]: MyTuple.__module__
Out[3]: '__main__'

In [4]: MyTuple.__new__.__module__
Out[4]: 'namedtuple_MyTuple'

In [5]: MyTuple._make.__module__
Out[5]: 'collections'

In [6]: import typing

In [7]: class MyAnotherTuple(typing.NamedTuple):
   ...:     a: int
   ...:     b: float
   ...:     

In [8]: MyAnotherTuple.__module__
Out[8]: '__main__'

In [9]: MyAnotherTuple.__new__.__module__
Out[9]: 'namedtuple_MyAnotherTuple'

In [10]: MyAnotherTuple._make.__module__
Out[10]: 'collections'

CPython versions tested on:

3.12

Operating systems tested on:

Linux


rhettinger commented 6 days ago

Is there any reason users would care about this? Is there any real world impact?

The current result seems reasonable to me. The __new__ method was created virtually, so it is fair to report it as 'namedtuple_MyTuple'. It has been this way for a very long time and has not been a source of problems.
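
A minimal sketch of where such a value can come from, assuming (as an implementation detail that may change between CPython versions) that `collections.namedtuple` builds `__new__` by `eval`-ing a lambda in a private namespace whose `__name__` is `namedtuple_{typename}`. A function takes its `__module__` from the `__name__` entry of the globals it is created in. The `namedtuple_Demo` name and the lambda body are hypothetical stand-ins.

```python
import sys

# Hypothetical stand-in for the private namespace used to build __new__.
namespace = {"__builtins__": {}, "__name__": "namedtuple_Demo"}

# A function created by eval() picks up __module__ from its globals' __name__.
make = eval("lambda _cls, x, y: (x, y)", namespace)

print(make.__module__)                   # namedtuple_Demo
print("namedtuple_Demo" in sys.modules)  # False: no such module was ever imported
```

The reported name is only a label taken from the namespace dictionary; nothing registers a module under it.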

XuehaiPan commented 6 days ago

> Is there any reason users would care about this? Is there any real world impact?

I'm trying to improve namedtuple support in PyTorch Dynamo, an ML compiler backend with a lightweight Python interpreter. Dynamo traces the Python frames and generates performant inlined functions via traced opcodes. The __module__ attribute is used to reconstruct the Python function during the codegen phase.

The codegen tries to import the module named by __name__ in the function's globals while creating the __new__ function for a namedtuple:

https://github.com/pytorch/pytorch/blob/bae951030752f2d7f06c15af0c59e54026d851e9/torch/_dynamo/symbolic_convert.py#L3290-L3299

    def get_globals_source_and_value(self, name):
        if "__name__" in self.f_globals:
            module_name = self.f_globals["__name__"]
            module_source = self.import_source(module_name)
            if "torch_package" in module_name:
                fglobals_value = torch.package.package_importer._package_imported_modules[module_name]  # type: ignore[assignment]
            else:
                fglobals_value = importlib.import_module(module_name)  # type: ignore[assignment]
            fglobals_vt = VariableTracker.build(self, fglobals_value, module_source)
            global_source = AttrSource(module_source, name)
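
The failure mode in the `importlib.import_module(module_name)` branch can be reproduced without Dynamo. This is a hedged sketch: `can_import` is a hypothetical helper, and the literal `"namedtuple_MyTuple"` is the value reported in the bug description above.

```python
import importlib


def can_import(module_name: str) -> bool:
    """Return True if module_name can be imported, False if it does not exist."""
    try:
        importlib.import_module(module_name)
    except ModuleNotFoundError:
        return False
    return True


print(can_import("collections"))         # True
print(can_import("namedtuple_MyTuple"))  # False
```

Any code that treats a function's reported module name as importable will hit the second case for a namedtuple's `__new__`.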

rhettinger commented 5 days ago

For PyTorch is this a show stopper or is a workaround available?

How would MyTuple.__new__.__module__ returning __main__ actually be useful for reconstructing a __new__ method? That isn't clear to me from your post. That attribute doesn't affect functionality. Technically, a namedtuple doesn't have to live in a module at all. It can be exec'ed dynamically, created and used within a function's locals, stored in a class definition, or stored in a data structure rather than a module's globals. In these circumstances, returning __main__ isn't really correct.
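
A small sketch of the "does not have to live in a module" case, using a namedtuple created inside a function. `make_local_type` and `resolves_by_name` are hypothetical helpers introduced for illustration only.

```python
import collections
import sys


def make_local_type():
    # The class is only ever bound to a local name and the return value.
    return collections.namedtuple("Point", ["x", "y"])


def resolves_by_name(cls) -> bool:
    """Check whether cls can be found again as an attribute of its reported module."""
    module = sys.modules.get(cls.__module__)
    return getattr(module, cls.__qualname__, None) is cls


local_cls = make_local_type()
print(local_cls.__module__)         # name of the module that called namedtuple()
print(resolves_by_name(local_cls))  # False
```

The class reports a module name, but looking up `Point` in that module does not return the class, so the name alone is not enough to find it again.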

XuehaiPan commented 5 days ago

> For PyTorch is this a show stopper or is a workaround available?

PyTorch Dynamo handles NamedTuple via explicit special handling rather than as a normal user-defined class. This is not a show stopper for namedtuple types for now. Also, Dynamo is not a full-featured interpreter. We incrementally improve the coverage of the code in the ML community. It is fine for it to raise unimplemented for unhandled cases.


> How would MyTuple.__new__.__module__ returning __main__ actually be useful for reconstructing a __new__ method? That isn't clear to me from your post. That attribute doesn't affect functionality.

The problem only shows up when the user tries to subclass a namedtuple type. The subclass needs the __module__ attribute to reconstruct its MyTupleSubType.__new__, which is MyTuple.__new__. See the example at the end of this post: the MyTupleSubType.__new__ method is used to create the instance of MyTupleSubType in the polyfill instantiate_user_defined_class_object.
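
The claim that the subclass reuses the parent's `__new__` can be checked in plain CPython. A minimal sketch, with the class names taken from this thread:

```python
import collections

MyTuple = collections.namedtuple("MyTuple", ["x", "y", "z"])


class MyTupleSubType(MyTuple):
    """Subclass that does not define its own __new__."""


# Attribute lookup falls through to the parent, so both names refer to one function.
print(MyTupleSubType.__new__ is MyTuple.__new__)  # True
print(MyTupleSubType.__new__.__module__ == MyTuple.__new__.__module__)  # True
```

Whatever `__module__` the generated `__new__` carries is therefore inherited unchanged by every subclass that does not override `__new__`.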


> Technically, a namedtuple doesn't have to live in a module at all. It can be exec'ed dynamically, created and used within a function's locals, stored in a class definition, or stored in a data structure rather than a module's globals. In these circumstances, returning __main__ isn't really correct.

The problem only shows up when the user tries to subclass a namedtuple type. Dynamo only supports classes defined in the global scope for now. Dynamo works fine with namedtuple types (local or global) that are directly created by collections.namedtuple or typing.NamedTuple.


Let me give more context. cc @jansel, correct me if I am wrong.

In Dynamo, each specialized type has its own variable tracker type.

list object               -> ListVariable
tuple object              -> TupleVariable
namedtuple object         -> NamedTupleVariable
...

generic types             -> UserDefinedClassVariable
instance of generic types -> UserDefinedVariable

While creating an instance of a given type, there is a handcrafted dispatch path for namedtuple type here: https://github.com/pytorch/pytorch/blob/51b6126f5432b4cc446a0bb3882aa9d98124caa8/torch/_dynamo/variables/user_defined.py#L454-L485

# Python code: MyTuple(*values)
# Dynamo call stack:
   UserDefinedClassVariable(MyTuple).call_function(tx, values, {})
-> NamedTupleVariable(MyTuple, values)

This handcrafted path bypasses the call to MyTuple.__new__, so no errors showed up in the past.

For other types that do not have specialization, the route goes here: https://github.com/pytorch/pytorch/blob/51b6126f5432b4cc446a0bb3882aa9d98124caa8/torch/_dynamo/variables/user_defined.py#L587-L600

# Python code: SomeType(*args, **kwargs)
# Dynamo call stack:
   UserDefinedClassVariable(SomeType).call_function(tx, args, kwargs)
-> UserFunctionVariable(instantiate_user_defined_class_object).call_function(tx, [SomeType, *args], kwargs)
-> UserDefinedVariable(obj)

# where:
def instantiate_user_defined_class_object(cls, /, *args, **kwargs):
    obj = cls.__new__(cls, *args, **kwargs)

    # Only call __init__ if the object is an instance of the class
    # Reference: https://github.com/python/cpython/blob/3.12/Objects/typeobject.c#L1670-L1673
    if isinstance(obj, cls):
        obj.__init__(*args, **kwargs)
    return obj

Subclasses of a namedtuple type cannot go through the NamedTupleVariable path. That results in a Dynamo internal error in instantiate_user_defined_class_object: "can not reconstruct MyTupleSubType.__new__".
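
For comparison, the same polyfill runs without error under the regular CPython interpreter, which suggests the failure is specific to how Dynamo reconstructs the function rather than to the polyfill logic. This is a sketch in plain Python, not Dynamo's tracing path; the class names are taken from this thread.

```python
import collections

MyTuple = collections.namedtuple("MyTuple", ["x", "y", "z"])


class MyTupleSubType(MyTuple):
    pass


def instantiate_user_defined_class_object(cls, /, *args, **kwargs):
    # Resolves to MyTuple.__new__ because the subclass does not override it.
    obj = cls.__new__(cls, *args, **kwargs)

    # Only call __init__ if the object is an instance of the class.
    if isinstance(obj, cls):
        obj.__init__(*args, **kwargs)
    return obj


obj = instantiate_user_defined_class_object(MyTupleSubType, 1, 2, 3)
print(obj)  # MyTupleSubType(x=1, y=2, z=3)
```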

jansel commented 4 days ago

The CPython behavior here looks reasonable to me. I think we should find a workaround in PyTorch.

rhettinger commented 1 day ago

> The CPython behavior here looks reasonable to me. I think we should find a workaround in PyTorch.

Thanks for the insight. I'll mark this as closed.