python / cpython

The Python programming language
https://www.python.org/
Other
60.95k stars 29.42k forks source link

`__module__` is not defined, seeming to contradict the Python Data Model. #120857

Open mike-matera opened 2 weeks ago

mike-matera commented 2 weeks ago

Bug report

Bug description:

The __module__ variable might not be defined.

This is an update to the original bug post. After some investigation this doesn't seem like a bug in unittest. Here's a simple test case that shows a situation where __module__ is not defined:

test_str = """
class MyClass:
    pass

Derived = type("Derived", (MyClass,), {})
"""

test_ns = {}
exec(test_str, test_ns)
print("__module__ exists:", hasattr(test_ns["Derived"], "__module__"))
print("Module for drived:", test_ns["Derived"].__module__)

The original bug post is below:

https://github.com/python/cpython/blob/03fa2df92707b543c304a426732214002f81d671/Lib/unittest/loader.py#L220

The default test loader depends on __module__ being present. In rare cases it may not be. I encountered a crash when running unit tests using a Jupyter cell magic. The line should probably be:

            fullName = f'%s.%s.%s' % (
                testCaseClass.__module__ if hasattr(testCaseClass, "__module__") else "None", testCaseClass.__qualname__, attrname
            )

This problem has cropped up in bug reports for other projects. For example:

https://github.com/ray-project/ray/issues/4758 https://gitlab.orekit.org/orekit-labs/python-wrapper/-/issues/411

Thank you for the wonderful work!

CPython versions tested on:

3.12

Operating systems tested on:

Linux

Linked PRs

terryjreedy commented 2 weeks ago

Have you verified that the test works as expected with the suggested fix?

The Data model section, seems to me to suggest that __module__ should always be present for class, function, or method. The entries for the latter two explicitly say or None if unavailable. In other words, if there is no module name is some esoteric setting, assign None. This is similar to setting __doc__ to None when there is no docstring. I have never used Jupiter, but it should be able to fix this.

mike-matera commented 2 weeks ago

Hi @terryjreedy.

Thanks for looking at this. I was able to reproduce the problem outside of Jupyter:

import unittest

test_str = """
import unittest

class MyTest(unittest.TestCase):

    def test_add(self):
        self.assertEqual(1+2, 3)

derived = type("DerivedTest", (MyTest,), {})
"""

test_ns = {}
exec(test_str, test_ns)
suite = unittest.TestSuite()
suite.addTest(unittest.defaultTestLoader.loadTestsFromTestCase(test_ns["derived"]))
runner = unittest.TextTestRunner()
runner.run(suite)

I have not tested my fix. Sorry. I'll work on that but, in the meantime, I hope this example helps!

mike-matera commented 2 weeks ago

Here's a simpler test case. You don't need unittest at all to see this:

test_str = """
class MyClass:
    pass

Derived = type("Derived", (MyClass,), {})
"""

test_ns = {}
exec(test_str, test_ns)
print("__module__ exists:", hasattr(test_ns["Derived"], "__module__"))
print("Module for drived:", test_ns["Derived"].__module__)

I agree with what you're saying about the Data Model section of the document. It seems like __module__ should always be defined or None. Maybe this isn't a bug with unittest at all. I tested my suggested fix but it doesn't work because there are other references to __module__ in the unittest framework that failed.

picnixz commented 2 weeks ago

Maybe it could be useful but test_ns['MyClass'].__module__ is set to 'builtins'. But I'm not sure it should be set as such because we are exec'ing in the __main__ module IMO.

However, when you do something like:

m = types.ModuleType('foo')
exec(test_str, m.__dict__)

then, MyClass.__module__ and Derived.__module__ are set to 'foo'.

picnixz commented 2 weeks ago

Mmh.. the fix I'm trying to do is a bit tricky. Actually, we have

>>> blts = builtins.__dict__.copy()
>>> del blts['__name__']
>>> exec("class A: pass\nB=type('B', (A,), {})", {'__builtins__': blts})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
  File "<string>", line 1, in A
NameError: name '__name__' is not defined

@JelleZijlstra What should we do? normally, you can specify your own builtins but if you do so, you'll need the __name__ as well. However, you would still have something like:

>>> exec("class A: pass\nB=type('B', (A,), {})", ns := {}); ns['B'].__module__
...
AttributeError: __module__. Did you mean: '__reduce__'?

and this indeed contradicts the data model.

JelleZijlstra commented 2 weeks ago

Here's a smaller reproducer:

>>> ns = {}
>>> exec("X = type('X', (), {})", ns)
>>> ns["X"].__module__
Traceback (most recent call last):
  File "<python-input-8>", line 1, in <module>
    ns["X"].__module__
AttributeError: __module__. Did you mean: '__reduce__'?

Or similarly:

>>> del __name__
>>> X = type('X', (), {})
>>> X.__module__
Traceback (most recent call last):
  File "<python-input-2>", line 1, in <module>
    X.__module__
AttributeError: __module__. Did you mean: '__reduce__'?

It is also possible to set __module__ to a non-str:

>>> class X: pass
... 
>>> X.__module__ = 42
>>> X.__module__
42

Not sure what to do about this; some options:

picnixz commented 2 weeks ago

Yes... I had thought of those options and then I remembered that I needed to take into account the change in Sphinx, which I was not really happy about.

In fact, we also have this behaviour:

>>> del __name__
>>> class A: pass
...
>>> A.__module__
'builtins'

So... does it make sense for type to behave similarly?

JelleZijlstra commented 2 weeks ago

The reason for that is that inside the class body we inject code that is essentially __module__ = __name__, where __name__ is looked up like a regular global variable. If it's not in the globals, we fall back to the builtins, and builtins.__name__ is builtins.

That behavior feels buggy too, though, and I'd rather not expand it further. Your class A isn't in the builtins module.

picnixz commented 2 weeks ago

Your class A isn't in the builtins module

I think we also had quite a lot of if-branches in Sphinx for this kind of "fake builtins that are not builtins". I've tried a PR where I inject the __name__ to the globals of exec but I kept the weird 'builtins' choice as well. Ideally, I would have liked to be able to set the module's name to something like ? or None but I felt this could break many things (well the PR still broke things so I don't think it's the way to go).

Maybe having __module__ = None is the best solution. It reflects what the module's name is actually and is closer to a function's module behaviour.

blhsing commented 2 weeks ago

The behavior of the module name defaulting to builtins being strange notwithstanding, I think there should at least be code to ensure that the __module__ attribute is set for a class, like there is for function and method. I've submitted a PR to bring the behavior of __module__ for class to parity.

picnixz commented 2 weeks ago

Well... I would appreciated we first discuss more in details the directions to take before opening another PR since I've already started working on the issue. While setting __module__ to None is the best direction I can think of, I'm not sure whether it's better to break codes that rely on cls.__module__ being always a string or code that rely on cls.__module__ existing in sys.modules.

(Also, in the future, I think it'd be better not to include the issue inside the commits because the issue noise becomes quite large)

blhsing commented 2 weeks ago

Well... I would appreciated we first discuss more in details the directions to take before opening another PR since I've already started working on the issue.

Sorry, I actually made my first commit 5 hours before yours but didn't push until quite a bit later due to other work, only to realize after submitting my PR that you had also submitted a PR. But then I thought that my solution may co-exist with yours since they fix different parts of the logics.

While setting __module__ to None is the best direction I can think of, I'm not sure whether it's better to break codes that rely on cls.__module__ being always a string or code that rely on cls.__module__ existing in sys.modules.

Good point. Your solution is indeed the safer and more practical route. A quick code search does reveal a lot of existing code that rely on the existing behavior of cls.__module__ being possibly missing. I've converted my PR to a draft then.

(Also, in the future, I think it'd be better not to include the issue inside the commits because the issue noise becomes quite large)

Noted. Thanks for the tips.

picnixz commented 2 weeks ago

I actually made my first commit 5 hours before yours but didn't push until quite a bit later due to other work

No worries! I don't mind when people suggest alternatives but it's better if we don't have two PRs that may do (in the end) the same work.

But then I thought that my solution may co-exist with yours since they fix different parts of the logics.

Actually, when I first implemented the patch on my side, I thought about just putting to None but then I was "mmmh, it's kinda breaking the fact that we may rely on strings perhaps?" so I decided not to. I also thought about a special string but I was like "meeeh this may be also a valid fake module name so...". In the end, I decided to extend the current behaviour to avoid breaking things...