python / cpython

The Python programming language
https://www.python.org
Other
62.46k stars 29.98k forks source link

Incorrect __module__ attribute for _struct.Struct and perhaps a few others #79512

Closed df79943f-4aee-4531-a00d-c6b12816eb70 closed 2 years ago

df79943f-4aee-4531-a00d-c6b12816eb70 commented 5 years ago
BPO 35331
Nosy @mdickinson, @meadori, @serhiy-storchaka, @mr-nfamous

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['3.8', 'type-bug', '3.7'] title = 'Incorrect __module__ attribute for _struct.Struct and perhaps a few others' updated_at = user = 'https://github.com/mr-nfamous' ``` bugs.python.org fields: ```python activity = actor = 'bup' assignee = 'none' closed = False closed_date = None closer = None components = [] creation = creator = 'bup' dependencies = [] files = [] hgrepos = [] issue_num = 35331 keywords = [] message_count = 1.0 messages = ['330544'] nosy_count = 4.0 nosy_names = ['mark.dickinson', 'meador.inge', 'serhiy.storchaka', 'bup'] pr_nums = [] priority = 'normal' resolution = None stage = None status = 'open' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue35331' versions = ['Python 3.7', 'Python 3.8'] ```

df79943f-4aee-4531-a00d-c6b12816eb70 commented 5 years ago

_struct.Struct not defining a valid __module__ by prefixing its tp_name slot with "_struct" is inconsistent with every other extension type which is available in the corresponding module globals.

From the documentation of the tp_name slot:

Pointer to a NUL-terminated string containing the name of the type. For types that are accessible as module globals, the string should be the full module name, followed by a dot, followed by the type name; for built-in types, it should be just the type name. If the module is a submodule of a package, the full package name is part of the full module name. For example, a type named T defined in module M in subpackage Q in package P should have the tp_name initializer "P.Q.M.T".

For dynamically allocated type objects, this should just be the type name, and the module name explicitly stored in the type dict as the value for key '__module__'.

----

I know that this is also a way to make something unpickleable, but that seems like a poor way to do it and since _struct.Struct was relatively alone in this, I figured it was an oversight.

At the end is the script I made to display all currently alive "builtins" classes that have been "PyType_Ready"ed. For brevity I further manually filtered out obvious cases where a specified module would be inappropriate.

The main point is that I think the new contextvars classes, _struct.Struct, and the weakref classes are missing the "_struct", "_contextvars", and "_weakref" prefixes in their tp_name slots, respectively. Since _contextvars is one of the few extension modules using the multiphase initialization protocol, maybe it should go in their type dicts (although the prefix method still works) instead, although i think the docs were referring to heap allocated types.

if __name__=='__main__':
    import sys, collections
    subclassesof = type.__subclasses__
    def get_types(*names):
        r = {"__builtins__":{'__import__':__import__, 'globals':globals}}
        for name in names:
            exec(f'from {name} import __dict__ as d; globals().update(d)', r)
        return dict.fromkeys(r[k] for k in r if isinstance(r[k],type)).keys()
    def derivative_classes(cls):
        a = b = r = {*subclassesof(cls)}
        while b:
            r, a, b, = r|b, b, set().union(*map(subclassesof, b))
        return r | a    
    classes = derivative_classes(object)
    singles = None, NotImplemented, ...
    od = collections.OrderedDict()
    odtypes = iter(od), od.keys(), od.items(), od.values()
    bltns = {cls for cls in classes if cls.__module__=='builtins'}
    bltns-= get_types('builtins', 'types', '_collections_abc')
    bltns-= {*map(type, odtypes)} | {*map(type, singles)}
    for cls in sorted(bltns, key=vars(type)['__name__'].__get__):
        print(f'# {sys.getrefcount(cls):4} {cls.__name__}')

# all of these are in _contextvars.__dict but have their __module=='builtins': # 25 Context # 15 ContextVar # 12 Token

# from _struct # 23 Struct # IS in _struct.__dict__ # 11 unpack_iterator # no tp_new so make sense to leave as-is

# These are here because it's a mystery how they were included in the results # without importing _testcapi: # 25 hamt # 8 hamt_array_node # 8 hamt_bitmap_node # 8 hamt_collision_node

# no idea what these are: # 11 items # 11 values # 11 keys

# these are all in _weakref.__dict__ # 76 weakcallableproxy # 76 weakproxy # 32 weakref

serhiy-storchaka commented 2 years ago

It was fixed for _struct in 4f384af067d05b16a554bfd976934fca9f87a1cf (bpo-38076) and for _contextvars in 988f1ec8d2643a0d00851903abcdae90d57ac0e6 (bpo-1635741).

hamt, items, etc are private types.

mscuthbert commented 2 years ago

Hi all -- Thanks for the fix. I wish though that this were done on a major version upgrade, because I did not think that Class.__name__ was considered private (at least it's documented in the data model) and it ended up making 3.10.6 a backwards incompatible version in some places where .name was being used instead of isinstance() checking (long old code from the time when isinstance() checking was very slow, but the code still worked)