Types that don't exist under their inspected name

o11c commented 9 years ago

Some classes don't exist under their inspected name. I define inspected name as __module__.__qualname__, but only if __qualname__ ends with __name__, else __module__.__name__.
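As a rough illustration of that definition, here is a minimal sketch. The helper name inspected_name is hypothetical, the fallback for a missing __qualname__ (Python 2) is an assumption, and the '<unknown>' and '<AttributeError>' placeholders that show up in the mapping below are not reproduced.

```python
def inspected_name(cls):
    """Return module.qualname, falling back to module.name.

    Hypothetical helper: __qualname__ is only trusted when it ends
    with __name__, as described in the comment above.
    """
    name = cls.__name__
    # Python 2 has no __qualname__, so fall back to __name__ there.
    qualname = getattr(cls, "__qualname__", name)
    if not qualname.endswith(name):
        qualname = name
    return "{}.{}".format(cls.__module__, qualname)


def make_local():
    # A class created inside a function gets '<locals>' in its qualname.
    class Local(object):
        pass
    return Local


builtin_name = inspected_name(type(None))
local_name = inspected_name(make_local())
```

On Python 3, builtin_name is 'builtins.NoneType' even though builtins has no NoneType attribute, which is exactly the kind of mismatch this issue is about.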

There are several (overlapping) cases:

The class comes from C and was never exported.
The class comes from Python but was explicitly deleted (or overwritten? Don't Python-originated types get GC'ed if unused nowadays?).
The class is available under some other name, e.g. under types.
The class can only have a single instance.
The class is created local to some function (has .<locals>. in __qualname__), and may be created more than once.
There is more than one competing implementation and they share the same name.
There is an artificial base class that shares the same name as the subclass.
The class is intended to be used as a duck implementation of some interface.
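One way to test whether a class falls into any of these cases is to look it up again by its inspected name and compare identities. This is only a sketch for Python 3: the function name is hypothetical, and it does not distinguish between the cases listed above.

```python
import importlib


def exists_under_inspected_name(cls):
    """Return True if walking module + qualname leads back to cls."""
    try:
        obj = importlib.import_module(cls.__module__)
        for part in cls.__qualname__.split("."):
            # '<locals>' is never a real attribute, so local classes fail here.
            obj = getattr(obj, part)
    except (ImportError, AttributeError, ValueError):
        return False
    return obj is cls


dict_found = exists_under_inspected_name(dict)            # True
none_found = exists_under_inspected_name(type(None))      # False
```

dict is reachable as builtins.dict, while type(None) claims to be builtins.NoneType, which does not exist.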

In some cases, later Python versions do add a name for the type, but since we need to support all of them, we can't rely on that.

Note that _builtin() means either __builtin__ or builtins, and _collections_abc() means either _collections_abc, collections.abc, or _abcoll depending on Python version.

Here is my initial mapping, based on importing every module (except test modules and dummy implementations, and being careful with a few modules that can't be imported in the wrong order) in the standard library (for CPython 2.7 and 3.2 through 3.5), checking every module for top-level attributes, and then finally checking object.__subclasses__. This is actually quite fast, just a few seconds.

type_rename = {
    '<AttributeError>.xxlimited.Null': 'xxlimited.Null',
    '<AttributeError>.xxlimited.Str': 'xxlimited.Str',
    '<AttributeError>.xxlimited.Xxo': '',
    '<unknown>.Purpose': 'ssl.Purpose',
    '__builtin__.BlockingIOError': '_io.BlockingIOError',
    '__builtin__.CArgObject': '',
    '__builtin__.DB': '',
    '__builtin__.DBCursor': '',
    '__builtin__.DBEnv': '',
    '__builtin__.DBLock': '',
    '__builtin__.DBLogCursor': '',
    '__builtin__.DBSequence': '',
    '__builtin__.DBSite': '',
    '__builtin__.DBTxn': '',
    '__builtin__.Element': '',
    '__builtin__.ElementTree': '_elementtree.ElementTree',
    '__builtin__.EncodingMap': '',
    '__builtin__.MultibyteCodec': '',
    '__builtin__.MultibyteIncrementalDecoder': '_multibytecodec.MultibyteIncrementalDecoder',
    '__builtin__.MultibyteIncrementalEncoder': '_multibytecodec.MultibyteIncrementalEncoder',
    '__builtin__.MultibyteStreamReader': '_multibytecodec.MultibyteStreamReader',
    '__builtin__.MultibyteStreamWriter': '_multibytecodec.MultibyteStreamWriter',
    '__builtin__.NoneType': 'types.NoneType',
    '__builtin__.NotImplementedType': 'types.NotImplementedType',
    '__builtin__.PyCapsule': '',
    '__builtin__.StgDict': '',
    '__builtin__.Struct': '_struct.Struct',
    '__builtin__.builtin_function_or_method': 'types.BuiltinFunctionType',
    '__builtin__.callable-iterator': '',
    '__builtin__.cell': '',
    '__builtin__.classobj': 'types.ClassType',
    '__builtin__.code': 'types.CodeType',
    '__builtin__.deque_iterator': '',
    '__builtin__.deque_reverse_iterator': '',
    '__builtin__.dict_items': '',
    '__builtin__.dict_keys': '',
    '__builtin__.dict_values': '',
    '__builtin__.dictionary-keyiterator': '',
    '__builtin__.dictproxy': 'types.DictProxyType',
    '__builtin__.ellipsis': 'types.EllipsisType',
    '__builtin__.fieldnameiterator': '',
    '__builtin__.formatteriterator': '',
    '__builtin__.frame': 'types.FrameType',
    '__builtin__.function': 'types.FunctionType',
    '__builtin__.generator': 'types.GeneratorType',
    '__builtin__.getset_descriptor': 'types.GetSetDescriptorType',
    '__builtin__.instance': 'types.InstanceType',
    '__builtin__.instancemethod': 'types.MethodType',
    '__builtin__.iterator': '',
    '__builtin__.iterparse': '_elementtree.iterparse',
    '__builtin__.member_descriptor': 'types.MemberDescriptorType',
    '__builtin__.method-wrapper': '',
    '__builtin__.method_descriptor': '',
    '__builtin__.module': 'types.ModuleType',
    '__builtin__.sqlite3Node': '',
    '__builtin__.symtable entry': '',
    '__builtin__.test_structmembersType': '_testcapi._test_structmembersType',
    '__builtin__.tkapp': '_tkinter.TkappType',
    '__builtin__.tktimertoken': '_tkinter.TkttType',
    '__builtin__.traceback': 'types.TracebackType',
    '__builtin__.weakcallableproxy': '_weakref.CallableProxyType',
    '__builtin__.weakproxy': '_weakref.ProxyType',
    '__builtin__.weakref': '_weakref.ReferenceType',
    '__builtin__.wrapper_descriptor': '',
    '_csv.reader': '',
    '_csv.writer': '',
    '_ctypes.CField': '',
    '_ctypes.CThunkObject': '',
    '_ctypes.DictRemover': '',
    '_ctypes.PyCArrayType': '_ctypes.Array.__class__',
    '_ctypes.PyCFuncPtr': '_ctypes.CFuncPtr',
    '_ctypes.PyCFuncPtrType': '_ctypes.CFuncPtr.__class__',
    '_ctypes.PyCPointerType': '_ctypes._Pointer.__class__',
    '_ctypes.PyCSimpleType': '_ctypes._SimpleCData.__class__',
    '_ctypes.PyCStructType': '_ctypes.Structure.__class__',
    '_ctypes.UnionType': '_ctypes.Union.__class__',
    '_ctypes._CData': '',
    '_curses.curses window': '',
    '_curses_panel.curses panel': '',
    '_dbm.dbm': '',
    '_elementtree._element_iterator': '',
    '_gdbm.gdbm': '',
    '_hashlib.HASH': '',
    '_io._BytesIOBuffer': '',
    '_json.Encoder': '_json.make_encoder',
    '_json.Scanner': '_json.make_scanner',
    '_pickle.Pdata': '',
    '_pickle.PicklerMemoProxy': '',
    '_pickle.UnpicklerMemoProxy': '',
    '_sre.SRE_Match': '',
    '_sre.SRE_Pattern': 're._pattern_type',
    '_sre.SRE_Scanner': '',
    '_thread._localdummy': '',
    '_thread.lock': '_thread.LockType',
    '_tkinter.tkapp': '_tkinter.TkappType',
    '_tkinter.tktimertoken': '_tkinter.TkttType',
    'abc.SignalDict': '',
    'anydbm.error': '',
    'argparse._ChoicesPseudoAction': '',
    'argparse._Section': '',
    'builtins.BlockingIOError': '_io.BlockingIOError',
    'builtins.CArgObject': '',
    'builtins.CommentProxy': '',
    'builtins.Element': '',
    'builtins.ElementTree': '_elementtree.ElementTree',
    'builtins.EncodingMap': '',
    'builtins.MultibyteCodec': '',
    'builtins.MultibyteIncrementalDecoder': '_multibytecodec.MultibyteIncrementalDecoder',
    'builtins.MultibyteIncrementalEncoder': '_multibytecodec.MultibyteIncrementalEncoder',
    'builtins.MultibyteStreamReader': '_multibytecodec.MultibyteStreamReader',
    'builtins.MultibyteStreamWriter': '_multibytecodec.MultibyteStreamWriter',
    'builtins.NoneType': 'builtins.None.__class__',
    'builtins.NotImplementedType': 'builtins.NotImplemented.__class__',
    'builtins.PIProxy': '',
    'builtins.PyCapsule': '',
    'builtins.StgDict': '',
    'builtins.Struct': '_struct.Struct',
    'builtins.TreeBuilder': '',
    'builtins.XMLParser': '',
    'builtins.builtin_function_or_method': 'types.BuiltinFunctionType',
    'builtins.bytearray_iterator': _collections_abc('bytearray_iterator'),
    'builtins.bytes_iterator': _collections_abc('bytes_iterator'),
    'builtins.callable_iterator': '',
    'builtins.cell': '',
    'builtins.classmethod_descriptor': '', # inspect._ClassMethodWrapper
    'builtins.code': 'types.CodeType',
    'builtins.coroutine': 'types.CoroutineType',
    'builtins.coroutine_wrapper': '',
    'builtins.deque_iterator': '',
    'builtins.deque_reverse_iterator': '',
    'builtins.dict_itemiterator': _collections_abc('dict_itemiterator'),
    'builtins.dict_items': _collections_abc('dict_items'),
    'builtins.dict_keyiterator': _collections_abc('dict_keyiterator'),
    'builtins.dict_keys': _collections_abc('dict_keys'),
    'builtins.dict_proxy': '_abcoll.dict_proxy',
    'builtins.dict_valueiterator': _collections_abc('dict_valueiterator'),
    'builtins.dict_values': _collections_abc('dict_values'),
    'builtins.ellipsis': 'builtins.Ellipsis.__class__',
    'builtins.fieldnameiterator': '',
    'builtins.formatteriterator': '',
    'builtins.frame': 'types.FrameType',
    'builtins.function': 'types.FunctionType',
    'builtins.generator': 'types.GeneratorType',
    'builtins.getset_descriptor': 'types.GetSetDescriptorType',
    'builtins.instancemethod': '',
    'builtins.iterator': '',
    'builtins.iterparse': '_elementtree.iterparse',
    'builtins.list_iterator': _collections_abc('list_iterator'),
    'builtins.list_reverseiterator': _collections_abc('list_reverseiterator'),
    'builtins.longrange_iterator': '',
    'builtins.managedbuffer': '',
    'builtins.mappingproxy': 'types.MappingProxyType',
    'builtins.member_descriptor': 'types.MemberDescriptorType',
    'builtins.method': 'types.MethodType',
    'builtins.method-wrapper': '', # later 'inspect._MethodWrapper'
    'builtins.method_descriptor': '',
    'builtins.module': 'types.ModuleType',
    'builtins.moduledef': '',
    'builtins.namespace': 'types.SimpleNamespace',
    'builtins.odict_items': '',
    'builtins.odict_iterator': '',
    'builtins.odict_keys': '',
    'builtins.odict_values': '',
    'builtins.range_iterator': _collections_abc('range_iterator'),
    'builtins.set_iterator': _collections_abc('set_iterator'),
    'builtins.sqlite3Node': '',
    'builtins.stderrprinter': '',
    'builtins.str_iterator': _collections_abc('str_iterator'),
    'builtins.symtable entry': '',
    'builtins.traceback': 'types.TracebackType',
    'builtins.tuple_iterator': _collections_abc('tuple_iterator'),
    'builtins.weakcallableproxy': '_weakref.CallableProxyType',
    'builtins.weakproxy': '_weakref.ProxyType',
    'builtins.weakref': '_weakref.ReferenceType',
    'builtins.wrapper_descriptor': '', # later 'inspect._WrapperDescriptor'
    'cElementTree.ParseError': '_elementtree.ParseError',
    'cPickle.Pickler': '',
    'cPickle.Unpickler': '',
    'cStringIO.StringI': 'cStringIO.InputType',
    'cStringIO.StringO': 'cStringIO.OutputType',
    'crypt._Method': '',
    'ctypes.CDLL.__init__.<locals>._FuncPtr': '',
    'ctypes.CFUNCTYPE.<locals>.CFunctionType': '',
    'ctypes.CFunctionType': '',
    'ctypes.LP_c_char': '',
    'ctypes.LP_c_wchar': '',
    'ctypes.PYFUNCTYPE.<locals>.CFunctionType': '',
    'ctypes._FuncPtr': '',
    'ctypes.c_double_be': '',
    'ctypes.c_float_be': '',
    'ctypes.c_int_be': '',
    'ctypes.c_long_be': '',
    'ctypes.c_short_be': '',
    'ctypes.c_uint_be': '',
    'ctypes.c_ulong_be': '',
    'ctypes.c_ushort_be': '',
    'datetime.date': '',
    'datetime.datetime': '',
    'datetime.timedelta': '',
    'datetime.timezone': '',
    'datetime.tzinfo': '',
    'dbm.error': '',
    'decimal.ContextManager': '',
    'decimal.SignalDictMixin': '',
    'functools.CacheInfo': 'functools._CacheInfo',
    'imaplib.abort': '',
    'imaplib.error': '',
    'imaplib.readonly': '',
    'importlib._bootstrap.DecimalTuple': '_decimal.DecimalTuple',
    'itertools._grouper': '',
    'itertools.tee': '',
    'itertools.tee_dataobject': '',
    'multiprocessing.managers.PoolProxy': '',  # base class
    'multiprocessing.process._MainProcess': 'multiprocessing.process._current_process.__class__',
    'ossaudiodev.oss_audio_device': '',
    'ossaudiodev.oss_mixer_device': '',
    'parser.st': 'parser.ASTType' if PY2 else 'parser.STType',
    'pkg_resources._vendor.packaging._structures.Infinity': 'pkg_resources._vendor.packaging._structures.Infinity.__class__',
    'pkg_resources._vendor.packaging._structures.NegativeInfinity': 'pkg_resources._vendor.packaging._structures.NegativeInfinity.__class__',
    'pkg_resources.manifest_mod': '',
    'posix.DirEntry': '',
    'posix.ScandirIterator': '',
    'profile.fake_code': '',
    'profile.fake_frame': '',
    'pyexpat.xmlparser': 'pyexpat.XMLParserType',
    'sched.Event': '',
    'select.poll': '',
    'shutil.usage': 'shutil._ntuple_diskusage',
    'site.Quitter': _builtin('exit.__class__'),
    'site.setquit.<locals>.Quitter': 'builtins.exit.__class__',
    'ssl._ASN1Object': '',
    'sys.flags': 'sys.flags.__class__',
    'sys.float_info': 'sys.float_info.__class__',
    'sys.hash_info': 'sys.hash_info.__class__',
    'sys.int_info': 'sys.int_info.__class__',
    'sys.long_info': 'sys.long_info.__class__',
    'sys.thread_info': 'sys.thread_info.__class__',
    'sys.version_info': 'sys.version_info.__class__',
    'thread.lock': 'thread.LockType',
    'tokenize.TokenInfo': '',
    'typing.typing.io': 'typing.io',
    'typing.typing.re': 'typing.re',
    'unittest.util.Mismatch': 'unittest.util._Mismatch',
    'urllib.parse.DefragResult': '',  # base class
    'urllib.parse.ParseResult': '',  # base class
    'urllib.parse.SplitResult': '',  # base class
    'urlparse.ParseResult': '',  # base class
    'urlparse.SplitResult': '',  # base class
    'xml.dom.xmlbuilder._AsyncDeprecatedProperty': '',
    'xml.etree.ElementTree.Element': '',  # multiple implementations
    'xml.etree.ElementTree.ElementTree': '',
    'xxlimited.Xxo': '',
    'zlib.Compress': '',
    'zlib.Decompress': '',
}
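The scan that produced this mapping is not shown. A reduced sketch of the idea for Python 3 might look like the following; the function names are hypothetical, and the caller supplies the list of modules rather than the whole standard library being imported.

```python
import importlib


def all_classes(root=object):
    """Collect every class reachable through __subclasses__()."""
    seen = {}
    stack = [root]
    while stack:
        cls = stack.pop()
        if id(cls) in seen:
            continue
        seen[id(cls)] = cls
        # type.__subclasses__ also works for metaclasses such as type itself.
        stack.extend(type.__subclasses__(cls))
    return list(seen.values())


def find_unexported(module_names):
    """Return classes that are not a top-level attribute of any listed module."""
    exported = set()
    for module_name in module_names:
        module = importlib.import_module(module_name)
        for value in list(vars(module).values()):
            if isinstance(value, type):
                exported.add(id(value))
    return [cls for cls in all_classes() if id(cls) not in exported]
```

Every class returned by find_unexported(...) is a candidate key for the mapping. The real scan also had to order some imports carefully and skip test modules, which this sketch ignores.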

Notes:

Python 2 and Python 3 module names were deliberately not collapsed in keys, since we might have to use different strategies for each.
Some of the values might represent internal names, so we should treat them as not existing.
If a class is not present, that means either it's not in the standard library, or else it is available under its declared name. Classes in well-known third-party packages should be added to the list.
If there is a simple key-value pair, that means the real name was always found.
If the value ends with .__class__, that means that there is a single canonical instance.
If the value is '' and there is a comment, see it.
If the value is '' otherwise, I could not locate a canonical location for the class.
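To make the value conventions concrete, here is a sketch of how a non-empty value could be turned back into a class on Python 3. The resolve helper is hypothetical: it imports the longest importable module prefix and then follows the remaining attributes, so values ending in .__class__ work as well.

```python
import importlib
import sys
import types


def resolve(dotted):
    """Resolve a mapping value such as 'sys.flags.__class__' to an object.

    Returns None for the empty string (no canonical location known).
    """
    if not dotted:
        return None
    parts = dotted.split(".")
    # Try the longest module prefix first, then shorter ones.
    for cut in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:cut]))
        except ImportError:
            continue
        for attr in parts[cut:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError("cannot resolve %r" % dotted)


flags_type = resolve("sys.flags.__class__")
function_type = resolve("types.FunctionType")
```

Here flags_type is the class of the single sys.flags instance, and function_type is the same object as type(resolve).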

My thought is to add a stubtool.types module similar to types and ask people to use it where appropriate. I think we do need to supply classes like bytearray_iterator, even if most stubs will just use Iterator[int] (though we do need to do something about __length_hint__() in general).

Feedback?

matthiaskramm commented 9 years ago

Thanks for that detailed summary! Yes, we definitely have a lot of "hidden" types that, nonetheless, need to be exposed in the .pyi of a module.

So far, the standard strategy seems to be to add a leading underscore and a comment, e.g. in _codecs.pyi:

# Not exposed. In Python 2, this is defined in unicode.c:
class _EncodingMap(object):
    def size(self) -> int: ...

I wonder whether PEP 484 should have some special syntax to hide things more explicitly. A class decorator, perhaps. But on the other hand, Python already has a convention (prefix "_") for marking things as private, and there's no really good reason for reinventing the wheel.

I vote for always clearly marking them as "not exposed", though, to avoid confusion.

Also, I'm not against sticking these special types into a file of their own (stubtool.types or similar), but we should only do it for types that are actually used by more than one stub. If a type is only used locally in a single module, that's where it should go.

o11c commented 9 years ago

For manually written stubs that are only used by checkers, the underscore rule makes sense. But remember typing.get_type_hints (and also tools that generate stubs).

I like the decorator approach - perhaps @typing.deleted (since the effect is that of class Foo: pass; del Foo) - but what exactly does it mean? We need to decide who can see the name then. Perhaps "when checking a .pyi file, allow full access; in a .py file, forbid from ... import ... and forbid mod.foo unless it is stringified"? But that seems complicated, which is bad. And .pyi files aren't used at all for typing.get_type_hints ...
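The typing.get_type_hints concern can be shown with a small sketch. The function encode and the helper hints_or_error are made up for illustration; the point is that a stringified annotation naming a type that has no runtime binding fails when the hints are evaluated.

```python
import typing


def encode(table: "_EncodingMap") -> int:
    # '_EncodingMap' exists only in a stub; nothing binds it at runtime.
    return 0


def plain(value: "int") -> "str":
    return str(value)


def hints_or_error(func):
    """Return the evaluated hints, or the NameError raised while evaluating."""
    try:
        return typing.get_type_hints(func)
    except NameError as exc:
        return exc


broken = hints_or_error(encode)
working = hints_or_error(plain)
```

broken is a NameError because get_type_hints evaluates the string in the function's module globals, whereas working is an ordinary dict of resolved types.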

dckc commented 7 years ago

I'm quite surprised to find no return type declared for csv.writer.

JelleZijlstra commented 7 years ago

I think there's nothing actionable here at this point. We've been handling types like this by using leading underscores and I don't think that has caused problems. As we develop tools like https://github.com/JelleZijlstra/stubcheck, we'll be able to find any types that don't exist at runtime but that have crept into the stubs.

python / typeshed

Types that don't exist under their inspected name #24