python / cpython

The Python programming language
https://www.python.org
Other
63.42k stars 30.37k forks source link

Split up _testcapimodule.c #93649

Open encukou opened 2 years ago

encukou commented 2 years ago

Modules/_testcapimodule.c is a nearly-8000-line behemoth with no clear structure or organization. It is getting hard to maintain.

It also doesn't work well with testing (a) feature macros that affect Python.h and (b) module initialization, so we have additional C-API testing modules: _testmultiphase, _testimportmultiple, _testinternalcapi. _testbuffer is already split out, but there are many other aspects of the API that would use a similar dedicated test suite.

We should split and combine these, ideally without polluting the namespace of top-level modules.

arhadthedev commented 2 years ago

ideally without polluting the namespace of top-level modules

Can hypothetical Lib/_testcapimodule/*.c export their objects as test.capi.*?

encukou commented 2 years ago

As far as I know they can, as long as test.capi itself is not an extension module. It would need some extra support in both make and Windows builds, though.

encukou commented 2 years ago

PR: #94549

vstinner commented 2 years ago

I tried to group de 299 attributes of the _testcapi extension module. I'm not sure about each attribute, I only looked quickly at the name, sometimes at the implementation. But it should give a coarse idea of which "groups" make sense or not.

Python/getargs.c and Python/modsupport.c (parse and build arguments) (51): * parse_tuple_and_keywords * getargs_B * getargs_C * getargs_D * getargs_H * getargs_I * getargs_K * getargs_L * getargs_S * getargs_U * getargs_Y * getargs_Z * getargs_Z_hash * getargs_b * getargs_c * getargs_d * getargs_es * getargs_es_hash * getargs_et * getargs_et_hash * getargs_f * getargs_h * getargs_i * getargs_k * getargs_keyword_only * getargs_keywords * getargs_l * getargs_n * getargs_p * getargs_positional_only_and_keywords * getargs_s * getargs_s_hash * getargs_s_hash_int * getargs_s_star * getargs_tuple * getargs_u * getargs_u_hash * getargs_w_star * getargs_y * getargs_y_hash * getargs_y_star * getargs_z * getargs_z_hash * getargs_z_star * test_L_code * test_k_code * test_s_code * test_buildvalue_N * test_buildvalue_issue38913 * test_empty_argparse * argparsing
datetime, pytime.c (33): * PyDateTime_DATE_GET * PyDateTime_DELTA_GET * PyDateTime_GET * PyDateTime_TIME_GET * PyTime_AsMicroseconds * PyTime_AsMilliseconds * PyTime_AsSecondsDouble * PyTime_AsTimespec * PyTime_AsTimespec_clamp * PyTime_AsTimeval * PyTime_AsTimeval_clamp * PyTime_FromSeconds * PyTime_FromSecondsObject * datetime_check_date * datetime_check_datetime * datetime_check_delta * datetime_check_time * datetime_check_tzinfo * get_date_fromdate * get_date_fromtimestamp * get_datetime_fromdateandtime * get_datetime_fromdateandtimeandfold * get_datetime_fromtimestamp * get_delta_fromdsu * get_time_fromtime * get_time_fromtimeandfold * get_timezone_utc_capi * get_timezones_offset_zero * make_timezones_capi * pytime_object_to_time_t * pytime_object_to_timespec * pytime_object_to_timeval * test_datetime_capi
Types, metaclasses, type inheritance (wide category, not sure if it makes sense) (30): * _test_structmembersType * awaitType * instancemethod * ipowType * create_type_from_repeated_slots * matmulType * NullTpDocType * Generic * GenericAlias * MyList * ContainerNoGC * HeapCTypeMetaclass * HeapCTypeMetaclassCustomNew * HeapCTypeSetattr * HeapCTypeSubclass * HeapCTypeSubclassWithFinalizer * HeapCTypeWithBuffer * HeapCTypeWithDict * HeapCTypeWithNegativeDict * HeapCTypeWithWeakref * HeapDocCType * HeapGcCType * test_from_spec_invalid_metatype_inheritance * pytype_fromspec_meta * test_from_spec_metatype_inheritance * test_get_statictype_slots * test_get_type_name * test_get_type_qualname * test_lazy_hash_inheritance * test_type_from_ephemeral_spec
Function calls and spawn threads to call functions (25): * _test_thread_state * call_in_temporary_c_thread * get_kwargs * get_args * create_cfunction * pyobject_vectorcall * pyvectorcall_call * pyobject_fastcall * pyobject_fastcalldict * meth_fastcall * meth_fastcall_keywords * meth_noargs * meth_o * meth_varargs * meth_varargs_keywords * MethClass * MethInstance * MethStatic * MethodDescriptor2 * MethodDescriptorBase * MethodDescriptorDerived * MethodDescriptorNopGet * return_null_without_error * return_result_with_error * stack_pointer
Unicode, codecs (12): * codec_incrementaldecoder * codec_incrementalencoder * test_unicode_compare_with_ascii * test_widechar * test_string_from_format * unicode_asucs4 * unicode_asutf8 * unicode_asutf8andsize * unicode_aswidechar * unicode_aswidecharstring * unicode_copycharacters * unicode_findchar
Memory allocators (17): * WITH_PYMALLOC * pyobject_malloc_without_gil * pymem_api_misuse * pymem_buffer_overflow * pymem_getallocatorsname * pymem_malloc_without_gil * test_pymem_alloc0 * test_pymem_setallocators * test_pymem_setrawallocators * test_pyobject_new * test_pyobject_setallocators * set_nomemory * check_pyobject_forbidden_bytes_is_freed * check_pyobject_freed_is_freed * check_pyobject_null_is_freed * check_pyobject_uninitialized_is_freed * remove_mem_hooks
Exceptions, "error handling", C "errno" variable and signals (13): * RecursingInfinitelyError * raise_memoryerror * make_exception_with_doc * write_unraisable_exc * traceback_print * set_exception * raise_exception * set_exc_info * exception_print * fatal_error * error * set_errno * raise_SIGINT_then_send_None
C types limit and size (20): * CHAR_MAX * CHAR_MIN * INT_MAX * INT_MIN * LLONG_MAX * LLONG_MIN * LONG_MAX * LONG_MIN * PY_SSIZE_T_MAX * PY_SSIZE_T_MIN * SHRT_MAX * SHRT_MIN * UCHAR_MAX * UINT_MAX * ULLONG_MAX * ULONG_MAX * USHRT_MAX * SIZEOF_TIME_T * test_config * test_sizeof_c_types
Float, PyLongObject and PyNumber C API (16): * test_long_and_overflow * test_long_api * test_long_as_size_t * test_long_as_unsigned_long_long_mask * test_long_long_and_overflow * test_long_numbits * test_longlong_api * pynumber_tobase * DBL_MAX * DBL_MIN * FLT_MAX * FLT_MIN * float_pack * float_unpack * test_string_to_double * test_long_as_double
Docstring (10): * docstring_empty * docstring_no_signature * docstring_with_invalid_signature * docstring_with_invalid_signature2 * docstring_with_signature * docstring_with_signature_and_extra_newlines * docstring_with_signature_but_no_doc * docstring_with_signature_with_defaults * no_docstring * test_with_docstring
Garbage collector (3): * test_gc_control * without_gc * with_tp_del
Tracemalloc (3): * tracemalloc_get_traceback * tracemalloc_track * tracemalloc_untrack
os module helpers (1): * W_STOPCODE
Subinterpreters (1): * run_in_subinterp
C API (57): * PyObject and PyVarObject C API (9): * negative_refcount * test_decref_doesnt_leak * test_incref_decref_API * test_incref_doesnt_leak * test_refcount_funcs * test_refcount_macros * test_xdecref_doesnt_leak * test_xincref_doesnt_leak * test_set_type_size * PyObject_Bytes(), PyObject_Repr(), PyObject_Str() (3): * pyobject_bytes_from_null * pyobject_repr_from_null * pyobject_str_from_null * PyTypeObject C API (2): * negative_dictoffset * type_get_version * C API macros (3): * test_macros * test_py_is_funcs * test_py_is_macros * PyCodeObject C API (2): * test_code_api * code_newempty * PyFrameObject C API (5): * frame_getbuiltins * frame_getgenerator * frame_getglobals * frame_getlasti * frame_getlocals * PyDictObject C API (3): * dict_get_version * dict_getitem_knownhash * test_dict_iteration * PyThreadState C API (2): * test_tstate_capi * crash_no_current_thread * PyListObject C API (1): * test_list_api * PySequence and PyMapping C API, PyObject_GetItem() (6): * sequence_getitem * get_mapping_items * get_mapping_keys * get_mapping_values * getitem_with_error * bad_get * C API "feature" macros like HAVE_FORK (1): * get_feature_macros * HAMT C API (1): * hamt * PyBuffer and memoryview C API (5): * getbuffer_with_null_view * make_memoryview_from_NULL_pointer * test_from_contiguous * PyBuffer_SizeFromFormat * test_pep3118_obsolete_write_locks * PyCapsule C API (1): * test_capsule * PyThread_tss C API (1): * test_pythread_tss_key_state * PyStructSeq C API (2): * test_structseq_newtype_doesnt_leak * test_structseq_newtype_null_descr_doc * PyEval_SetTrace C API (1): * settrace_to_record * PyMarshal C API (6): * pymarshal_read_last_object_from_file * pymarshal_read_long_from_file * pymarshal_read_object_from_file * pymarshal_read_short_from_file * pymarshal_write_long_to_file * pymarshal_write_object_to_file * Py_AddPendingCall C API (1): * _pending_threadfunc * Misc C API (2): * Py_Version * Py_CompileString
Misc (7): * ``__doc__`` * ``__file__`` * ``__loader__`` * ``__name__`` * ``__package__`` * ``__spec__`` * the_number_three
vstinner commented 2 years ago

It might be interesting to split the large _testcapi module into multiple modules. But I like the idea of starting by splitting the long C file into multiple C files.

tiran commented 2 years ago

You do not need to include _testcapi/vectorcall.c in MODULE__TESTCAPI_DEPS. It is already listed as dependency for _testcapi in the Setup file.

encukou commented 2 years ago

Thank you for the fix! I'll :hammer: test-with-buildbots next time I touch the buildsystem. (I wonder why this issue didn't get the broken buildbot notification, though...)

tiran commented 2 years ago

Thank you for the fix! I'll hammer test-with-buildbots next time I touch the buildsystem. (I wonder why this issue didn't get the broken buildbot notification, though...)

We think that there was a problem in the buildbot hook, https://github.com/python/buildmaster-config/issues/333

philg314 commented 2 years ago

How should raiseTestError be handled? https://github.com/python/cpython/blob/c7e5bbaee88a71dc6e633e3cd451ed1798436382/Modules/_testcapimodule.c#L52-L62

Edit: For the Unicode tests I copied it and renamed it to _testcapi.unicode_error.

encukou commented 2 years ago

I'd switch to the built-in AssertionError, unless it's specifically a test for custom exceptions.

encukou commented 2 years ago

https://github.com/python/devguide/pull/936 proposes a Devguide update on C API tests.

encukou commented 2 years ago

https://github.com/python/cpython/pull/8551 started a similar reorganization for the Python code.

erlend-aasland commented 2 years ago

New "watcher" APIs have been added recently (and some are soon to land):

IMO, we could extract the watcher APIs to a separate test file under Modules/_testcapi/. (I should've thought about that while reviewing those PRs.)

vstinner commented 2 years ago

In moved PyFrameObject C API tests from the big test_capi to the end of test_frame. But since test_capi.py became a package, maybe it would make sense to move back these tests there? I don't know. test_unicode and many other tests have tests on their C API. Should we put all C API tests to test_capi just because they are implemented in C, or is it better to move C API tests is each test_xxx file?

vstinner commented 2 years ago

They are tons of test files which use _testcapi:

$ (cd Lib/test/; grep -l _testcapi *.py)
audit-tests.py
datetimetester.py
pythoninfo.py
string_tests.py
test_array.py
test_buffer.py
test_bytes.py
test_call.py
test_class.py
test_cmath.py
test_cmd_line.py
test_codecs.py
test_code.py
test_context.py
test_coroutines.py
test_csv.py
test_decimal.py
test_descr.py
test_devpoll.py
test_dict.py
test_dict_version.py
test_embed.py
test_exceptions.py
test_faulthandler.py
test_fcntl.py
test_fileio.py
test_fileutils.py
test_finalization.py
test_float.py
test_format.py
test_frame.py
test_gc.py
test_gdb.py
test_generators.py
test_genericclass.py
test_getpath.py
test_inspect.py
test_marshal.py
test_multibytecodec.py
test_os.py
test_poll.py
test_posix.py
test_regrtest.py
test_repl.py
test_signal.py
test_socket.py
test_stable_abi_ctypes.py
test_struct.py
test_subprocess.py
test_sys.py
test_sys_settrace.py
test_tcl.py
test_threading.py
test_time.py
test_traceback.py
test_tracemalloc.py
test_type_cache.py
test_ucn.py
test_unicode.py
test_weakref.py

I prefer to have Unicode C API tests in test_unicode, PyFrameObject C API tests in test_frame, etc. But I'm not 100% sure if it's the best :-)

Should test_capi become a monster, since our C API is a monster? :-D

erlend-aasland commented 2 years ago

Should test_capi become a monster, since our C API is a monster? :-D

I believe test_capi is a monster, hence this issue ;) I'm not sure what is the final perfect structure here, but as Petr said earlier in this issue (and to which I also concur), splitting up _testcapimodule.c is a step in the right direction.

vstinner commented 2 years ago

splitting up _testcapimodule.c is a step in the right direction.

Oh sure, I totally agree on that! Thanks for working on it.

encukou commented 2 years ago

I prefer to have Unicode C API tests in test_unicode, PyFrameObject C API tests in test_frame, etc. But I'm not 100% sure if it's the best :-)

That makes a lot of sense!

There's still plenty of things that belong it test_capi, though – module init, heap/static types, PyMemberDef, PyStructSequence, PyErr_Clear, PyCapsule, PyArg, ...

erlend-aasland commented 2 years ago

There's still plenty of things that belong it test_capi, though – module init, heap/static types, PyMemberDef, PyStructSequence, PyErr_Clear, PyCapsule, PyArg, ...

BTW, I'm working on a PR for PyArg (the getarg and modsupport stuff in Victor's earlier post).

vstinner commented 1 year ago

There's still plenty of things that belong it test_capi, though – module init, heap/static types, PyMemberDef, PyStructSequence, PyErr_Clear, PyCapsule, PyArg, ...

I agree. Maybe we can write down (somewhere, in the C/Python code of test_capi, in the devguide?) a list of tests which belong to test_capi file, and which tests should be done in more specific test files? Like a general guideline.

vstinner commented 1 year ago

Maybe my comment is just outdated. Ignore it if it's the case :-)

I don't understand compiler warnings on the old PR #95793 (merged last August):

Capture d’écran du 2022-11-14 23-05-05

These warnings should only occur when assert() statements are removed by the compiler. Otherwise, these two variables are used.

And the purpose of this PR is to make sure that assertions are not removed. Does it mean that NDEBUG is undefined too late?

In Modules/_testcapimodule.c, I undefined the macro before the first include. In Modules/_testcapi/parts.h, it's undefined after including pyconfig.h. It seems ok to me.

Sadly, Address Sanitizer logs of this PR are no longer available:

The logs for this run have expired and are no longer available.

.github/workflows/build.yml currently builds Python with:

    - name: Configure CPython
      run: ./configure --with-address-sanitizer --without-pymalloc
    - name: Build CPython
      run: make -j4
    - name: Display build info
      run: make pythoninfo

On a recent job, I see: CC.version: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 in test.pythoninfo.

vstinner commented 1 year ago

Is it time to close issue #78453 to continue the work here? cc @serhiy-storchaka

serhiy-storchaka commented 1 year ago

I moved PyUnicode C API tests from test_unicode to test_capi.test_unicode, because I am going to cover by test all C API, and I think that such large chunk of test is better in a separate file. test_unicode is large even without C API tests.

vstinner commented 1 year ago
$ grep -c PyAPI_FUNC Include/unicodeobject.h Include/cpython/unicodeobject.h 
Include/unicodeobject.h:83
Include/cpython/unicodeobject.h:69

There are around 152 "PyUnicode" C API functions. That's a lot knowing that Python 3.12 exports 939 public functions (and 354 private functions): stats on the C API.

$ wc -l Lib/test/test_{codecs,unicode}.py 
  3560 Lib/test/test_codecs.py
  2701 Lib/test/test_unicode.py
  6261 total

Currently, test_codecs is around 3 500 lines and test_unicode around 2 700 lines.

If we want to extend the coverage of the PyUnicode C API (what is being discussed here :-)), I agree that moving the Python parts of PyUnicode C API tests can be the newly added Lib/test/test_capi/test_unicode.py.

vstinner commented 1 year ago

In PR #99613, I proposed to @serhiy-storchaka to separate "codecs" tests from "unicode" tests. For me, "codecs" includes PyCodec C API but also "encode" and "decode" functions of the PyUnicode C API. What do you think?

serhiy-storchaka commented 1 year ago

I think this issue and issue #78453 are different. This issue is about reorganization of the C code, and #78453 is more about reorganization of the Python code.

vstinner commented 1 year ago

_testcapi extension was splitted into sub-tests: multiple extensions and test_capi package with multiple tests.

The issue can now be closed, no?

encukou commented 1 year ago

I wouldn't close it -- the module still has a lot of tests that can be split out.

erlend-aasland commented 1 year ago

I agree, there are still useful refactors to be made.

vstinner commented 7 months ago

See also issue gh-78453.

vstinner commented 7 months ago

I created a _testlimitedcapi extension to test the C API via the limited C API: see issue gh-116417 (completed).