Open chrstphrchvz opened 10 months ago
See also http://maskray.me/blog/2022-12-18-control-flow-integrity#fsanitizefunction by @MaskRay for more background.
This is one example where the recent trend of thinking we should use specific types on our C API is not compatible with parts of our design. Things like your proposed change are likely good. I'd like to avoid gaining new -f
flags to the compiler if possible. ABI considerations are something that could force the compiler flag issue.
In this case it's odd that this counts as undefined behavior. Reality: A pointer is always going to be a pointer regardless of what type it points to - one machine word. The standard is deficient.
A pointer is always going to be a pointer regardless of what type it points to - one machine word. The standard is deficient.
note that the standard requires that all pointers are of the same finite size, including FPs
the recent trend of thinking we should use specific types on our C API
That trend, as I understand it, is unrelated, or only tangentially related. It doesn't involve casting function types; and the discussions around it didn't involve type definition. Please don't conflate it with this issue.
My comment is more about seeing a function using the specific type. I didn't go looking to see if it has always been that way or was a recent change. Per the source history, this one has always been defined using PyListObject.
That it causes UBSAN issues is just something to be aware of if we do want to use specific types in more places.
A pointer is always going to be a pointer regardless of what type it points to - one machine word. The standard is deficient.
note that the standard requires that all pointers are of the same finite size, including FPs
Regardless of pointers all being the same size, the compiler behavior concern might be more to do with the structure aliasing concept rather than the pointer itself. I'd have to ask llvm folks. The answer likely wouldn't change what we need to deal with.
Poking around, we've got quite a few of these PyTypeObject function pointers that are cast in order to construct the structures with functions taking more specific types. The cleanup for clang-17's new UBSAN check will touch many files. (but is otherwise a pretty mechanical change - in most cases we can just change the parameter type to be generic and add a casting assignment to a local variable of the original name within the function to "launder" the type).
Struct casting & aliasing rules say that you can safely cast PyListObject *
to PyObject *
and back, and that you can safely mutate the object using both resulting pointers.
But when calling a function through an incompatible function pointer, there is no cast. PyListObject *
and PyObject *
are simply not “compatible”.
When compilers start punishing this, it'll be bad news. Pretty much all types defined in C use this pattern – not just in CPython core, but in third-party extensions as well. It's even in the tutorial: https://docs.python.org/3/extending/newtypes_tutorial.html#adding-data-and-methods-to-the-basic-example
I would not be surprised if sooner or later I encounter an instance that is won’t-fix because it involves a stable API, or if I am told that this problem should be ignored because fixing it is too disruptive or requires disproportionate review effort
The overwhelming majority of the fixes should be simple. It's the sheer number of them that will require disproportionate review effort.
I updated our Clang UBSAN buildbot last night, clang-17. So it fails the build on this issue now. =) https://buildbot.python.org/all/#/workers/2
See also http://maskray.me/blog/2022-12-18-control-flow-integrity#fsanitizefunction by @MaskRay for more background.
Oh no, this issue is related to Control Flow Integrity (CFI). I think that this problem should be treated seriously, more and more companies are enforcing CFI. It's better if Python can be adapted to respect CFI as soon as possible. CFI even started to be implemented at the hardware (CPU) level. The Linux kernel does its best to implement CFI to harden the kernel.
Recently, I added the _PyCFunction_CAST()
macro with a long comment:
// Cast an function to the PyCFunction type to use it with PyMethodDef.
//
// This macro can be used to prevent compiler warnings if the first parameter
// uses a different pointer type than PyObject* (ex: METH_VARARGS and METH_O
// calling conventions).
//
// The macro can also be used for METH_FASTCALL and METH_VARARGS|METH_KEYWORDS
// calling conventions to avoid compiler warnings because the function has more
// than 2 parameters. The macro first casts the function to the
// "void func(void)" type to prevent compiler warnings.
//
// If a function is declared with the METH_NOARGS calling convention, it must
// have 2 parameters. Since the second parameter is unused, Py_UNUSED() can be
// used to prevent a compiler warning. If the function has a single parameter,
// it triggers an undefined behavior when Python calls it with 2 parameters
// (bpo-33012).
#define _PyCFunction_CAST(func) \
_Py_CAST(PyCFunction, _Py_CAST(void(*)(void), (func)))
The problem is that PyMethodDef.ml_meth
member type is:
typedef PyObject *(*PyCFunction)(PyObject *, PyObject *);
whereas Python uses calling convention with more or less parameters than PyCFunction
or with other parameter types :-(
I think that we should decide a global strategy for this problem, rather than trying the whack-a-mole game with various strategies. It's a problem as bad a Python 2 PyObject structure inheritance which was not compatible with strict aliasing. This problem was solved with a new API, especially this macro:
#define PyObject_HEAD PyObject ob_base;
@encukou:
Struct casting & aliasing rules say that you can safely cast PyListObject to PyObject and back, and that you can safely mutate the object using both resulting pointers.
I have a different experience with casting from/to PyObject*
. I discovered "type punning" issue the hard way in https://github.com/python/cpython/issues/98724
I have opened a couple of PRs for a few examples of this issue. I intended to separate them by codeowner(s), if that is helpful.
Some instances of this issue lead to far more errors than others; visit_reachable()
seems to be the most common.
I am not aware how instances involving generated code should be handled:
Objects/listobject.c:2911:5: warning: cast from 'PyObject *(*)(PyListObject *, PyObject *)' (aka 'struct _object *(*)(PyListObject *, struct _object *)') to 'PyCFunction' (aka 'struct _object *(*)(struct _object *, struct _object *)') converts to incompatible function type [-Wcast-function-type-strict]
2911 | PY_LIST_EXTEND_METHODDEF
| ^~~~~~~~~~~~~~~~~~~~~~~~
Objects/clinic/listobject.c.h:105:16: note: expanded from macro 'PY_LIST_EXTEND_METHODDEF'
105 | {"extend", (PyCFunction)py_list_extend, METH_O, py_list_extend__doc__},
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
Python/generated_cases.c.h:1626:19: runtime error: call to function py_list_extend through pointer to incorrect function type 'struct _object *(*)(struct _object *, struct _object *)'
listobject.c:1025: note: py_list_extend defined here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior Python/generated_cases.c.h:1626:19 in
I think that we should decide a global strategy for this problem, rather than trying the whack-a-mole game with various strategies.
Related to this point: how likely is it that instances of this issue will keep being introduced or reintroduced, at least without greater awareness and acceptance of the relevant C rules? How much does having CI run with -fsanitize=function
or even -Werror=cast-function-type
help prevent this?
how likely is it that instances of this issue will keep being introduced or reintroduced, at least without greater awareness and acceptance of the relevant C rules? How much does having CI run with -fsanitize=function or even -Werror=cast-function-type help prevent this?
It helps. Particularly once we've gotten to a stable point so that the ubsan build is not Red again. As people notice when things start failing after a change.
I believe a lot of what we do within our C code for these design patterns is cargo cult from other existing examples. So if we've cleaned things up in a consistent manner, including the code generators, to do something in a "right" way, that'll naturally become what gets done in new code.
I have opened a couple of PRs for a few examples of this issue. I intended to separate them by codeowner(s), if that is helpful.
It'll cut down on spurious GitHub notifications. I expect codeowners to be experts in the application logic much more than in C minutiæ, so they aren't necessarily the right reviewers to ping. I'd still wait for them about a week after approving PRs.
For files without codeowners, a good strategy might be to open a PR each after the old one is merged, with files that are reasonably “done” (e.g. no more function casts in the PyTypeObject
definitions). Or if some area is getting too many conflicts, single it out.
You don't need to keep PR branches up to date, if there are no conflicts -- that eats CI time.
I can't promise my availability this year, but starting January I could be your default reviewer :)
Regarding PyMethodDef
instances: what is important for this issue is whether the function in .ml_meth
is compatible with the type indicated by .ml_flags
. So I think _PyCFunction_CAST()
should not be used in cases where the function should already be compatible with PyCFunction
since that is the type it will be called as, and avoid e.g. using types other than PyObject *
for the first parameter. _PyCFunction_CAST()
still seems useful for functions which are not going to be called as PyCFunction
, although trying to use -Wcast-function-type
to find likely undefined behavior will also flag such _PyCFunction_CAST()
usage.
Example:
METH_NOARGS
means dict_sizeof()
is going to be called as PyCFunction
, so it should be made compatible with that type, and using _PyCFunction_CAST()
would then be redundant:
@@ -3704,8 +3704,9 @@ _PyDict_KeysSize(PyDictKeysObject *keys)
}
static PyObject *
-dict_sizeof(PyDictObject *mp, PyObject *Py_UNUSED(ignored))
+dict_sizeof(PyObject *self, PyObject *Py_UNUSED(ignored))
{
+ PyDictObject *mp = (PyDictObject *)self;
return PyLong_FromSsize_t(_PyDict_SizeOf(mp));
}
@@ -3769,7 +3770,7 @@ static PyMethodDef mapp_methods[] = {
DICT___CONTAINS___METHODDEF
{"__getitem__", dict_subscript, METH_O | METH_COEXIST,
getitem__doc__},
- {"__sizeof__", _PyCFunction_CAST(dict_sizeof), METH_NOARGS,
+ {"__sizeof__", dict_sizeof, METH_NOARGS,
sizeof__doc__},
DICT_GET_METHODDEF
DICT_SETDEFAULT_METHODDEF
Another example:
METH_FASTCALL
means that although mappingproxy_get()
is not going to be called as PyCFunction
, and so using _PyCFunction_CAST()
seems fine, it is eventually called as _PyCFunctionFast
and should be made compatible with that type:
--- a/Objects/descrobject.c
+++ b/Objects/descrobject.c
@@ -1107,8 +1107,9 @@ static PySequenceMethods mappingproxy_as_sequence = {
};
static PyObject *
-mappingproxy_get(mappingproxyobject *pp, PyObject *const *args, Py_ssize_t nargs)
+mappingproxy_get(PyObject *self, PyObject *const *args, Py_ssize_t nargs)
{
+ mappingproxyobject *pp = (mappingproxyobject *)self;
/* newargs: mapping, key, default=None */
PyObject *newargs[3];
newargs[0] = pp->mapping;
I've been applying “skip news” labels to the PRs, but IMO this should have a news entry -- possibly an aggregate one added in a separate PR. You're the best person to word that, but I can also add it, if you'd like.
So I think
_PyCFunction_CAST()
should not be used in cases where the function should already be compatible withPyCFunction
since that is the type it will be called as
+1
_PyCFunction_CAST()
still seems useful for functions which are not going to be called asPyCFunction
, although trying to use-Wcast-function-type
to find likely undefined behavior will also flag such_PyCFunction_CAST()
usage.
IMO, we need a better API here -- a proper tagged union. Let's leave that as the last step?
Until then, ignoring the benign warnings about _PyCFunction_CAST
seems like the best way to go.
There are some additional advantages that we can gain by doing the right modifications (apart from avoiding UBs):
By using a non-specialized type in the signature, we don't need to cast the object back to PyObject *
when calling other functions (with the correct signature). This is sometimes useful but I cannot estimate how frequent this is. On the other hand, we would only gain if the functions we are calling are indeed expecting PyObject *
objects and not specialized objects.
By using a non-specialized type, we would get rid of all (funcptr)func
casts when declaring PyTypeObject
or protocol implementations. This could make the code cleaner.
Because of the sheer amount of work to do (I just looked at what I needed to do, fixed 1-2 things and then gave up), I suggest adding a new build bot with -fsanitize=undefined -fno-sanitize=function -fsanitize-recover
. That way, we could catch the other undefined behaviours that are not due to those casts. Currently, the build bot fails fast (and we only see 1 UB per build due to the -fno-sanitize-recover
) but I think that most of the time, we won't have a lot of UBs (or else it's bad for us...). As such, having a build bot that builds entirely (with -fsanitize-recover
) instead of fast-failing would help a lot in detecting the other cases.
Once this new build bot is green, we can focus on eliminating the UBs related to function pointers.
Note: see https://github.com/python/cpython/pull/123004#issuecomment-2288611116 for what was detected as a first pass. I suspect there are UBs that are not detected but that's for another day.
IMO, you should reopen https://github.com/python/buildmaster-config/issues/517 if you're fixing that. This issue is not about the buildbot.
Mmmh. Yes, it's better. I'll re-open the other one to discuss the future of this build bot.
Bug report
Bug description:
UBSan (UndefinedBehaviorSanitizer) in LLVM.org Clang 17 makes
-fsanitize=function
available for C; previously, it was only for C++. (So it may also be made available in future Apple Xcode clang and GCC.) By default, it is implied by -fsanitize=undefined (which is what./configure --with-undefined-behavior-sanitizer
uses), but it can be disabled using -fno-sanitize=function.For a project such as CPython, which has long relied on function pointers for callbacks, yet seems to have only required that callbacks behave as expected under typical ABI calling conventions, rather than more strictly be declared/defined as a type compatible with the function pointer they will be called as, this leads to numerous errors from UBSan.
Examples when starting Python REPL:
Example workaround for the first error and likely many others, where instead of casting functions to incompatible pointers, the functions use compatible signatures and cast their parameter(s):
In other cases, it may be less disruptive to introduce a wrapper function with the correct signature:
Likely instances of this can be found at compile time using e.g.
-Wcast-function-type
(although this emits false positives for when the function pointer is cast back to the correct type before called, and this warning is suppressed by intermediate casts through(void *)
):I would be interested in combing through and replacing similar instances. But I would not be surprised if sooner or later I encounter an instance that is won’t-fix because it involves a stable API, or if I am told that this problem should be ignored because fixing it is too disruptive or requires disproportionate review effort. I am not aware how immediate any danger is from optimizing compilers exploiting this type of undefined behavior.
CPython versions tested on:
CPython main branch
Operating systems tested on:
macOS
Linked PRs