python / cpython

The Python programming language
https://www.python.org
Other
63.19k stars 30.26k forks source link

struct module has undefined behavior when loading bools #125118

Open alex opened 2 weeks ago

alex commented 2 weeks ago

Bug report

Bug description:

https://github.com/python/cpython/blob/a5fc50994a3fae46d0c3d496c4e1d5e00548a1b8/Modules/_struct.c#L489-L491

Bool values are required to be either 0 or 1, but this memcpy will copy an arbitrary value to it. This produces UBSAN reports like:


Modules/_struct.c:491:28: runtime error: load of value 32, which is not a valid value for type 'bool'
--
  | #0 0x786c2573cc3e in nu_bool cpython3/Modules/_struct.c:491:28
  | #1 0x786c2572fad0 in s_unpack_internal cpython3/Modules/_struct.c:1684:21
  | #2 0x786c2572a1f3 in unpack_impl cpython3/Modules/_struct.c:2399:12
  | #3 0x786c2572a1f3 in unpack cpython3/Modules/clinic/_struct.c.h:295:20
  | #4 0x5c0517b46548 in cfunction_vectorcall_FASTCALL cpython3/Objects/methodobject.c:436:24
  | #5 0x5c0516f89796 in _PyObject_VectorcallTstate cpython3/Include/internal/pycore_call.h:167:11
  | #6 0x5c0516f89796 in object_vacall cpython3/Objects/call.c:819:14
  | #7 0x5c0516f89c88 in PyObject_CallFunctionObjArgs cpython3/Objects/call.c:926:14
  | #8 0x5c0516f4e5c3 in fuzz_struct_unpack cpython3/Modules/_xxtestfuzz/fuzzer.c:125:26
  | #9 0x5c0516f4e5c3 in _run_fuzz cpython3/Modules/_xxtestfuzz/fuzzer.c:569:14
  | #10 0x5c0516f4e5c3 in LLVMFuzzerTestOneInput cpython3/Modules/_xxtestfuzz/fuzzer.c:639:11
  | #11 0x5c0516eb0870 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:614:13
  | #12 0x5c0516e9bae5 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:327:6
  | #13 0x5c0516ea157f in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:862:9
  | #14 0x5c0516ecc822 in main /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
  | #15 0x786c27c3c082 in __libc_start_main /build/glibc-LcI20x/glibc-2.31/csu/libc-start.c:308:16
  | #16 0x5c0516e93ccd in _start

(https://oss-fuzz.com/testcase-detail/5186406032080896)

This should probably copy to an integer type that's the same width as _Bool.

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

Linked PRs

skirpichev commented 2 weeks ago

This should probably copy to an integer type that's the same width as _Bool.

Do you know some integer type with width in 1 bit?

The C standard says: "While the number of bits in a _Bool object is at least CHAR_BIT, the width (number of sign and value bits) of a _Bool may be just 1 bit."

I don't see something wrong here.

alex commented 2 weeks ago

There's no integer type with sub-byte width, however as you can see from that code, it parses values that are at least 1-byte (always 1-byte in reality, I assume) and treats 0s as 0s and anything else as true.

It's illegal in C to store a non 0 or 1 value in an _Bool, hence the UBSAN warning.

You can see this in things like C compiler omitting any branch premised on an _Bool having another value: https://godbolt.org/z/9hTjKMMjz

skirpichev commented 2 weeks ago

Take it another way. Why do you think, that p value is arbitrary? This is a helper for Python's bool.

How you can call one with something else than 0 or 1 in *p?

alex commented 2 weeks ago

Unless I have badly misread this code, p comes from the argument to struct.unpack, so its attacker controlled.

alex commented 2 weeks ago

Simple reproducer:

~/p/cpython ❯❯❯ ./python.exe -c "import struct; print(struct.unpack('?', b'\x03'))"
python.exe(52009,0x1edf7f240) malloc: nano zone abandoned due to inability to reserve vm space.
Modules/_struct.c:502:28: runtime error: load of value 3, which is not a valid value for type 'bool'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior Modules/_struct.c:502:28 in 
(True,)
skirpichev commented 2 weeks ago

(presumably packed by pack(format, ...))

alex commented 2 weeks ago

There's no particular requirement that unpack be only used with pack-generated data, or valid data.

On Tue, Oct 8, 2024 at 1:16 PM Sergey B Kirpichev @.***> wrote:

(presumably packed by pack(format, ...))

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.Message ID: @.***>

-- All that is necessary for evil to succeed is for good people to do nothing.

skirpichev commented 2 weeks ago

There's no particular requirement that unpack be only used with pack-generated data

I would rather say it's a soft requirement.

FYI, pr is ready: https://github.com/python/cpython/pull/125169

alex commented 2 weeks ago

Thank you1

On Wed, Oct 9, 2024 at 12:11 AM Sergey B Kirpichev @.***> wrote:

There's no particular requirement that unpack be only used with pack-generated data

I would rather say it's rather a soft requirement.

FYI, pr is ready: #125169 https://github.com/python/cpython/pull/125169

— Reply to this email directly, view it on GitHub https://github.com/python/cpython/issues/125118#issuecomment-2401261886, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAAGBGG2LRLEB7X6KWSOBTZ2SUIPAVCNFSM6AAAAABPSSWTPWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMBRGI3DCOBYGY . You are receiving this because you authored the thread.Message ID: @.***>

-- All that is necessary for evil to succeed is for good people to do nothing.

cfbolz commented 2 weeks ago

I would rather say it's a soft requirement.

I don't think it is. struct.unpack is used a lot to deal with data coming out of binary files that were written by some arbitrary other software, or from the network, or from a C library.