python / cpython

Consider applying flags for warnings about potential security issues #112301

Open mdboom opened 9 months ago

mdboom commented 9 months ago

Feature or enhancement

Proposal:

At a recent meeting of OpenSSF's Memory Safety SIG, I became aware of the C/C++ hardening guide they are putting together.

At a high-level, they recommend compiling with the following flags:

-O2 -Wall -Wformat=2 -Wconversion -Wtrampolines -Wimplicit-fallthrough \
-U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 \
-D_GLIBCXX_ASSERTIONS \
-fstrict-flex-arrays=3 \
-fstack-clash-protection -fstack-protector-strong \
-Wl,-z,nodlopen -Wl,-z,noexecstack \
-Wl,-z,relro -Wl,-z,now \
-fPIE -pie -fPIC -shared

(-shared doesn't really make sense as a global CFLAG, so I removed it.)

When compiling on most x86 architectures (amd64, i386 and x32), add:

-fcf-protection=full

At @sethmlarson's urging, I compiled CPython on Linux/x86_64/gcc with these flags. From the complete build log, there are 3,084 warnings, but otherwise the result builds and passes all unit tests.

The warnings are of these types: (EDIT: Table updated to not double count the same line)

| warning type | count |
| -- | --: |
| sign-conversion | 2,341 |
| conversion | 595 |
| array-bounds= | 131 |
| format-nonliteral | 11 |
| stringop-overflow= | 2 |
| float-conversion | 2 |
| stringop-overread | 1 |
| maybe-uninitialized | 1 |
| **total** | 3,084 |
**Top warnings per file.**

| filename | count |
| -- | --: |
| ./Modules/binascii.c | 208 |
| Objects/unicodeobject.c | 142 |
| ./Include/internal/pycore_runtime_init.h | 128 |
| Parser/parser.c | 114 |
| ./Modules/_decimal/libmpdec/mpdecimal.c | 94 |
| ./Modules/posixmodule.c | 85 |
| ./Modules/socketmodule.c | 76 |
| ./Modules/_pickle.c | 75 |
| Objects/longobject.c | 65 |
| ./Modules/arraymodule.c | 49 |
| **total** | 3,084 |

I am not a security expert, so I don't know a good way to assess how many of these are potentially exploitable, and how many are harmless false positives. Some are probably un-resolvable (format-nonliteral is pretty hard to avoid when wrapping sprintf, for example).
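
For context, the bulk of the count is sign/conversion noise. A typical -Wsign-conversion hit looks something like the following (a hypothetical snippet, not taken from the build log):

#include <stddef.h>
#include <string.h>

/* -Wsign-conversion fires when a signed value is implicitly converted to an
 * unsigned type, e.g.:
 *
 *     memcpy(dst, src, len);   // warning: conversion to 'size_t' from 'int'
 *                              // may change the sign of the result
 *
 * Usually harmless, but a negative value would wrap to a huge size_t, so the
 * usual fix is a range check plus an explicit cast. */
void copy_prefix(char *dst, const char *src, int len)
{
    if (len > 0) {
        memcpy(dst, src, (size_t)len);
    }
}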

At a high level, I think the process to address these and make incremental progress maybe looks something like:

  • Pick one of the warning types, and assess how many false positives it gives and how onerous it is to fix them. From this, build consensus about whether it's worth addressing.
  • Fix all of the existing instances.
  • Turn that specific warning into an error so it doesn't creep back in.

But this is just to start the discussion about how to move forward.

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response

Linked PRs

colesbury commented 9 months ago

I don't think we want -fstrict-flex-arrays=3. We need flexible array members and we need C++ support, so we're forced to rely on the (widely supported) compiler extension of using field[0] or field[1] as a flexible array member.
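
To illustrate the distinction, here is a hypothetical sketch (not actual CPython structs) of the two styles and why -fstrict-flex-arrays=3 is a problem for the second one:

#include <stddef.h>

struct c99_flex {
    size_t len;
    char data[];    /* C99 flexible array member; not valid in standard C++ */
};

struct ext_flex {
    size_t len;
    char data[1];   /* trailing-array-of-1 extension: compiles as both C and C++,
                       but with -fstrict-flex-arrays=3 the compiler treats it as a
                       real one-element array, so any access past data[0] is
                       considered out of bounds. */
};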

sobolevn commented 9 months ago

Some are probably un-resolvable (format-nonliteral is pretty hard to avoid when wrapping sprintf, for example)

These warnings do not make much sense in our current use-cases:

Objects/unicodeobject.c:2592:21: warning: format not a string literal, argument types not checked [-Wformat-nonliteral]
 2592 |                     sprintf(buffer, fmt, va_arg(*vargs, long)) :
      |                     ^~~~~~~
Objects/unicodeobject.c:2593:21: warning: format not a string literal, argument types not checked [-Wformat-nonliteral]
 2593 |                     sprintf(buffer, fmt, va_arg(*vargs, unsigned long));
      |                     ^~~~~~~
Objects/unicodeobject.c:2597:21: warning: format not a string literal, argument types not checked [-Wformat-nonliteral]
 2597 |                     sprintf(buffer, fmt, va_arg(*vargs, long long)) :
      |                     ^~~~~~~
Objects/unicodeobject.c:2598:21: warning: format not a string literal, argument types not checked [-Wformat-nonliteral]
 2598 |                     sprintf(buffer, fmt, va_arg(*vargs, unsigned long long));
      |                     ^~~~~~~
Objects/unicodeobject.c:2602:21: warning: format not a string literal, argument types not checked [-Wformat-nonliteral]
 2602 |                     sprintf(buffer, fmt, va_arg(*vargs, Py_ssize_t)) :
      |                     ^~~~~~~
Objects/unicodeobject.c:2603:21: warning: format not a string literal, argument types not checked [-Wformat-nonliteral]
 2603 |                     sprintf(buffer, fmt, va_arg(*vargs, size_t));
      |                     ^~~~~~~
Objects/unicodeobject.c:2606:17: warning: format not a string literal, argument types not checked [-Wformat-nonliteral]
 2606 |                 len = sprintf(buffer, fmt, va_arg(*vargs, ptrdiff_t));
      |                 ^~~
Objects/unicodeobject.c:2610:21: warning: format not a string literal, argument types not checked [-Wformat-nonliteral]
 2610 |                     sprintf(buffer, fmt, va_arg(*vargs, intmax_t)) :
      |                     ^~~~~~~
Objects/unicodeobject.c:2611:21: warning: format not a string literal, argument types not checked [-Wformat-nonliteral]
 2611 |                     sprintf(buffer, fmt, va_arg(*vargs, uintmax_t));
      |                     ^~~~~~~
Objects/unicodeobject.c:2615:21: warning: format not a string literal, argument types not checked [-Wformat-nonliteral]
 2615 |                     sprintf(buffer, fmt, va_arg(*vargs, int)) :
      |                     ^~~~~~~
Objects/unicodeobject.c:2616:21: warning: format not a string literal, argument types not checked [-Wformat-nonliteral]
 2616 |                     sprintf(buffer, fmt, va_arg(*vargs, unsigned int));
      |                     ^~~~~~~

I think that they should be silenced / ignored.

sethmlarson commented 9 months ago

@mdboom Are you okay with me editing your topic to create a checklist style table with links to either why we're not implementing or the actual implementation? My guess is we'll be adopting these one by one :)

mdboom commented 9 months ago

@mdboom Are you okay with me editing your topic to create a checklist style table with links to either why we're not implementing or the actual implementation? My guess is we'll be adopting these one by one :)

@sethmlarson: Good idea.

hugovk commented 9 months ago

At a high level, I think the process to address these and make incremental progress maybe looks something like:

  • Pick one of the warning types, and assess how many false positives it gives and how onerous it is to fix them. From this, build consensus about whether it's worth addressing.
  • Fix all of the existing instances.
  • Turn that specific warning into an error so it doesn't creep back in.

Sounds like a good approach.

To share another method that could additionally help: as part of https://github.com/python/cpython/issues/101100, we're working through a lot of docs "nit-picky" warnings.

When building the docs, we only allow warnings to occur in files that already have warnings and are listed in a .nitignore file. Once a file has been cleaned, we remove it from the list to prevent regressions.

We also fail the docs build if we "accidentally" clean a file: if warnings no longer occur in a file that is still listed, the file must be removed from the list, again to prevent regressions.

This does need some custom tooling, but it's helped us make gradual progress, and we've fixed 40% so far.

carsonRadtke commented 9 months ago

RE: @hugovk's .nitignore

I am in favor of a solution like this. It would not require any custom tooling: we could change the build arguments to whatever we reach consensus on, and then silence compiler warnings for the offending lines until somebody comes along and fixes them.

This also allows us to silence errors locally, but enforce them globally. That way we could still have -Wformat-nonliteral, but allow non-compliance during the compilation of 'unicodeobject.c'. (I am not advocating for this flag, just using it as an example)
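
For example, silencing one diagnostic for a single call site while keeping it enabled (or promoted to an error) globally might look roughly like this with GCC/Clang (a hypothetical sketch, not actual CPython code):

#include <stdio.h>

/* fmt is chosen at runtime, so the compiler cannot type-check the call and
 * -Wformat-nonliteral would fire; the pragmas suppress it only here. */
int format_long(char *buf, const char *fmt, long v)
{
#if defined(__GNUC__) || defined(__clang__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wformat-nonliteral"
#endif
    int len = sprintf(buf, fmt, v);
#if defined(__GNUC__) || defined(__clang__)
#pragma GCC diagnostic pop
#endif
    return len;
}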

nohlson commented 3 months ago

Hello all, I have been selected by GSoC to work on this!

@mdboom I am curious how you were able to get the unit tests to pass with the linker option -Wl,-z,nodlopen. I am testing options out on my own machine Linux/x86_64/gcc.

configure determines that, because dlopen() is available, dynload_shlib.o (which uses dlopen()) should be linked in, so later on when importing modules I understandably get an error that shared objects cannot be dlopen()ed. I'm not aware of any alternative, so I was curious how you were able to avoid dlopen() and still pass tests that load modules.

mdboom commented 3 months ago

Welcome, @nohlson! I was really excited to hear about this GSoC project at PyCon.

This whole investigation for me was a quick afternoon hack. I only ever got as far as getting the build to complete -- I never even ran Python, let alone its test suite. I just got as far as thinking "someone with more time should work on this", and here you are ;)

Also looking at this again, I see I didn't set the linker flags on LDFLAGS, only CFLAGS, and therefore it seems they had no effect. So there's nothing magical about it working for me and not you.

IMHO, this seems like a hard flag to support. "Python without dlopen" would be a different beast -- maybe some very security conscious people would want that, but it would require tooling to statically link in all expected extension modules (some of that tooling already exists elsewhere). So, personally I'd defer solving that one for now (but I'm not the GSoC mentor, that's just my opinion).

nohlson commented 2 months ago

I would like to get some discussion going about the performance impact of enabling these options and how much performance we would be willing to concede for safety.

Here is a comparison of a CPython baseline pyperformance run vs. a build with multiple performance-impacting options enabled, broken down by benchmark category:

| Benchmark Tag | Geometric Mean |
| -- | -- |
| apps | 1.03x slower |
| asyncio | 1.04x slower |
| math | 1.03x slower |
| regex | 1.00x faster |
| serialize | 1.08x slower |
| startup | 1.03x slower |
| template | 1.03x slower |
| all benchmarks | 1.04x slower |

For the full set of benchmarks run and the comparison between builds, here is more detail:

Benchamrk Comparison (baseline vs. hardened build) Benchmarks with tag 'apps': =========================== 2to3: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 306 ms +- 1 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 314 ms +- 1 ms: 1.03x slower docutils: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 2.72 sec +- 0.01 sec -> [two_baselines_and_tldr/config_3/pyperf_output.json] 2.83 sec +- 0.02 sec: 1.04x slower html5lib: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 73.9 ms +- 0.7 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 76.8 ms +- 0.5 ms: 1.04x slower tornado_http: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 117 ms +- 1 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 119 ms +- 2 ms: 1.02x slower Geometric mean: 1.03x slower Benchmarks with tag 'asyncio': ============================== async_tree_none: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 443 ms +- 22 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 460 ms +- 22 ms: 1.04x slower async_tree_cpu_io_mixed: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 707 ms +- 36 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 736 ms +- 37 ms: 1.04x slower async_tree_cpu_io_mixed_tg: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 666 ms +- 60 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 693 ms +- 61 ms: 1.04x slower async_tree_eager: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 137 ms +- 1 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 146 ms +- 1 ms: 1.07x slower async_tree_eager_cpu_io_mixed: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 448 ms +- 9 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 468 ms +- 8 ms: 1.04x slower async_tree_eager_cpu_io_mixed_tg: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 394 ms +- 9 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 411 ms +- 8 ms: 1.04x slower async_tree_eager_io_tg: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 1.38 sec +- 0.07 sec -> [two_baselines_and_tldr/config_3/pyperf_output.json] 1.42 sec +- 0.07 sec: 1.03x slower async_tree_eager_memoization: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 271 ms +- 23 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 284 ms +- 23 ms: 1.05x slower async_tree_eager_memoization_tg: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 214 ms +- 8 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 222 ms +- 8 ms: 1.04x slower async_tree_eager_tg: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 96.1 ms +- 0.8 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 102 ms +- 1 ms: 1.06x slower async_tree_io: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 1.08 sec +- 0.08 sec -> [two_baselines_and_tldr/config_3/pyperf_output.json] 1.12 sec +- 0.08 sec: 1.04x slower async_tree_io_tg: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 1.12 sec +- 0.04 sec -> [two_baselines_and_tldr/config_3/pyperf_output.json] 1.16 sec +- 0.04 sec: 1.03x slower async_tree_memoization: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 567 ms +- 54 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 587 ms +- 53 ms: 1.04x slower Benchmark hidden because not significant 
(3): async_tree_eager_io, async_tree_memoization_tg, async_tree_none_tg Geometric mean: 1.04x slower Benchmarks with tag 'math': =========================== float: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 86.6 ms +- 0.7 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 92.2 ms +- 0.9 ms: 1.06x slower nbody: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 85.9 ms +- 1.0 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 89.2 ms +- 0.6 ms: 1.04x slower pidigits: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 171 ms +- 1 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 170 ms +- 0 ms: 1.00x faster Geometric mean: 1.03x slower Benchmarks with tag 'regex': ============================ regex_compile: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 155 ms +- 1 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 158 ms +- 1 ms: 1.02x slower regex_dna: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 163 ms +- 1 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 162 ms +- 1 ms: 1.01x faster regex_effbot: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 3.04 ms +- 0.08 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 3.11 ms +- 0.07 ms: 1.02x slower regex_v8: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 26.7 ms +- 0.3 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 25.4 ms +- 0.2 ms: 1.05x faster Geometric mean: 1.00x faster Benchmarks with tag 'serialize': ================================ json_dumps: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 13.9 ms +- 0.2 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 15.6 ms +- 0.2 ms: 1.12x slower json_loads: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 31.5 us +- 0.3 us -> [two_baselines_and_tldr/config_3/pyperf_output.json] 35.6 us +- 0.6 us: 1.13x slower pickle: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 13.7 us +- 0.1 us -> [two_baselines_and_tldr/config_3/pyperf_output.json] 15.3 us +- 0.1 us: 1.11x slower pickle_dict: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 28.7 us +- 0.5 us -> [two_baselines_and_tldr/config_3/pyperf_output.json] 39.4 us +- 0.1 us: 1.37x slower pickle_list: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 4.42 us +- 0.08 us -> [two_baselines_and_tldr/config_3/pyperf_output.json] 4.23 us +- 0.13 us: 1.04x faster pickle_pure_python: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 357 us +- 3 us -> [two_baselines_and_tldr/config_3/pyperf_output.json] 370 us +- 2 us: 1.04x slower tomli_loads: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 2.48 sec +- 0.02 sec -> [two_baselines_and_tldr/config_3/pyperf_output.json] 2.63 sec +- 0.02 sec: 1.06x slower unpickle: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 18.6 us +- 0.2 us -> [two_baselines_and_tldr/config_3/pyperf_output.json] 20.3 us +- 0.3 us: 1.09x slower unpickle_list: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 5.32 us +- 0.15 us -> [two_baselines_and_tldr/config_3/pyperf_output.json] 5.93 us +- 0.15 us: 1.11x slower unpickle_pure_python: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 266 us +- 2 us -> [two_baselines_and_tldr/config_3/pyperf_output.json] 258 us +- 2 us: 1.03x faster xml_etree_parse: Mean +- std 
dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 157 ms +- 2 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 164 ms +- 2 ms: 1.05x slower xml_etree_iterparse: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 106 ms +- 1 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 113 ms +- 1 ms: 1.07x slower xml_etree_generate: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 116 ms +- 1 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 123 ms +- 2 ms: 1.06x slower xml_etree_process: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 78.4 ms +- 0.5 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 83.2 ms +- 0.8 ms: 1.06x slower Geometric mean: 1.08x slower Benchmarks with tag 'startup': ============================== python_startup: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 10.3 ms +- 0.0 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 10.6 ms +- 0.0 ms: 1.03x slower python_startup_no_site: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 7.07 ms +- 0.02 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 7.31 ms +- 0.04 ms: 1.03x slower Geometric mean: 1.03x slower Benchmarks with tag 'template': =============================== genshi_text: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 26.9 ms +- 0.2 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 27.6 ms +- 0.2 ms: 1.02x slower genshi_xml: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 60.8 ms +- 0.5 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 62.3 ms +- 0.4 ms: 1.02x slower mako: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 12.8 ms +- 0.1 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 13.3 ms +- 0.1 ms: 1.04x slower Geometric mean: 1.03x slower All benchmarks: =============== 2to3: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 306 ms +- 1 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 314 ms +- 1 ms: 1.03x slower async_generators: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 465 ms +- 3 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 536 ms +- 4 ms: 1.15x slower async_tree_none: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 443 ms +- 22 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 460 ms +- 22 ms: 1.04x slower async_tree_cpu_io_mixed: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 707 ms +- 36 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 736 ms +- 37 ms: 1.04x slower async_tree_cpu_io_mixed_tg: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 666 ms +- 60 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 693 ms +- 61 ms: 1.04x slower async_tree_eager: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 137 ms +- 1 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 146 ms +- 1 ms: 1.07x slower async_tree_eager_cpu_io_mixed: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 448 ms +- 9 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 468 ms +- 8 ms: 1.04x slower async_tree_eager_cpu_io_mixed_tg: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 394 ms +- 9 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 411 ms +- 8 ms: 1.04x slower async_tree_eager_io_tg: Mean +- std dev: 
[two_baselines_and_tldr/config_1/pyperf_output.json] 1.38 sec +- 0.07 sec -> [two_baselines_and_tldr/config_3/pyperf_output.json] 1.42 sec +- 0.07 sec: 1.03x slower async_tree_eager_memoization: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 271 ms +- 23 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 284 ms +- 23 ms: 1.05x slower async_tree_eager_memoization_tg: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 214 ms +- 8 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 222 ms +- 8 ms: 1.04x slower async_tree_eager_tg: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 96.1 ms +- 0.8 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 102 ms +- 1 ms: 1.06x slower async_tree_io: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 1.08 sec +- 0.08 sec -> [two_baselines_and_tldr/config_3/pyperf_output.json] 1.12 sec +- 0.08 sec: 1.04x slower async_tree_io_tg: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 1.12 sec +- 0.04 sec -> [two_baselines_and_tldr/config_3/pyperf_output.json] 1.16 sec +- 0.04 sec: 1.03x slower async_tree_memoization: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 567 ms +- 54 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 587 ms +- 53 ms: 1.04x slower asyncio_tcp_ssl: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 1.52 sec +- 0.01 sec -> [two_baselines_and_tldr/config_3/pyperf_output.json] 1.54 sec +- 0.00 sec: 1.01x slower chaos: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 71.4 ms +- 0.6 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 75.7 ms +- 1.0 ms: 1.06x slower comprehensions: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 18.5 us +- 0.1 us -> [two_baselines_and_tldr/config_3/pyperf_output.json] 19.3 us +- 0.1 us: 1.04x slower bench_mp_pool: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 20.8 ms +- 7.3 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 17.8 ms +- 5.2 ms: 1.17x faster bench_thread_pool: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 1.10 ms +- 0.01 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 1.11 ms +- 0.01 ms: 1.02x slower coroutines: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 24.9 ms +- 0.3 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 25.8 ms +- 0.3 ms: 1.04x slower coverage: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 120 ms +- 4 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 122 ms +- 3 ms: 1.01x slower crypto_pyaes: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 79.2 ms +- 1.0 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 88.2 ms +- 1.1 ms: 1.11x slower dask: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 387 ms +- 14 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 399 ms +- 12 ms: 1.03x slower deepcopy: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 432 us +- 4 us -> [two_baselines_and_tldr/config_3/pyperf_output.json] 451 us +- 3 us: 1.04x slower deepcopy_reduce: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 4.19 us +- 0.06 us -> [two_baselines_and_tldr/config_3/pyperf_output.json] 4.50 us +- 0.04 us: 1.08x slower deepcopy_memo: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 42.4 us +- 0.6 us -> 
[two_baselines_and_tldr/config_3/pyperf_output.json] 45.0 us +- 0.4 us: 1.06x slower deltablue: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 3.68 ms +- 0.03 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 3.98 ms +- 0.03 ms: 1.08x slower docutils: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 2.72 sec +- 0.01 sec -> [two_baselines_and_tldr/config_3/pyperf_output.json] 2.83 sec +- 0.02 sec: 1.04x slower fannkuch: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 428 ms +- 4 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 460 ms +- 4 ms: 1.08x slower float: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 86.6 ms +- 0.7 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 92.2 ms +- 0.9 ms: 1.06x slower create_gc_cycles: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 1.18 ms +- 0.00 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 1.20 ms +- 0.01 ms: 1.02x slower gc_traversal: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 3.16 ms +- 0.03 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 3.38 ms +- 0.15 ms: 1.07x slower generators: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 29.1 ms +- 0.3 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 29.7 ms +- 0.2 ms: 1.02x slower genshi_text: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 26.9 ms +- 0.2 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 27.6 ms +- 0.2 ms: 1.02x slower genshi_xml: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 60.8 ms +- 0.5 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 62.3 ms +- 0.4 ms: 1.02x slower go: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 149 ms +- 1 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 154 ms +- 1 ms: 1.04x slower hexiom: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 6.79 ms +- 0.03 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 7.04 ms +- 0.05 ms: 1.04x slower html5lib: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 73.9 ms +- 0.7 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 76.8 ms +- 0.5 ms: 1.04x slower json_dumps: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 13.9 ms +- 0.2 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 15.6 ms +- 0.2 ms: 1.12x slower json_loads: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 31.5 us +- 0.3 us -> [two_baselines_and_tldr/config_3/pyperf_output.json] 35.6 us +- 0.6 us: 1.13x slower logging_format: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 7.57 us +- 0.07 us -> [two_baselines_and_tldr/config_3/pyperf_output.json] 8.25 us +- 0.14 us: 1.09x slower logging_silent: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 125 ns +- 1 ns -> [two_baselines_and_tldr/config_3/pyperf_output.json] 117 ns +- 2 ns: 1.07x faster logging_simple: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 6.89 us +- 0.05 us -> [two_baselines_and_tldr/config_3/pyperf_output.json] 7.22 us +- 0.09 us: 1.05x slower mako: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 12.8 ms +- 0.1 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 13.3 ms +- 0.1 ms: 1.04x slower meteor_contest: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 102 ms 
+- 1 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 109 ms +- 1 ms: 1.07x slower nbody: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 85.9 ms +- 1.0 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 89.2 ms +- 0.6 ms: 1.04x slower nqueens: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 105 ms +- 1 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 113 ms +- 1 ms: 1.07x slower pathlib: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 22.2 ms +- 0.1 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 22.9 ms +- 0.1 ms: 1.03x slower pickle: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 13.7 us +- 0.1 us -> [two_baselines_and_tldr/config_3/pyperf_output.json] 15.3 us +- 0.1 us: 1.11x slower pickle_dict: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 28.7 us +- 0.5 us -> [two_baselines_and_tldr/config_3/pyperf_output.json] 39.4 us +- 0.1 us: 1.37x slower pickle_list: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 4.42 us +- 0.08 us -> [two_baselines_and_tldr/config_3/pyperf_output.json] 4.23 us +- 0.13 us: 1.04x faster pickle_pure_python: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 357 us +- 3 us -> [two_baselines_and_tldr/config_3/pyperf_output.json] 370 us +- 2 us: 1.04x slower pidigits: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 171 ms +- 1 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 170 ms +- 0 ms: 1.00x faster pprint_safe_repr: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 983 ms +- 10 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 1.01 sec +- 0.01 sec: 1.03x slower pprint_pformat: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 2.00 sec +- 0.02 sec -> [two_baselines_and_tldr/config_3/pyperf_output.json] 2.06 sec +- 0.02 sec: 1.03x slower pyflate: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 468 ms +- 5 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 481 ms +- 4 ms: 1.03x slower python_startup: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 10.3 ms +- 0.0 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 10.6 ms +- 0.0 ms: 1.03x slower python_startup_no_site: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 7.07 ms +- 0.02 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 7.31 ms +- 0.04 ms: 1.03x slower raytrace: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 304 ms +- 2 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 323 ms +- 2 ms: 1.07x slower regex_compile: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 155 ms +- 1 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 158 ms +- 1 ms: 1.02x slower regex_dna: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 163 ms +- 1 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 162 ms +- 1 ms: 1.01x faster regex_effbot: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 3.04 ms +- 0.08 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 3.11 ms +- 0.07 ms: 1.02x slower regex_v8: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 26.7 ms +- 0.3 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 25.4 ms +- 0.2 ms: 1.05x faster richards: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 55.7 ms +- 
0.5 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 59.9 ms +- 0.9 ms: 1.07x slower richards_super: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 64.2 ms +- 0.7 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 68.0 ms +- 0.8 ms: 1.06x slower scimark_fft: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 395 ms +- 4 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 413 ms +- 3 ms: 1.05x slower scimark_lu: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 137 ms +- 2 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 142 ms +- 2 ms: 1.04x slower scimark_monte_carlo: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 76.0 ms +- 0.7 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 79.5 ms +- 1.1 ms: 1.05x slower scimark_sor: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 145 ms +- 1 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 153 ms +- 1 ms: 1.05x slower scimark_sparse_mat_mult: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 5.76 ms +- 0.07 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 6.17 ms +- 0.11 ms: 1.07x slower spectral_norm: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 120 ms +- 1 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 139 ms +- 1 ms: 1.16x slower sqlglot_normalize: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 145 ms +- 1 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 151 ms +- 1 ms: 1.04x slower sqlglot_optimize: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 69.0 ms +- 0.4 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 71.0 ms +- 0.3 ms: 1.03x slower sqlglot_parse: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 1.44 ms +- 0.01 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 1.50 ms +- 0.01 ms: 1.04x slower sqlglot_transpile: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 1.76 ms +- 0.01 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 1.82 ms +- 0.01 ms: 1.04x slower sqlite_synth: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 3.45 us +- 0.04 us -> [two_baselines_and_tldr/config_3/pyperf_output.json] 3.57 us +- 0.05 us: 1.04x slower telco: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 10.9 ms +- 0.1 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 11.6 ms +- 0.1 ms: 1.06x slower tomli_loads: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 2.48 sec +- 0.02 sec -> [two_baselines_and_tldr/config_3/pyperf_output.json] 2.63 sec +- 0.02 sec: 1.06x slower tornado_http: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 117 ms +- 1 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 119 ms +- 2 ms: 1.02x slower typing_runtime_protocols: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 212 us +- 3 us -> [two_baselines_and_tldr/config_3/pyperf_output.json] 224 us +- 4 us: 1.05x slower unpack_sequence: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 38.4 ns +- 2.1 ns -> [two_baselines_and_tldr/config_3/pyperf_output.json] 41.2 ns +- 0.4 ns: 1.07x slower unpickle: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 18.6 us +- 0.2 us -> [two_baselines_and_tldr/config_3/pyperf_output.json] 20.3 us +- 0.3 us: 1.09x slower unpickle_list: Mean +- std dev: 
[two_baselines_and_tldr/config_1/pyperf_output.json] 5.32 us +- 0.15 us -> [two_baselines_and_tldr/config_3/pyperf_output.json] 5.93 us +- 0.15 us: 1.11x slower unpickle_pure_python: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 266 us +- 2 us -> [two_baselines_and_tldr/config_3/pyperf_output.json] 258 us +- 2 us: 1.03x faster xml_etree_parse: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 157 ms +- 2 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 164 ms +- 2 ms: 1.05x slower xml_etree_iterparse: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 106 ms +- 1 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 113 ms +- 1 ms: 1.07x slower xml_etree_generate: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 116 ms +- 1 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 123 ms +- 2 ms: 1.06x slower xml_etree_process: Mean +- std dev: [two_baselines_and_tldr/config_1/pyperf_output.json] 78.4 ms +- 0.5 ms -> [two_baselines_and_tldr/config_3/pyperf_output.json] 83.2 ms +- 0.8 ms: 1.06x slower Benchmark hidden because not significant (6): async_tree_eager_io, async_tree_memoization_tg, async_tree_none_tg, asyncio_tcp, asyncio_websockets, mdp Geometric mean: 1.04x slower

I am putting together some analysis of how individual options affect performance, which I will share, but I would like to get some opinions on which benchmarks can't afford to take a performance hit. For example, I would be less concerned about the startup and docutils benchmarks than about the regex and math benchmarks, which could be exercised at high frequency in applications.

mdboom commented 2 months ago

I think, unfortunately, the answer to that is "it depends". Startup really matters for some applications, and not others, for example. Likewise, security really matters in some contexts, but not others. It's hard to speculate at the beginning of this project, but maybe the end result will be to make it easy to make a security-hardened build at the expense of performance when the end user wants to make that tradeoff.

I like the idea of breaking this out by individual flags, so we can see which have the most impact. It might also be possible that we can reduce the impact of some of the options by changing how some code is written in CPython, i.e. if an option makes some unsafe C feature slower, maybe we try to stop using that unsafe C feature if we can ;)

We have a whole set of standard benchmarking machines at Microsoft that are set up to get the results as-reproducible-as-possible. If you create a branch on your fork of CPython with some proposed changes, you can ping me and I can kick off a run, and the results will show up automatically on iscpythonfastyet.com. Unfortunately, we can't automate triggering those runs for security reasons, but it's really easy for me so don't hesitate to ask me and I can get to it pretty quickly during my working hours.

corona10 commented 2 months ago

@nohlson By the way, the fall-through flag emits a lot of warnings from the CPython build (we try to remove compiler warnings wherever possible), and some of them are intentional fall-throughs. Do you have any plans for this?

example

./Modules/_testcapi/exceptions.c:38:9: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
        case 2:
        ^
./Modules/_testcapi/exceptions.c:38:9: note: insert '__attribute__((fallthrough));' to silence this warning
        case 2:
        ^
        __attribute__((fallthrough)); 
./Modules/_testcapi/exceptions.c:38:9: note: insert 'break;' to avoid fall-through
        case 2:
        ^
        break; 
./Modules/_testcapi/exceptions.c:42:9: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
        case 1:
        ^
./Modules/_testcapi/exceptions.c:42:9: note: insert '__attribute__((fallthrough));' to silence this warning
        case 1:
        ^
        __attribute__((fallthrough)); 
./Modules/_testcapi/exceptions.c:42:9: note: insert 'break;' to avoid fall-through
        case 1:
        ^
        break; 
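
For the intentional cases, the annotation the compiler suggests is a one-line change; a minimal sketch (hypothetical code, not the actual _testcapi source) could look like:

#if defined(__GNUC__) || defined(__clang__)
#  define FALLTHROUGH_ __attribute__((fallthrough))
#else
#  define FALLTHROUGH_ ((void)0)
#endif

int classify(int kind)
{
    int score = 0;
    switch (kind) {
    case 2:
        score += 10;    /* extra work for kind == 2 ... */
        FALLTHROUGH_;   /* ... then deliberately continue into case 1 */
    case 1:
        score += 1;
        break;
    default:
        break;
    }
    return score;
}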

And for the expat module, you should send patches to the upstream. https://github.com/libexpat/libexpat cc @hartwork

nohlson commented 2 months ago

@corona10 Yes, I am going to implement some tooling to keep track of the new warnings generated by enabling these new flags, which is the next step of this process.

I had actually overlooked those warnings until I saw them in some of the buildbot compile logs. I had intended for the first round to be warning-free.

We can start by deciding if we should ignore the intended fall-through warnings in the tooling or add the attributes.

corona10 commented 2 months ago

We can start by deciding if we should ignore the intended fall-through warnings in the tooling or add the attributes.

Then how about we revert the fall-through warning change and test it again once your new tool is implemented? (I like your idea, but we should separate tasks where possible.) Until then, the emitted warnings will be stressful for some core devs. And as I said, some of the code is in vendored modules, not ours; we should submit patches upstream first, and the fix may be delayed until they release a new version.

nohlson commented 2 months ago

@corona10 I agree, let's remove the fall-through warning for now.

I will also look into why I am not seeing those warnings when I build locally.

nohlson commented 2 months ago

@corona10 PR to remove fallthrough warning option: https://github.com/python/cpython/pull/121041

Just from browsing the builds from #121030, it seems that only clang is emitting warnings for fall-through.

Here are builds with warnings: https://buildbot.python.org/all/#/builders/721/builds/1465 (macos) https://buildbot.python.org/all/#/builders/111/builds/1387 (Fedora with clang)

And others don't (Fedora/RHEL w/ gcc): https://buildbot.python.org/all/#/builders/816/builds/977 https://buildbot.python.org/all/#/builders/745/builds/987 https://buildbot.python.org/all/#/builders/115/builds/1398

I will pay extra close attention to these compiler nuances when working on the tooling.

nohlson commented 2 months ago

@hugovk For the warning tooling, I had initially considered whether it would be feasible to add the warning checks to the pipeline for each of the buildbots, maybe even as a unit test, but keeping track of the warnings for each platform/compiler might be too complicated. Instead, I was considering making a couple of GitHub Actions that run the warning-check tooling for macOS, Ubuntu, and Windows, which would be representative, just as was done for the docs warning tracker.

Are there any thoughts on the latter approach?

hugovk commented 2 months ago

Sounds like a good idea to just test on a subset. Using GitHub Actions means we can run as part of all PRs, and also people can test on their forks without too much bother.

Things to consider: do we want warnings to fail the build, so that people can't merge if they introduce new warnings? If not now, we can consider this for later.

The next level is to allow warnings to report as a failure, but not let that failure block a merge. A downside of this is that the check can still show as red, even if they can merge, which people find annoying.

Then the next level is just to output the warnings in the log, and the job always reports as passed. The downside is you need to dig into the logs to see what happened, and most people won't do that.

If we're still at the investigation phase, we probably don't want to fail PRs just yet, but I imagine at some point we will.

Another thing to consider: will this be new jobs, or something added to the existing build?

For the docs warnings, we added it to an existing build: output all warnings to log, but otherwise build as usual, then in a later step of the same job, run a script to analyse the logs and decide when to fail.

nohlson commented 2 months ago

@hugovk Awesome thank you for the input! I am moving forward with a GitHub Actions solution.

Things to consider: do we want warnings to fail the build, so that people can't merge if they introduce new warnings? If not now, we can consider this for later.

The first iteration I introduce will have options at the script level for failing on regression or improvement, just as the docs version has, but both will be disabled in the GitHub Actions configuration. The results of the checks will just be printed out as a "warning" about warnings.

The next level is allow warnings to report as a failure, but not let that failure block a merge. A downside of this is that it can still shows as red, even if they can merge, which people find annoying.

Once the tooling has been introduced we can create a new PR that enables an option that does produce a small number of new warnings. At this point we could enable fail on regression. Then we can decide if we are going to add to the ignore list or fix the manageable number of warnings.

Another thing to consider: will this be new jobs, or something added to the existing build?

Currently I am going to try to fit this into the existing ubuntu build job. Then I will take a look at the macos and windows jobs after. The design is very similar to the docs warnings checks.

hugovk commented 2 months ago

Sounds good 👍

hugovk commented 2 months ago

First PR for this: https://github.com/python/cpython/pull/121730 🚀

kulikjak commented 1 month ago

Hi, it seems that the Solaris/Illumos linker is unhappy when you pass -fstack-protector-strong via CFLAGS but not LDFLAGS, resulting in the following failure after #120975:

Undefined           first referenced
 symbol                 in file
__stack_chk_fail                    Programs/_freeze_module.o
__stack_chk_guard                   Programs/_freeze_module.o
ld: fatal: symbol referencing errors

I created #121979 to resolve this issue.

encukou commented 1 month ago

Adding the fortify source level 3 flag broke test_cext and test_cppext on several buildbots, e.g. here: https://buildbot.python.org/#/builders/64/builds/7269/steps/6/logs/stdio Please fix or revert.

Building wheels for collected packages: internal-test-limited-c11-cext
Building wheel for internal-test-limited-c11-cext (setup.py): started
Running command python setup.py bdist_wheel
CC env var: 'gcc -pthread'
CFLAGS env var: <missing>
extra_compile_args: ['-Werror', '-Wcast-qual', '-Werror=declaration-after-statement', '-DMODULE_NAME=_test_limited_c11_cext', '-std=c11', '-DPy_LIMITED_API=0x30e00a0']
running bdist_wheel
running build
running build_ext
building '_test_limited_c11_cext' extension
creating build
creating build/temp.linux-x86_64-cpython-314
gcc -pthread -fno-strict-overflow -fstack-protector-strong -Wtrampolines -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -Wsign-compare -DNDEBUG -g -O3 -Wall -fPIC -I/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.lto/build/build/test_python_2370939æ/tempcwd/env/include -I/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.lto/build/Include -I/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.lto/build -c extension.c -o build/temp.linux-x86_64-cpython-314/extension.o -Werror -Wcast-qual -Werror=declaration-after-statement -DMODULE_NAME=_test_limited_c11_cext -std=c11 -DPy_LIMITED_API=0x30e00a0
In file included from /usr/include/assert.h:35,
from /home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.lto/build/Include/Python.h:19,
from extension.c:7:
/usr/include/features.h:393:5: error: #warning _FORTIFY_SOURCE > 2 is treated like 2 on this platform [-Werror=cpp]
# warning _FORTIFY_SOURCE > 2 is treated like 2 on this platform
^~~~~~~
cc1: all warnings being treated as errors
error: command '/usr/bin/gcc' failed with exit code 1
error: subprocess-exited-with-error

BASECFLAGS are baked into sysconfig and reused by tools like setuptools. Consider adding extra flags to CFLAGS_NODIST, so they only affect CPython, not third-party extensions built for it. Those might have conflicting requirements.

nohlson commented 1 month ago

Consider adding extra flags to CFLAGS_NODIST, so they only affect CPython, not third-party extensions built for it. Those might have conflicting requirements.

I agree with limiting the scope of these options to just CPython. After testing with buildbots we can make this change as suggested by @corona10 here

encukou commented 1 month ago

I was redirected here: when adding configure options, I think it would be appropriate to have a wider discussion on Discourse rather than just on GitHub issues.

As I'm not following this effort very closely, I feel lost in the discussion and decisions :(

Practically, I think

sethmlarson commented 1 month ago

Apologies @encukou, agreed that these should get discussed to make sure they're the best method of handling this. The new options got added as a way to opt-in to performance-affecting compilation options or opt-out of options that aren't supported on your platform. See: https://github.com/python/cpython/issues/121996

I'll work with @nohlson to create this topic and issues for documenting them properly.

nohlson commented 1 month ago

I started a thread to have more discussion related to this topic: https://discuss.python.org/t/new-default-compiler-options-for-safety/60057/2

Attn: @encukou @sethmlarson