Optimization option recommendations should have caveats

ossf / wg-best-practices-os-developers

The Best Practices for OSS Developers working group is dedicated to raising awareness and education of secure code best practices for open source developers.

https://openssf.org

Apache License 2.0

769 stars 133 forks source link

Optimization option recommendations should have caveats #660

Open thesamesam opened 1 month ago

thesamesam commented 1 month ago

The compiler options guide currently mixes two things: 1) exploit mitigations and security improvements; 2) workarounds or opinionated takes on C or C++ behaviour (-fno-strict-aliasing, -fno-strict-overflow, -fno-delete-null-pointer-checks)

The issue with this is that the options that fall into case 2) aren't necessarily wins:

They're sometimes less tested, especially as a combination. Compiler bugs can happen for anything, but there have been e.g. -fno-strict-aliasing-only bugs.
They contribute to the idea that the rules are impossible to understand and hence there's no point in trying. If we're going to do this, there's various further options we should add.
The list is incomplete anyway for "no UB" (by making it all defined). If we're going to do that, the list needs to include far more options.
The documentation doesn't make clear that there's a difference which leads to people thinking they're directly hardening options.
Such options pessimise generated code without adding actual exploit mitigations and may even lead people to dismiss the other recommendations in the guide if they worsen performance.
They paper over other issues (for example, an issue leading to -fno-delete-null-pointer-checks doing anything may well crop up in other contexts, because the real reason that caused the compiler to believe the check could be removed should be fixed).
Such options change the semantics of the language and this should be emphasised to users -- it doesn't magically fix the problem, it defines it and users need to be aware of what those defined semantics are.
They're in an entirely different league to the mitigation options which various distributions enable by default for good reason, unlike these.

I'm not suggesting they be removed entirely from the guide, but I do think they belong in their own section and clearly marked as at least somewhat opinionated and not in the same class as the other options the guide recommends.

I would love to be able to recommend the Compiler Options Hardening Guide for C and C++ wholeheartedly, but I don't feel I can at the moment because of this.

https://gitlab.com/wireshark/wireshark/-/merge_requests/16860 is an example, but definitely not the only one, of the recommended/mentioned options being grabbed wholesale.

eli-schwartz commented 1 month ago

In particular, -fno-strict-aliasing should never ever ever be suggested anyway. The correct fix is always to fix your aliasing issues, which the compiler can already usually diagnose and you should recommend the relevant -Werror instead. If you do have strict-aliasing issues and don't fix them then UB will simply lead to random other bugs in random other parts of the optimizer since this option doesn't actually make it "well defined".

thesamesam commented 1 month ago

If you do have strict-aliasing issues and don't fix them then UB will simply lead to random other bugs in random other parts of the optimizer since this option doesn't actually make it "well defined".

To elaborate on this, while compilers who implement that option are surely saying they support a dialect where aliasing is allowed, it doesn't magically make such aliasing-violating-constructs okay. For example, we often find that aliasing violations go hand-in-hand with unaligned reads and writes. -fno-strict-aliasing doesn't make those legal and you will still get crashes.

david-a-wheeler commented 1 month ago

I disagree. In particular, I don't think they should be "moved" to a new section. The document is complicated enough already. If you want to add a special marking to some options, sure, though I think it may be harder to separate these cases than it might first appear.

They paper over other issues (for example, an issue leading to -fno-delete-null-pointer-checks doing anything may well crop up in other contexts, because the real reason that caused the compiler to believe the check could be removed should be fixed).

That's true for all unintentional vulnerabilities. Sure, it's better if the vulnerability was removed from the source code. However, since it's quite hard to be perfect, we enable various options to reduce security risks. Risk is a function of likelihood and impact - if we reduce the likelihood or impact of a vulnerability, then we've reduce its risk.

Let's focus on -fno-delete-null-pointer-checks to make this more concrete.

The reason -fno-delete-null-pointer-checks is included is because actual experience has shown that developers will occasionally make a mistake leading to a vulnerability, and that this specific option reduces the risks from such vulnerabilities. The document even cites a case in the Linux kernel, see: https://isc.sans.edu/diary/A+new+fascinating+Linux+kernel+vulnerability/6820

The Linux kernel developers are very concerned about performance and security. If they're willing to enable an option for security, then that is excellent evidence it's worth adding.

Is it better to write perfect code and never ever make a mistake, like, ever? Well, sure. Compiler options can't eliminate the value of good code. But that's not a real-world alternative. In practice, code has mistakes, so we need to enable options that reduce risks (likelihood or impact) from the incorrect code that WILL exist.

The documentation doesn't make clear that there's a difference which leads to people thinking they're directly hardening options.

The -fno-delete-null-pointer-checks option is a hardening option, full stop. It's no different than a canary option, because both are attempting to reduce the risk caused by the inevitable imperfections of real-world code.

The list is incomplete anyway for "no UB" (by making it all defined). If we're going to do that, the list needs to include far more options

A "no UB" option would probably have such a massive performance overhead that no one could practically enable it in production. I can't see how you'd do that without massive changes to the C or C++ languages. That said, if there were such an option, and it had no impact on performance, then I would strongly urge its use specifically as a hardening option. I think we'll need to wait for flying pigs first :-). Since that's not practically available, we need to identify what we can do as a practical matter to reduce risks. And that's okay. All life has risks, but we can typically take steps to manage those risks.

david-a-wheeler commented 1 month ago

This discussion reminds me of the 1965 book Unsafe at Any Speed. To oversimplify, car makers argued that drivers should simply never make mistakes. This was found to be absurd; drivers were unable to never make mistakes. The result: cars were redesigned to handle the real-world situation that mistakes are inevitable. Specifically they were designed to be less likely to kill people in the accidents that do happen, including seat belts, better cabin protection, etc.

Sure, it's much better to NOT be in an accident. But since the existence of accidents is 100% certain, we need to use cars designed to handle this unfortunate but expected reality.

Similarly, it's much better to not make a mistake in code. But since the existence of mistakes in code is 100% certain, we also need to select compiler options to handle this unfortunate but expected reality.

eli-schwartz commented 1 month ago

Let me just reiterate that it's actively dangerous and harmful to security to recommend -fno-strict-aliasing since it results in -Werror=strict-aliasing not providing diagnostics.

That means that users do not see fatal -Werrors for their bad code. So bugs that would otherwise be caught and fixed are not caught and not fixed and crash or do worse things.

If you would simply recommend the -Werror then it would fail to build and thus not be insecure.

thesamesam commented 1 month ago

Similarly, it's much better to not make a mistake in code. But since the existence of mistakes in code is 100% certain, we also need to select compiler options to handle this unfortunate but expected reality.

My argument in the original issue was to then go full hog and use various additional options to define a dialect of C, like -fno-tree-loop-distribute-patterns (which means e.g. handrolled memset stays as handrolled memset), -fno-finite-loops, -fno-lifetime-dse, -fno-aggressive-loop-optimizations at least.

We can also think about e.g. -fisolate-erroneous-paths-attribute, -fisolate-erroneous-paths-dereference.

There is nothing particularly special about -fno-delete-null-pointer-checks (which IMO just papers over a problem), but if you insist on it being a hardening option - which I disagree with - then the guide ought to be consistent in this regard for other UB.

To give another example: -funreachable-traps means that the compiler won't aggressively optimise based on a missing return or missing case statement, instead you get a trap. That would be worth considering with this model of C and C++.

A "no UB" option would probably have such a massive performance overhead that no one could practically enable it in production.

Or, and I'm sorry to make this argument, why even let people use -O2 at all? Why not recommend -O1? Or -O0?

I don't understand why this option in particular is special but the candidates I listed above, which do cause people to be confused when they write erroneous code (they often file GCC bugs over it), don't matter. Why is this particular pessimisation acceptable and special? Compiler optimisations often produce unexpected results? Anything interprocedural often leads to "spooky action at a distance" and I spend a fair amount of time helping people debug those -- should we recommend disabling anything IPA or LTO?

The Linux kernel developers are very concerned about performance and security. If they're willing to enable an option for security, then that is excellent evidence it's worth adding.

The kernel chooses to use this dialect and that's fine for them. It's a choice they've made. My point is that it's not as simple as an option to be slapped on and most people won't need it. The kernel also disables all vectorisation because it needs to e.g. save/restore SIMD registers (this works out well because it keeps code size down too), that doesn't mean all programs should do it.

The kernel using something also isn't a strong argument for general-purpose code because they're in a specific domain. The kernel people don't generally file missed optimisation bugs, for example, because they already disable vectorisation as I mention above.

The option was only added to Clang for the kernel to begin with in 2018. This doesn't imply it's particularly well-tested en-masse or that there's demand for it otherwise.

Is it better to write perfect code and never ever make a mistake, like, ever? Well, sure. Compiler options can't eliminate the value of good code. But that's not a real-world alternative. In practice, code has mistakes, so we need to enable options that reduce risks (likelihood or impact) from the incorrect code that WILL exist.

I didn't make the argument that UB is never problematic - I don't think it's fair to reduce it to that. I wouldn't have filed it if it was just about that because we've all been in those discussions before. I just think if the guide really wants to go this route, it should cover a bunch more, not single this out (with IMO insufficient background / explanation).

thesamesam commented 1 month ago

That means that users do not see fatal -Werrors for their bad code. So bugs that would otherwise be caught and fixed are not caught and not fixed and crash or do worse things.

Eli makes an interesting point here which ties in with the -fisolate-erroneous-paths-attribute and -fisolate-erroneous-paths-dereference options I mention. What's the line between diagnosing at compile-time (or possibly diagnosing at runtime) or papering over it? With -fno-delete-null-pointer-checks, for example, I don't think UBSAN can catch nullptr use in all cases, nor can I think -Wnull-dereference?

We're recommending use of -Werror for development, yet these options weaken warnings being emitted.

i.e. What's the line between making development catch problems vs. making sure you don't get weird runtime reports from customers when something is deployed?

thomasnyman commented 1 month ago

I'm sympathetic to there being two schools of thought on options such as -fno-delete-null-pointer-checks, -fno-strict-overflow, and particularly -fno-strict-aliasing and even agree with the title of this issue "optimization option recommendations should have caveats". I think we can do a better job in making the two schools of thought more evident, especially in the description of -fno-strict-aliasing.

The reality is that there is not a one-size-fits all solution, particularly since we've decided that the scope of the guide goes beyond application code and should be applicable to embedded development as well. We are also motivated by trying to improve the situation for existing code bases, which (unfortunately) are quite likely to use pointer casts for type punning. In the embedded space, standardized methods of type punning may not apply either, e.g., memcpy() in freestanding environments may not be available, or not optimized away when used for type punning.

That said, the argument that -fno-strict-aliasing is "actively dangerous and harmful to security" since interferes with -Werror=strict-aliasing not providing diagnostics misses the fact that we don't actually recommend these options for the same types of builds: we do recommend -Werror and a farily aggressive set of warning options for development, so developers can benefit from diagnostics such as Werror=strict-aliasing. We recommend -fno-delete-null-pointer-checks -fno-strict-overflow -fno-strict-aliasing for production builds so that code exposed to threats would avoid security-relevant UB which can be mitigated at a reasonably small cost to performance.

I think it would be great if we could expand the -fno-strict-aliasing description to include the caveats pointed out here and the other drawbacks of missing out on type-based alias analysis if there is someone who can elaborate on what kinds of optimizations that would preclude.

I agree with @david-a-wheeler that we should be conservative with spreading out the recommended options over several sections, but I'd be open to separating the options affecting language-level behavior to a separate table in Section 3, like. we now separate the warning options. I'd tend to agree that the current Table 2 caption does not really capture the nature of the -fno-delete-null-pointer-checks, -fno-strict-overflow, -fno-strict-aliasing options.

I'd also be more than happy to consider additional options in this category. If there are volunteers to contribute PRs in that regard please open separate own issues / PRs for additional option proposals to facilitate discussion on their individual merits.

I don't see not covering all options eliminating UB as a consistency problem though. We don't cover every hardening option available either, but we will consider any option that is brought to our attention. There were a lot of discussions in the Compiler Hardening BP calls over the the ones highlighted here in particular, but in the end we decided to include them since there were sufficient examples of real-world deployment to justify them.

@thesamesam You keep bringing up Wireshark deploying of the guide recommendations with the implication that it was done with little consideration. @oakimk and I are from the same organization and we had a very good discussion around the project's needs and the threats relevant to Wireshark prior to him proposing the MR question and came to an agreement on the flags to propose there. While I cannot speak for the other examples you allude to, I can assure you a lot consideration went in to the flags to enable in that particular case.

thesamesam commented 1 month ago

Just responding on the Wireshark bit first as I want to clear that up ASAP:

@thesamesam You keep bringing up Wireshark deploying of the guide recommendations with the implication that it was done with little consideration. @oakimk and I are from the same organization and we had a very good discussion around the project's needs and the threats relevant to Wireshark prior to him proposing the MR question and came to an agreement on the flags to propose there. While I cannot speak for the other examples you allude to, I can assure you a lot consideration went in to the flags to enable in that particular case.

Sure, I wasn't trying to suggest carelessness. I'm sorry for giving that impression as I didn't mean to. No disrespect or anything like that was intended! Apologies to you and @oakimk as I really was not trying to convey carelessness or anything of that sort.

I thought it was a direct port given that the commit message didn't communicate anything beyond following the guide. I in part reached that conclusion because I didn't see any discussion of binary size or performance being discussed in the PR either. I don't see it as a bad thing if people do that, given we want the guide to be easy to follow and obvious wins for most people. I didn't see any discussion over there about things being definitely included vs some for discussion etc, so I assumed that the whole thing was being used (which wasn't assuming bad faith, as I don't think this is unreasonable for people to be doing, even).

But my point is that these aren't obvious wins and the guide doesn't provide the nuance I'm describing above.

thesamesam commented 1 month ago

The reality is that there is not a one-size-fits all solution,

Yeah, there's a bunch of categories. Some stuff pretty much everybody should use as a bare minimum and it matches every major Linux distribution, and some stuff really does need careful evaluation with knowledge of the project. What I'm concerned about is people who are trying to do the right thing getting confused and thinking every flag mentioned should be adopted.

We should be more honest about this, though, and not include a TL;DR then. I don't object to documenting some of these flags we currently disagree on at all, but I think that section really gives the impression that one can slap that on and be done with it?

One reason I picked on Wireshark in particular was because I maintain it in Gentoo and I also know that it's a particularly large application which benefits a lot from more aggressive optimisation (both in binary size as well as startup time). But my perspective was maybe coloured a bit by my somewhat poor experience in CPython where I tried to give feedback on them adopting some of these flags (not the same ones we're discussing here) and it didn't really go anywhere.

Maybe to put it another way: I think "for production code" is a bit too definitive unlike the other rows in that table which are unobjectionable, and it gives the impression they're of the same tier. Also, typically, production mode means stuff with fewer checks and is faster, but this is the opposite (which might be desirable but I don't think it's clear). Especially given that below the table, we say:

In this guide, we use the term production code for executable code intended for use in the real world with real effects; it should be maximally reliable and performant.

should be applicable to embedded development as well.

You're absolutely right, of course, but then we run into other issues, like SSP not being always appropriate there either, or indeed fortification. I'm just not sure if it's going to scale to have everything in one document like this?

There's {embedded, regular} x {development, production} x {exploit mitigation, C nasal demons / footgun prevention}.

thomasnyman commented 1 month ago

We should be more honest about this, though, and not include a TL;DR then. I don't object to documenting some of these flags we currently disagree on at all, but I think that section really gives the impression that one can slap that on and be done with it?

I think this really gets to the heart of the matter. One of the big challenges we are trying to address with the guide is making the material easier to digest for a larger audience. The TL;DR and the summary of the situational options was an attempt at that, with the idea that it'll allow people to get started without reading the very long guide from start to finish before gaining any actionable advice.

I'm very open to alternative suggestions for presentation, but I think we need something like the TL;DR that acts as a way to get started with less time commitment.

I can completely see "for production code" being inadequate in communicating the caveats, so rewording that is maybe a good starting point in addressing the concerns within the current structure, along with improving the description of -fno-strict-aliasing to be more explicit about the tradeoffs. That description is either way not consistent with the way we've tried to cover the hardening options, with individual sub-sections for performance implications etc.

johnthacker commented 1 month ago

In the particular case of Wireshark and -fno-delete-null-pointer-checks, we had an issue filed because GCC, when using LTO and specific optimizations, would incorrectly delete a null pointer check: https://gitlab.com/wireshark/wireshark/-/issues/18216

It's all very well and good to say "it should be reported to the compiler maintainers and fixed in the compiler," but that's not really an option for production code that is built on a wide variety of platforms, compilers, compiler versions, optimization levels, etc. Especially given the number of people who compile adding their own optimization flags and then file issues about problems they see with it.

thomasnyman commented 1 month ago

We discussed this issue during the C/C++ Compiler BP Guide call 2023-10-17 as well and one of the conclusions there as well was that the evidence (pointed to in the guide) makes a good case for -fno-delete-null-pointer-checks in particular being valuable as a hardening flag.

We should make it clearer though that the intent with the -fno-strict-aliasing, -fno-strict-overflow, -fno-delete-null-pointer-checks is not to use them in a way that suppresses diagnostics, which seems to be an underlying concern in this issue. We will try to address that in the situational options table for now.

We will also revisit the -fno-strict-aliasing option. At least its description should be improved to be more descriptive of the tradeoffs pointed out here.