rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.34k stars 12.72k forks source link

Consider disabling LVI mitigations in the x86_64-fortanix-unknown-sgx target #108414

Open clauverjat opened 1 year ago

clauverjat commented 1 year ago

The x86_64-fortanix-unknown-sgx target currently has Load Value Injection (LVI) mitigations enabled by default. These mitigations were introduced by #72655. At the time, these software mitigations were necessary on all Intel hardware to protect against the LVI vulnerability (CVE-2020-0551). Recent CPUs (such as Icelake and newer) now have hardware-level protections against LVI ^1, which makes the software-based mitigations unnecessary.

Since LVI software mitigations can result in significant performance overheads (2x to 19x slowdown factor)^2, it would be preferable to make these mitigations opt-in rather than enabled by default.

That being said the situation regarding speculative execution vulnerabilities is not simple. The LVI mitigations that we have enabled today might give us protection against other speculative attacks. So before removing the LVI mitigations, it's important to investigate whether we need software-based mitigations against other speculative execution attacks. For example, I saw that recent CPUs are vulnerable to Branch History Injection (BHI) and Intra-Mode BTI vulnerabilities^3. Those vulnerabilities are of the Spectre variant 2 kind. Those are usually mitigated with retpolines on x86 platforms, so we should consider using target-feature=+retpoline. Intel's technical documentation provides some recommendations^4. It's worth noting that retpolines are much lighter than LVI mitigations, typically resulting only in a 5-10% overhead^5.

Thus, if we just need this mitigation on modern CPUs, this will already be a big performance win.

@jethrogb @raoulstrackx : what do you think ?

raoulstrackx commented 1 year ago

Machines with LVI mitigations are still relatively new. There may be many users who are still on platforms that are vulnerable to LVI. They are unlikely to notice that a new compiler changed the default.

In addition an attacker may choose to run such enclaves without software LVI mitigations on a platform that is still vulnerable and attack it. (For cloud providers this would be almost trivial to execute.) Intel still supports these vulnerable platforms and one can create valid attestations for them. During this attestation process Intel does signal that LVI needs to be mitigated in software, but the users need to be able to handle that correctly. This is not trivial. Currently they can assume all required software mitigations are in place.

I don't have a good background on these target features, but can we keep the default to LVI being mitigated, while enabling users to turn these mitigation off when they have alternative solutions in place to detect that the enclave isn't running on a vulnerable platform?

clauverjat commented 1 year ago

Machines with LVI mitigations are still relatively new. There may be many users who are still on platforms that are vulnerable to LVI. They are unlikely to notice that a new compiler changed the default.

You're right this is the main issue with disabling a software mitigation. In the current situation, you can add a software mitigation, but it is almost impossible to remove one without effectively deprecating the platforms concerned by the mitigation... We should try to see how we can prevent this situation from occuring in the future.

In addition an attacker may choose to run such enclaves without software LVI mitigations on a platform that is still vulnerable and attack it. (For cloud providers this would be almost trivial to execute.) Intel still supports these vulnerable platforms and one can create valid attestations for them. During this attestation process Intel does signal that LVI needs to be mitigated in software, but the users need to be able to handle that correctly. This is not trivial. Currently they can assume all required software mitigations are in place.

It's clear that from a security standpoint, the current situation is convenient for developers, who can assume that all necessary software mitigations are enabled. However, users of non-LVI vulnerable platforms are paying a huge performance penalty for it. Until recently, the DCAP attestation collateral didn't include the list of Intel Security Advisories (Intel SAs) that applied to the platform (although the EPID Attestation Service always did). With DCAP collateral v4, the list of Intel SA IDs is now provided in the TCB Info part of the collateral, making it a lot easier to correctly handle those checks. For an example of the new collateral with the list of Intel SA IDs, you can have a look at ^1.

But getting the Intel SAs is only half of the battle. The other part is knowing which Intel SAs are mitigated by the toolchain you're using. To help with this, I suggest the following:

  1. The SGX target should clearly document which Intel SAs are mitigated.
  2. Make the list of mitigated Intel SAs available via a function or constant to be added in the std::os::fortanix_sgx module. This way, developers can have a test (run with cargo test) to ensure that what the SGX target mitigates against and what they expect it to mitigate match. This would help avoid the issue we have now, where removing a mitigation might silently break the security of the system.

Note: There is an obvious caveat to suggestion 2. that I need to mention. Developers could be tempted to misuse such a function (to be honest, this is the case with many things regarding SGX). The attestation should not rely on what the enclave claims to be protected against. If the enclave is vulnerable on the platform, it might be possible to tamper with the list of Intel SAs being communicated (even when embedded in the report). So if implemented, the documentation of the function should make it clear that it is only to be used as part of system testing and that it must not be used in a production enclave.

I don't have a good background on these target features, but can we keep the default to LVI being mitigated, while enabling users to turn these mitigation off when they have alternative solutions in place to detect that the enclave isn't running on a vulnerable platform?

For the time being, I guess it's best to stick with the LVI mitigation enabled by default. But it would be great to have the option to turn off LVI mitigation. The problem is that, like you, I'm not sure how to do that. Is it possible to create "variants" of the target toolchain, and if so, how would we do that ?

P.S.: This might look silly but if there is no better solution maybe we could create a new target (like x86_64-fortanix2-unknown-sgx) for modern CPUs. Obviously this is far from ideal... So if anyone has a better idea, I would very much appreciate it.

workingjubilee commented 1 year ago

If the enclave can be vulnerable in a way undetectable from userspace that affects the result of the attestation, how can we possibly allow people to examine the results of a meaningless function and write conditional logic on that? They absolutely will use it in production.

Is there no way to perform a CPUID-like query regarding the host's model without that possibly having been intercepted and altered by the host?

Note that if this was provided via a target feature that was exposed to Rust users, it would be easy to simply disable during compilation, and that altering such is implicitly unsafe. However I have no idea if the attributes in question can be meaningfully applied on less than a binary-wide basis, which has implications regarding the rest of the semantics of a target_feature.

clauverjat commented 1 year ago

Hello @workingjubilee

Thanks for taking an interest in the matter. First, I would like to insist that until you've checked the attestation properly (which is not a simple task), you should not trust anything that come from something claiming to be an "enclave". For all you know, the code could run completely outside SGX, run in an emulator, run in a hardware DEBUG mode, or simply run a malicious code in a perfectly secure SGX environment...

Is there no way to perform a CPUID-like query regarding the host's model without that possibly having been intercepted and altered by the host?

In general there is no way for an SGX enclave to get trustworthy CPUID information about the CPU. You simply cannot run the CPUID instruction in SGX mode. Depending on what you want to do, there are ways around it. For instance to check if an extension like AVX is available you can try to execute it, if it runs successfully you should be able to assume AVX is available and if you get an invalid opcode exception then the instruction was not available. This is very hacky (you have to assume that Intel CPU instruction decoder properly raise #UD exception when encountering a unknown opcode...) but this is to my knowledge the best we can do, and it is good enough in practice.

But this problem is not related to what I was proposing. I was merely proposing to add a (compile-time) constant with the list of mitigated Intel SA in the core/std library. This would be embedded at compile-time to make it easier for developers to know what the sgx target is protecting them against. But to be clear, there is no runtime detection in any of this.

If the enclave can be vulnerable in a way undetectable from userspace that affects the result of the attestation, how can we possibly allow people to examine the results of a meaningless function and write conditional logic on that? They absolutely will use it in production.

If an enclave runs on a hardware where SGX is vulnerable, you can no longer assume any security guarantees normally provided by SGX (integrity of execution and confidentiality) anyway. So what I suggested does not make the enclave more vulnerable than usual. As mentioned by @raoulstrackx the attestation provided by Intel mentions the vulnerabilities affecting the platform on which the enclave is running. The possible misuse scenario I was referring to would be due to a broken attestation flow like this one :

  1. The client ask the enclave for attestation evidence
  2. The enclave returns the attestation evidence and the list of Intel SA that are mitigated
  3. The client checks the attestation evidence and accept it based on the wrong assumption that enclave provided Intel SA are mitigated. But doing so, the client relies on a circular argument. Since the enclave is not yet attested, the client can't assume the received list of Intel SAs is trustworthy. In a correct attestation flow, the client should verify the attestation against a policy/manifest which list the Intel SAs that can be accepted (because they have been mitigated by the toolchain when building the enclave). This policy should be made by the enclave developer during enclave development and distributed to the clients in a trusted manner.

To avoid further confusion, when I was talking about this introspection capability. I was thinking about something like that (the snippets below are only a preliminary draft in pseudo code): We could add a constant in std::os::fortanix_sgx::arch such as :

const MITIGATED_ADVISORIES :  &[&str] = &["INTEL-SA-00079","INTEL-SA-00076"];

So that enclave developer could have a test like :

#[test]
fn test_mitigation(){
    use std::os::fortanix_sgx::arch::MITIGATED_ADVISORIES;
    assert!(MITIGATED_ADVISORIES.contains(&"INTEL-SA-00079"));
    assert!(MITIGATED_ADVISORIES.contains(&"INTEL-SA-00076"));
}

Upon further reflection there is another way to achieve a similar goal which might be better : provide a macro instead. This macro would be a compile-time assertion of what the rust toolchain must mitigate. A typical usage would look like :

// enclave developer main.rs
mitigated_advisories_must_include!(["INTEL-SA-00079","INTEL-SA-00076"]);

The macro would check that the corresponding Intel SA are mitigated by the toolchain, and if not it would emit a compile time error. It would be even better from a security perspective since the code would not even compile without the expected mitigation. Of course to implement such macro you might want to provide a compile-time available list of Intel SA... which if exposed publicly might have the same misuse issue as the other approach.

With this approach an auditor would just have to check that the Intel SA allowed by the attestation validation policy matches with the declared Intel SA in the mitigated_advisories_must_include! macro in the enclave source code.

Note that this introspection feature would not only allow us to safely remove mitigations (in the future, it's too late for LVI-mitigation), but it would also prevent enclave code from being accidentally compiled with an old toolchain missing a required mitigation.

clauverjat commented 1 year ago

Note that if this was provided via a target feature that was exposed to Rust users, it would be easy to simply disable during compilation, and that altering such is implicitly unsafe.

Regarding the target_feature aspect. I think we might be mixing two things. So I'll try to separate the two.

When I was talking about the target-feature=+retpoline I was only saying that Rust support the retpoline mitigation (see https://github.com/rust-lang/rust/issues/54637) via this target-feature. And that one might enable it by putting it in the feature field of the target defined in https://github.com/rust-lang/rust/blob/master/compiler/rustc_target/src/spec/x86_64_fortanix_unknown_sgx.rs#L66

Independently of how retpoline could be applied, there is the question of how we could enable developers to configure the mitigations they want to enable (in the present case : disabling LVI mitigations, and enabling retpoline ?).

However I have no idea if the attributes in question can be meaningfully applied on less than a binary-wide basis, which has implications regarding the rest of the semantics of a target_feature.

I don't really know what can and cannot be done with a target_feature. But regarding the mitigations they must be enabled on the whole binary (which also includes the sysroot with the core + std library). And we need to be careful with that, because if only part of the code is mitigated, it might work just fine on a functional level (I don't think we'll have ABI compatiblity issue in this case), but it would most likely render the mitigation useless, therefore giving a false sense of security...

About this, I have a question regarding how cargo/rustc handles RUSTFLAGS. When a developer sets the RUSTFLAGS environment variable, and compiles its rust project with cargo, cargo will compile his crate and dependencies with the given flags. But what about the linked std library and core library ? Will they be recompiled with the given rustflags ? My guess would be no since I think the sysroot is downloaded pre-compiled via rustup, but I am really not sure. If cargo doesn't rebuild the std library with the provided RUSTFLAGS, I don't see how developers could opt-out or opt-in a mitigation... This would force us to create a new target to have different mitigation settings. I hope someone with more experience with the rust compiler can shed a light on that !

Thanks

workingjubilee commented 1 year ago

In general there is no way for an SGX enclave to get trustworthy CPUID information about the CPU. You simply cannot run the CPUID instruction in SGX mode.

Oh yeah, I expected that wouldn't work.

Yeah, "you can't actually trust anything entirely" seems like... a pretty severe problem of epistemology here.

I don't really know what can and cannot be done with a target_feature. But regarding the mitigations they must be enabled on the whole binary (which also includes the sysroot with the core + std library).

Then describing them as target features is likely inappropriate, because target features can usually be toggled on a function-by-function basis. From Rust's perspective, a "target feature" is a directive to compile a function differently, e.g. "compile this function while using AVX instructions". You can give the directive "compile all functions this way" and that's common but individual functions can still have that toggled on and off using stuff like #[target_feature(enable = "+feature")].

Regarding this, I have a question regarding how cargo/rustc handles RUSTFLAGS. When a developer sets the RUSTFLAGS environment variable, and compiles its rust project with cargo, cargo will compile his crate and dependencies with the given flags. But what about the linked std library and core library? Will they be recompiled with the given rustflags

Nope! Definitely not. Code that isn't precompiled into std will still be generated with the enabled RUSTFLAGS appropriately, but that only applies to a certain subset of code that uses generics and the like which are monomorphized at compile time.

You would have to use -Zbuild-std to get around this. Or yes, a new target.

clauverjat commented 1 year ago

Yeah, "you can't actually trust anything entirely" seems like... a pretty severe problem of epistemology here.

Well it takes some time to get used to the security model of SGX, but this is also what makes the technology so useful 😉

Then describing them as target features is likely inappropriate, because target features can usually be toggled on a function-by-function basis.

I see. It makes me wonder about the retpoline target feature... Looks like target_features are sometimes abused because we lack better alternatives.

Code that isn't precompiled into std will still be generated with the enabled RUSTFLAGS appropriately, but that only applies to a certain subset of code that uses generics and the like which are monomorphized at compile time. You would have to use -Zbuild-std to get around this. Or yes, a new target.

That's what I feared, but at least we are making progress. We now have a choice to make.

@raoulstrackx : Now that the situation is clearer, can we make a plan on how to move forward ?

jethrogb commented 1 year ago

I think we should have safe defaults, but we should also allow people who know what they are doing (in this case: properly checking advisories in TCB levels against known compile-time settings) to get the best performance.

I think the Rust community won't appreciate a proliferation of compile targets, so until -Zbuild-std is stabilized, I think the best we can do is have all the mitigations on for std and customizable for the rest. I think for LVI target features are fine because they are actually implemented in LLVM als target features. What I don't know is if it's possible to turn off a feature that's enabled directly by the target: https://github.com/rust-lang/rust/blob/master/compiler/rustc_target/src/spec/x86_64_fortanix_unknown_sgx.rs#L66

workingjubilee commented 1 year ago

It is, yes.

It usually causes all sorts of unanticipated behavior because it can incur differing calling conventions from expectations, but it is.

clauverjat commented 1 year ago

I think the Rust community won't appreciate a proliferation of compile targets, so until -Zbuild-std is stabilized, I think the best we can do is have all the mitigations on for std and customizable for the rest. I think for LVI target features are fine because they are actually implemented in LLVM als target features. What I don't know is if it's possible to turn off a feature that's enabled directly by the target: https://github.com/rust-lang/rust/blob/master/compiler/rustc_target/src/spec/x86_64_fortanix_unknown_sgx.rs#L66

I had a look at https://github.com/rust-lang/rust/blob/master/compiler/rustc_target/src/spec/x86_64_fortanix_unknown_sgx.rs#L66-L67

The features we would like to disable are :

I also had a look at https://rust-lang.github.io/rfcs/2045-target-feature.html With those information I was able to figure out how to turn off the LVI mitigations. To do that I added these lines to my .cargo/config :

[build]
[target.x86_64-fortanix-unknown-sgx]
rustflags = ["-Ctarget-feature=-lvi-cfi,-lvi-load-hardening"]

The -(minus sign) removes the features from the default feature of the platform for the whole crate (but not the sysroot) as explained in the target feature RFC. I confirmed that the features were disabled by compiling a simple function with and without those lines and analyzing the resulting assembly. The LFENCE around load and ret were removed from the assembly with the lvi-disabling rustflags, and the resulting assembly was almost identical to the assembly generated for the x86_64-unknown-linux-gnu target (as I expected).

But what about the llvm-arg --x86-experimental-lvi-inline-asm-hardening ?

I tried to disable it similarly by adding a "-Cllvm-args=-x86-experimental-lvi-inline-asm-hardening=false" to the rustflags list. But this was without effect (I guess the defaults of the target override my custom llvm arg). Yet it seems that we don't need to disable the x86 experimental lvi inline asm hardening LLVM pass, I saw experimentally that this pass does not add LFENCE to rust inline assembly when the lvi-cfi and lvi-load-hardening features were disabled anyway. Digging a little deeper I had a look at the LLVM code and was able to find where the LVI ASM hardening flag was used. It is used in the X86AsmParser https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp#L4045-L4056

As you can see in the code, for assembly LVI mitigations to be enabled you need both LVIInlineAsmHardening and FeatureLVILoadHardening/FeatureLVIControlFlowIntegrity.

So in the end even if we can't remove the x86-experimental-lvi-inline-asm-hardening LLVM arg, we are still able to remove the LVI mitigation, which is all that matters.

clauverjat commented 1 year ago

Still there is one question that remains unanswered : do we need to enable other Spectre mitigation to replace LVI in recent CPUs?

raoulstrackx commented 1 year ago

I went quickly over the list for Sapphire Rapids and I think you'd need mitigations for: