rust-lang / rfcs

RFCs for changes to Rust
https://rust-lang.github.io/rfcs/
Apache License 2.0
5.95k stars 1.57k forks source link

Include constant time integer operation inside core #1814

Open valarauca opened 7 years ago

valarauca commented 7 years ago

Go-Lang has it's subtle to help developers write secure systems and servers by providing functions to do constant time integer operations.

Rust's goal is identical to this (writing secure systems and servers), so ideally shouldn't it provide a similar functionality?

These are simple re-definitions of primitive operations:

I am NOT proposing changes to std/core on how integer and/or slices work. Solid performance defaults are important.

I am proposing the inclusion of a few dozen free functions in core that are OPT IN ONLY. Where I'm not 100% sure. The swap/equality checking seems like they should be attached to their primitives. The slice comparisons, and cloning are weird.

Lastly the traditional method of doing this requires a degree of UB on how bit masking is done:

Returns X if Flag == 1

Returns Y if Flag == 0

Returns undefined values Flag is not 1 or 0

I've made a demo crate. But it is un-audited.

Edit 1:

I've implemented the boolean conversion and verified the ASM generated uses no branching or conditionals on x86/x86_64.

burdges commented 7 years ago

I'd applaud adding this to core. I donno if constant time operations get used by anything besides crypto though, so another idea might be starting a nursery crate for crypto. Initially, it could house crypto building blocks, like these constant time operations, and traits to provide a common interface for hashs, MACs, stream cyphers, and block cyphers, so as to simplify developing and auditing the actual algorithm implementations outside the nursery.

steveklabnik commented 7 years ago

Is this a duplicate of https://github.com/rust-lang/rfcs/issues/847?

valarauca commented 7 years ago

No this isn't a duplicate of https://github.com/rust-lang/rfcs/issues/847 . That is some moon shot level computer science.

Constant time algorithms go a bit above and beyond converting || to |. You effectively can't do any UTF-8 decoding since all the heavy branching. All == has to be converted to complex shifts/ands/xor operations. All branches have to be eliminated other then initial range/sanity checks. That last sentence alone will likely make the conversion undecidable.

I'm 50% sure there is a way to do what is being proposed. But pulling that off would warrant a Ph.D. in CS. I don't see us being able to integrate that proposal into the rust compiler before the decade is out.

Long term just exposing we know this works and is safe seems smarter then holding out for a year or two.

Edit 1:

I reviewed klutzy's attempt and there are some flaws in what is/isn't constant time. But some there isn't the branch elimination either. Also it is abandoned.

est31 commented 7 years ago

For real constant-time you probably also need hardware support, as things like the cache and others are not directly exposed to assembly programmers, let alone users of compiled languages (you can't say "i want these things to live on the cache and not RAM").

Plus, I'm not sure whether x86 or x86_64 define execution time of their instructions or whether its relies on the CPU and microcode version, etc.

clarfonthey commented 7 years ago

IMO, we should just add the option of annotating trait functions with const and then annotate all of the integer methods with that. That'd be a real fix.

IMO any quick-fix version like adding extra const functions shouldn't be stabilised for this reason.

valarauca commented 7 years ago

In order

For real constant-time

No just stop. Yes you are correct but you are not understanding how a constant time attack works. The network, caching, branch prediction effectively becomes noise you can average out over several thousand, million packets. Or fixed constants, that are subtracted out during analysis. You are looking for a delta time, between one packet and another. Above that looking for nano-second differences. So yes thousands to hundreds of thousands of samples are required to get that resolution.

The real goal is just to ensure the same number of instructions run for every branch of the operation. And that there is no variation in instruction execution time between the true/false branches. That requires none of your points.

I appreciate the hardware pedantry but it isn't applicable here.

:.:.:

IMO, we should just add the option of annotating trait functions with const and then annotate all of the integer methods with that. That'd be a real fix.

This isn't a fix. There is core logic that needs to be patched for this to work.

For example take this function

fn compare_array(x: &[u8], y: &[u8]) -> bool {
     assert!(x.len(),y.len(), "Arrays are non-equal. Memory error.");
     for i in 0..x.len() {
            if x[i] != y[i] {
               return false;
            }
      }
      true
}

Just modifying the integer operations will not make that constant time. What you are proposing is the exact same thing nadeko crate was/is proposing. This flatly doesn't work. You have to re-arrange the AST in ways that require understanding the indent of the author.

The point of including these functions is really to prevent this misunderstanding...

You cannot just op into constant time.

[I've submitted a bug report to nadeko](

:.:.:

There really isn't an extreme demand for this. Outside of writing crypto libraries, and comparing HMACS/Hashes/Keys. In all other fields you want maximum performance. Just with some cryptographic operations you want them to take the same time reguardless of branching.

If they do not, it becomes an attack vector.

The idea we need a #[constant_time] is frankly ridiculous. This isn't a cool toy. It won't increase performance. Using it everywhere won't make you more secure. It is like 5-6 functions that crypto-libraries need.

est31 commented 7 years ago

The network, caching, branch prediction effectively becomes noise you can average out over several thousand, million packets.

Not really. See this paper for info on how "constant time" OpenSSL was exploited through the cache: http://ts.data61.csiro.au/projects/TS/cachebleed//cachebleed.pdf

Quoting this site:

 For software to be constant time it should be written such that:

   * It only uses fixed-time instructions with arguments that depend on secret data
   * It does not use conditional branches that depend on secret data
   * It does not use memory access patterns that depend on secret data
valarauca commented 7 years ago

@est31

Once again technically correct...

But how can the Stdlib/Core dictate what developers do for memory management, cache prefetching, and cache coherency? That is not only out of the scope of this RFC, but out of the scope of the Rust programming language...

The thread's title is constant time integer operation. The points you outline have nothing to do with this.

andersk commented 7 years ago

The constant-time discipline can be enforced in the type system by taking advantage of lifetime parametricity:

use std::convert::From;
use std::marker::PhantomData;

#[derive(Clone, Copy, Default)]
pub struct CT<'p, T> {
    phantom: PhantomData<&'p ()>,
    data: T,
}

impl<'p, T> From<T> for CT<'p, T> {
    fn from(value: T) -> Self {
        CT {
            phantom: PhantomData,
            data: value,
        }
    }
}

pub fn run_ct<A, R, F>(f: F, a: A) -> R
    where for<'p> F: FnOnce(CT<'p, A>) -> CT<'p, R>
{
    f(CT::from(a)).data
}

We provide enough constant-time primitives like CT<'p, i32> ± CT<'p, i32> -> CT<'p, i32> and CT<'p, (A, B)> <-> (CT<'p, A>, CT<'p, B>) and so on that useful computations can be written; we don’t directly expose the private data field, to prevent f from unwrapping secret values from a CT and making data-dependent branches or fetches on them. The only way to unwrap a value from a CT is through run_ct, but because run_ct demands that f works for all lifetime parameters 'p and uses a unique lifetime parameter 'p for each call, it cannot be used to unwrap a CT from a different run_ct invocation. This guarantees that the running time of run_ct(f, a) won’t depend on the value of a.

(This is analogous to the way Haskell enforces safety of the ST monad.)

andersk commented 7 years ago

Note that we may need more than just integer primitives from the compiler. The (CT<'p, A>, CT<'p, B>) -> CT<'p, (A, B)> operation is not quite as trivial as it appears: it needs a barrier to prevent LLVM from mis-optimizing

fn ct_tuple<'p, A, B>(a: CT<'p, A>, b: CT<'p, B>) -> CT<'p, (A, B)>;

let (err_flag, result) = run_ct(|arg| {
    let err_flag = some_ct_computation;
    let result = another_ct_computation;
    ct_tuple(err_flag, result)
}, arg);

if err_flag {
    return Err("failed")
} else {
    …
}

by floating another_ct_computation into the else branch. Or perhaps the barrier belongs in run_ct itself.

burdges commented 7 years ago

I'd be disappointed if private fields and PhantomData could derail any optimizations, especially in LLVM where those notions should no longer exist. Just using whatever constant-time primitives you require suffices.

I doubt there is any value in "constant-time discipline" in the absence of an understanding of the reason why a particular piece of code should be constant-time. In fact, I'd wager folks would just recreate older much scarier side channel attacks by making a protocol fail to abort when a MAC or key validation failed.

I mean, you cannot really fix the lack of completeness of a NIST or brainpool curve, or secp256k1, with the silly band-aid of a messy implementation that treats the exceptional cases in constant time. You should simply purge those curves from your protocol and use a complete addition law.

andersk commented 7 years ago

I'd be disappointed if private fields and PhantomData could derail any optimizations,

@burdges Of course not; I fully expect that the mis-optimization I described is not affected by the private PhantomData, and would not be derailed by a naïve implementation of ct_tuple and run_ct, but it needs to be derailed because it could turn constant time code into non-constant time code, with different times for the error and success cases. If you imagine extending the example with multiple different error paths, that timing variance is exactly a kind of side channel that leads to vulnerabilities.

Just using whatever constant-time primitives you require suffices.

The point of my last example is to explain why integer primitives do not suffice and some kind of primitive with a barrier is necessary.

I doubt there is any value in "constant-time discipline" in the absence of an understanding of the reason why a particular piece of code should be constant-time.

I don’t understand. If your code satisfies a sound constant-time discipline, then that is a reason why it should be constant-time, and if it does not, how could you possibly have that understanding?

In fact, I'd wager folks would just recreate older much scarier side channel attacks by making a protocol fail to abort when a MAC or key validation failed.

I don’t see examples of bugs like that, nor how you think my proposal would make them more or less likely. (If the information that would be exposed following the skipped validation is timing-related, certainly less likely.) There has been a series of attacks enabled by timing differences between different failure paths, as well as attacks enabled by cache-based side channels, and those are exactly the kinds of vulnerabilities that my proposal is intended to exclude.

I mean, you cannot really fix the lack of completeness of a NIST or brainpool curve, or secp256k1, with the silly band-aid of a messy implementation that treats the exceptional cases in constant time. You should simply purge those curves from your protocol and use a complete addition law.

You can’t define away the real-world problem of sometimes needing to interoperate with existing protocols that are necessarily messy to implement securely. And even with protocols that carefully designed around this minefield, you still need guarantees that language optimizations won’t break your apparently secure implementation. Yes, that happens in practice, even for Curve25519.

andersk commented 7 years ago

floating another_ct_computation into the else branch

Okay on second thought, maybe that particular problem isn’t real. The branch already exposes its condition to at least a branch predictor side channel, and to avoid leaking the difference between multiple error flags, you need to combine them with a constant-time OR inside run_ct.

louy2 commented 7 years ago

https://www.chosenplaintext.ca/open-source/rust-timing-shield/

There is now a library that focuses on providing restricted integer types.