rust-lang / rust


Use getrandom crate for retrieving system entropy? #62079

Open newpavlov opened 5 years ago

newpavlov commented 5 years ago

Right now std and getrandom essentially duplicate each other, and Rust already depends on rand (via tempfile), which in v0.7 will use getrandom. So I think it makes sense to use a single implementation, keeping the work of correctly retrieving system entropy focused in one place.

Note that right now I do not propose exposing the getrandom API as part of std or introducing a lang item; that's a separate discussion, see rust-random/getrandom#21. Also, it will probably be better to wait until rust-random/getrandom#13 is closed, which should happen relatively soon.
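
For reference, the crate's public surface is essentially a single function; a minimal sketch of how std could delegate to it (assuming the getrandom 0.1-era API):

```rust
// Sketch: delegating std's entropy needs to the getrandom crate.
// Assumes the 0.1-era API: `getrandom(dest: &mut [u8]) -> Result<(), Error>`.
use getrandom::getrandom;

/// Fill `dest` with system entropy, e.g. to seed HashMap's SipHash keys.
fn fill_bytes(dest: &mut [u8]) -> Result<(), getrandom::Error> {
    getrandom(dest)
}

fn main() {
    let mut seed = [0u8; 16];
    fill_bytes(&mut seed).expect("failed to retrieve system entropy");
    println!("seed: {:02x?}", seed);
}
```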

cc @dhardy @tarcieri

ExpHP commented 5 years ago

Are there existing instances of std depending on another crate for cross-OS functionality?

In particular what I'm wondering is: When a new platform is added, do we have a strategy for how to update both std and the getrandom crate that doesn't break CI testing for some period of time?

newpavlov commented 5 years ago

For unsupported platforms the getrandom crate uses a dummy implementation which always returns Error::UNAVAILABLE. And the linked PR falls back to zeros in case of getrandom failure, so new targets will be "supported" automatically.
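
A sketch of that fallback behaviour (illustrative only, not the linked PR's actual code):

```rust
use getrandom::getrandom;

/// Best-effort entropy: if the platform backend fails (e.g. the dummy
/// implementation's Error::UNAVAILABLE), fall back to zeros so that
/// unsupported targets still build and run, at the cost of predictable
/// hasher keys. Illustrative sketch, not the linked PR's code.
fn entropy_or_zeros(dest: &mut [u8]) {
    if getrandom(dest).is_err() {
        // The buffer may be partially written on failure; make the
        // fallback deterministic by zeroing it entirely.
        for b in dest.iter_mut() {
            *b = 0;
        }
    }
}
```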

briansmith commented 5 years ago

I think this should only be done if getrandom is moved into the rust-lang org. But, this proposal is kind of the opposite of that: taking rust-lang-managed code and replacing it with externally-managed code.

What concrete problem is this actually solving?

And Rust already depends on rand (via tempfile) which in v0.7 will use getrandom

std's dependency on the external tempfile crate seems to be a core problem itself.

Note that even earlier this month there were serious bugs found in the getrandom crate's implementation.

tarcieri commented 5 years ago

I'd personally like to see some sort of first-class API in Rust which solves this particular problem. That said, I think getrandom contains a lot of good work which is at the very least a stepping stone toward getting us there.

I'm not sure what the proper interface boundary is to make this work "first class": a lang item?

To answer @briansmith's concerns: should rust-random be made more official? Is rust-lang-nursery more official? Is anything "official" besides the code that resides in https://github.com/rust-lang/rust?

As I look at recent rust-internals threads and hot-button items that perpetually come up, I definitely wonder: why isn't secure randomness in core?

I think it probably should be.

The overwhelming majority of CPUs and microcontrollers are dedicating silicon to this problem. Based on that, and the amount of time I spend pontificating on this particular problem with both hardware designers and cryptographers, I think the problem is a core concern.

newpavlov commented 5 years ago

@briansmith We have no objections to transferring getrandom to the rust-lang org, and of course we will be happy to receive extensive reviews from the community.

What concrete problem is this actually solving?

Having a single code base for securely retrieving system entropy, not 3 or more. Considering the non-triviality of this problem on some targets, I believe it's a problem worth solving.

tkaitchuck commented 5 years ago

One of the reasons cited for why this is needed is that HashMap depends on randomness.

In the course of implementing aHash I found a much faster way to do this.

For its randomization aHash uses a combination of Const-Random and the address of the HashBuilder. This essentially eliminates any runtime overhead and makes creating small hashmaps on the stack much cheaper.

I am hoping aHash becomes the default hasher, but even if it doesn't this approach is worth considering.
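
Roughly, the scheme described above looks like this (hypothetical type names; not aHash's actual implementation):

```rust
// Sketch of the described keying scheme: a per-binary compile-time
// constant (from the const-random crate) combined with a per-instance
// address. Hypothetical names; not aHash's actual code.
use const_random::const_random;

// Baked into the binary at build time by const-random's proc macro.
const COMPILE_TIME_KEY: u64 = const_random!(u64);

struct AddrSeededState {
    _marker: u8, // non-zero-sized, so each instance has a distinct address
}

impl AddrSeededState {
    /// No syscall and no global state, but the runtime component has
    /// limited entropy (addresses span a small space) and the constant
    /// differs per build, which breaks reproducible builds.
    fn key(&self) -> u64 {
        COMPILE_TIME_KEY ^ (self as *const Self as usize as u64)
    }
}

fn main() {
    let (a, b) = (AddrSeededState { _marker: 0 }, AddrSeededState { _marker: 0 });
    println!("{:016x} {:016x}", a.key(), b.key());
}
```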

dhardy commented 5 years ago

So you combine a high-quality per-binary key and a low-quality (limited address space) per-run key with a hash function optimised for small input whose security relies on no one seeing the output? Certainly intriguing.

I guess the important question is whether this is good enough for everyone's hashing needs (because if not, it doesn't remove the need for a system RNG in std). This approach does have more potential attack vectors (e.g. leaking hashmap keys in an application might be enough to launch a DoS attack on that application, possibly even on a new run from the same binary).

KellerFuchs commented 5 years ago

@tkaitchuck That approach seems thoroughly unsuitable as a default, as its security relies on both:

  1. a high-entropy value generated at compile-time (by const-random), which:
    • makes reproducible builds essentially-impossible without a bunch of additional work;
    • makes the distribution of prebuilt binaries insecure;
  2. hash values being kept secret:
    • highly-dubious in a hashtbl implementation, as information on the hash values is leaked through timing and cache-based side-channels;
    • doesn't hold for users of std::hash::{Hash, Hasher} in general, i.e. would silently break current users of those APIs.

The current, default Hasher is AFAIK SipHash-1-3, which is designed to be a short-input PRF; in particular, that removes all issues related to (2): the knowledge of some hash values doesn't provide a meaningful advantage for guessing (properties of) other hash values.
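
Concretely, each RandomState in std draws fresh random keys, so the same input hashes differently across instances (and across process runs); this is exactly what consumes the system entropy this issue is about. A small illustration:

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

fn hash_with<S: BuildHasher>(state: &S, value: &impl Hash) -> u64 {
    let mut hasher = state.build_hasher();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // Two states, two random SipHash keys: the same input (almost
    // surely) hashes to different values.
    let (s1, s2) = (RandomState::new(), RandomState::new());
    assert_ne!(hash_with(&s1, &"hello"), hash_with(&s2, &"hello"));
}
```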

Moreover, cryptography is littered with broken systems that were designed under assumptions that weren't precisely the ones that are guaranteed by the actual primitives used; having the default hash implement a random oracle (i.e. a PRF) is the strongest requirement which we can make of it, it matches the intuition of what a hash function is, and many other desirable properties (MAC, ...) follow from it. Switching to a function that only upholds weaker properties means that we are moving cognitive and cryptographic burden onto the users of this API, which is very undesirable as it's a very common API meant for use by virtually all Rust developers (as opposed to a specialist API that assumes in-depth knowledge of cryptography).

On the topic of aHash itself, it doesn't seem to make any well-defined security claims, let alone claims w.r.t. the standard models used in cryptography (being a PRF, a XOF, a MAC, ...); as such, it's unclear to me how anyone is supposed to use it appropriately (i.e. only relying on properties it claims to provide).

tarcieri commented 5 years ago

For its randomization aHash uses a combination of Const-Random and the address of the HashBuilder.

As @KellerFuchs noted this breaks reproducible builds, which have been a major goal of the Rust ecosystem and an area of concern for the Rust Secure Code WG. I personally consider anything which breaks reproducible builds to be highly undesirable.

I am hoping aHash becomes the default hasher

I strongly hope anything selected as the default hasher has undergone a considerable amount of peer review, particularly by those involved in previous hashDoS attacks (e.g. Jean-Philippe Aumasson, Martin Boßlet, djb). I do not think aHash meets that bar.

What aHash does seems not too far off from Golang's aeshash, which was made the default hasher on x86. aeshash had numerous problems, including but not limited to key recovery attacks, and I think it serves as a great example of why hash functions like this need considerable scrutiny before being made the default.

tkaitchuck commented 5 years ago

@KellerFuchs

The current, default Hasher is AFAIK SipHash-1-3, which is designed to be a short-input PRF; in particular, that removes all issues related to (2): the knowledge of some hash values doesn't provide a meaningful advantage for guessing (properties of) other hash values.

That simply is not true. If an attacker can induce values to be hashed and see the resulting hashes, they can simply brute force partial collisions. No hashing algorithm can protect against this.
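
Concretely, a "partial collision" search only needs to match the low bits that a table of size 2^k actually uses; a toy sketch of such a generic brute force (it exploits no weakness in the hash, only that outputs are observable):

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

/// Collect inputs whose hashes agree with `target` in the low `bits`
/// bits. Generic brute force against any hasher whose outputs the
/// attacker can observe.
fn partial_collisions(state: &RandomState, target: u64, bits: u32, want: usize) -> Vec<u64> {
    let mask = (1u64 << bits) - 1;
    (0u64..)
        .filter(|x| {
            let mut h = state.build_hasher();
            x.hash(&mut h);
            (h.finish() & mask) == (target & mask)
        })
        .take(want)
        .collect()
}

fn main() {
    let state = RandomState::new();
    // Expect roughly one hit per 2^12 candidates for a 12-bit match.
    println!("{:?}", partial_collisions(&state, 0, 12, 8));
}
```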

KellerFuchs commented 5 years ago

The current, default Hasher is AFAIK SipHash-1-3, which is designed to be a short-input PRF; in particular, that removes all issues related to (2): the knowledge of some hash values doesn't provide a meaningful advantage for guessing (properties of) other hash values.

That simply is not true. [...] They can simply brute force partial collisions.

@tkaitchuck Yes, that's what “doesn't provide a meaningful advantage” means: the adversary cannot distinguish it (up to some negligible advantage, here it's IIRC 2⁻⁶⁴) from a random oracle, and so cannot do better than a generic brute-force attack (i.e. they don't gain a meaningful advantage from us using the concrete SipHash function, compared to the idealised model).
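
Spelled out in the standard distinguishing-game notation (a sketch of the definition, with ρ a uniformly random function and ε the negligible bound mentioned above):

$$
\mathrm{Adv}^{\mathrm{prf}}_{\mathrm{SipHash}}(\mathcal{A}) = \Bigl|\Pr\bigl[\mathcal{A}^{\mathrm{SipHash}_K(\cdot)} = 1\bigr] - \Pr\bigl[\mathcal{A}^{\rho(\cdot)} = 1\bigr]\Bigr| \le \varepsilon \approx 2^{-64}
$$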

That's the essence of what security claims in cryptography are, and you might want to familiarise yourself with that field before pushing for the inclusion of your ego-project in the stdlib?

tkaitchuck commented 5 years ago

@KellerFuchs I am quite familiar with security claims. There is more going on here. It is useless to prove that an adversary cannot distinguish it from random if you reveal the output to them. Even the most secure hash function with a perfectly random seed will leave you open to DoS.

Consider a (bad) hash table that decides to rehash after N consecutive collisions. If an attacker can just observe the iteration order of the map, they can very rapidly exhaust the system's memory, because if the lower bits of a group of keys are the same, they will all go into the same hash bucket. So brute forcing N bits can force the allocation of 2^N memory. If instead collisions are linked, a CPU-use attack becomes trivial.

Instead, a HashMap must inline the collisions into the array, should keep additional bits (as hashbrown does), and should change the hash function upon resize.

Attempting to use a "secure" hash function with a bad collision mitigation mechanism in the HashMap is pure security theater.

Once all of those requirements on the HashMap are in place, the requirements on the hash function are actually weak but very specific: it must meet the strict avalanche criterion and the bit independence criterion for all bits of the input and of the key.

That's it. It doesn't have to be a strong PRF, be non-reversible, or be indistinguishable from random output. Asserting that these properties hold is in fact counterproductive, because it leads people to the incorrect belief that these are relevant or sufficient when they are not. That in turn leads to the very common mistake of installing a 'strong' hash function and assuming the problem is solved, leaving the system wide open to attack.
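
The strict avalanche criterion, at least, is directly measurable: flip one input bit and check that every output bit flips with probability about 1/2. A rough empirical check (illustrative only, not a proof of anything):

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

fn hash64(state: &RandomState, x: u64) -> u64 {
    let mut h = state.build_hasher();
    x.hash(&mut h);
    h.finish()
}

fn main() {
    let state = RandomState::new();
    let trials = 10_000u32;
    let mut flips = [0u32; 64];
    for t in 0..trials {
        // Arbitrary spread of inputs; wrapping_mul avoids overflow panics.
        let x = (t as u64).wrapping_mul(0x9e37_79b9_7f4a_7c15);
        let bit = 1u64 << (t % 64); // flip one input bit per trial
        let diff = hash64(&state, x) ^ hash64(&state, x ^ bit);
        for (i, f) in flips.iter_mut().enumerate() {
            *f += ((diff >> i) & 1) as u32;
        }
    }
    // Each output bit should flip on roughly half the trials (~0.5).
    for (i, f) in flips.iter().enumerate() {
        println!("output bit {i:2}: flip rate {:.3}", *f as f64 / trials as f64);
    }
}
```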

tarcieri commented 5 years ago

@tkaitchuck this will be my last response on this issue as I feel like your hash function is a distraction from the real issue at hand, and for that reason I'll keep it brief.

The core idea of hashDoS is applying traditional cryptanalysis techniques to non-cryptographic hash functions, and then leveraging those attacks to generate multicollisions. If we look at how the attacks on MurmurHash2 and CityHash (i.e. hashDoS Reloaded, as presented at 29c3) actually worked, they were both based on differential cryptanalysis (see Martin Boßlet's excellent writeup on these attacks).

Reduced round AES is known to be vulnerable to differential cryptanalysis, including but not limited to practical impossible differential cryptanalysis up to 5 rounds, as well as practical key recovery attacks up to 5 rounds.

Constructions which purport to address hashDoS require rigorous security requirements and analysis, ideally published in the form of a peer-reviewed research paper. Your construction does not meet these requirements, and does quite the opposite in terms of using AES in ways which, for me at least, set off alarm bells.

tkaitchuck commented 5 years ago

Yes. This has gotten off topic. Please post any comments in terms of what you would like to see here: https://github.com/tkaitchuck/aHash/issues/10

KellerFuchs commented 5 years ago

Even the most secure hash function with a perfectly random seed will leave you open to DoS. [...] Instead a HashMap must inline the collisions into the array, and should keep additional bits (as HashBrown does) and change the hash function upon resize. [...] Attempting to use a "secure" hash function with a bad collision mitigation mechanism in the HashMap is pure security theater.

Yes, when rehashing a table, one should select a new random key (hence the term rehashing...). In general, it is of course entirely possible to use a perfectly fine hash function insecurely (as your example shows).

Nobody in this discussion claimed the opposite, or that std's HashMap should use a bad collision mitigation mechanism, so I'm not sure who or what you believe to be answering.
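
For concreteness, "select a new random key when rehashing" might look like the following sketch (a hypothetical helper; std's HashMap does not currently re-key on resize):

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::Hash;

/// Rebuild a map under a freshly seeded hasher, e.g. after detecting
/// pathological collision behaviour. Hypothetical sketch; not std's
/// actual resize logic.
fn rekey<K: Hash + Eq, V>(old: HashMap<K, V>) -> HashMap<K, V> {
    let mut fresh = HashMap::with_capacity_and_hasher(old.len(), RandomState::new());
    fresh.extend(old); // re-inserts every entry under the new keys
    fresh
}
```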

Once all of those requirements on the HashMap are in place, the requirements on the hash function are actually weak but very specific: It must meet the strict avalanche criterion and bit Independence criterion for all the bits of the input and of the key

This is incorrect for several reasons:

It doesn't have to be a strong PRF, be non-reversable, or be indistinguishable from random output. Asserting that these properties hold is in fact counter productive because it leads people to the incorrect belief that these are relevant or sufficient, when they are not. Which in turn leads to the very common mistake of installing a 'strong' hash function and assuming the problem is solved.

I'm not sure why you are asserting that weakening the properties provided by a public API is good ergonomics and will lead to less insecure code being written; I'm not usually in the habit of repeating myself, but here you go:

cryptography is littered with broken systems that were designed under assumptions that weren't precisely the ones that are guaranteed by the actual primitives used; having the default hash implement a random oracle (i.e. a PRF) is the strongest requirement which we can make of it, it matches the intuition of what a hash function is, and many other desirable properties (MAC, ...) follow from it. Switching to a function that only upholds weaker properties means that we are moving cognitive and cryptographic burden onto the users of this API

In any case, this isn't something I will debate: providing misuse-resistant constructions is the best tool we have to help people avoid using them insecurely; this is a widely-acknowledged fact, recognised for instance by cryptographer Phillip Rogaway (who was instrumental in developing the notion of misuse-resistance in cryptography), or @tarcieri (who is a Rust Secure Coding WG member, created the cryptography WG, and wrote the Miscreant misuse-resistant cryptography library)...

This is the last comment I will answer, considering that I do not believe you are arguing in good faith:

Note that those are characteristic features of sealioning, a common trolling and harassment tactic, and that neither of those behaviours (harassment and trolling) is compatible with the Rust Code of Conduct.

briansmith commented 5 months ago

The last comment is from 2019. Since that time, I think it's clear that the needs of libstd are different from those of a typical getrandom user. Indeed, the implementation within libstd is now highly specialized, minimized, and performance-optimized for libstd's specific usages; the two implementations now look nothing like each other. I think we should close this.