newpavlov opened 5 years ago
Are there existing instances of `std` depending on another crate for cross-OS functionality?

In particular, what I'm wondering is: when a new platform is added, do we have a strategy for how to update both `std` and the `getrandom` crate that doesn't break CI testing for some period of time?
For unsupported platforms the `getrandom` crate uses a dummy implementation which always returns `Error::UNAVAILABLE`. And the linked PR uses zeros in case of `getrandom` failure, so new targets will be "supported" automatically.
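A rough sketch of the fallback behaviour described above (illustrative only; `hashmap_seed` is a hypothetical name, not the PR's actual code):

```rust
fn hashmap_seed() -> [u8; 16] {
    let mut seed = [0u8; 16];
    // On unsupported targets the dummy backend returns
    // `Err(Error::UNAVAILABLE)`, and `seed` simply stays all zeros.
    let _ = getrandom::getrandom(&mut seed);
    seed
}
```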
I think this should only be done if `getrandom` is moved into the rust-lang org. But this proposal is kind of the opposite of that: taking rust-lang-managed code and replacing it with externally-managed code.

What concrete problem is this actually solving?
> And Rust already depends on rand (via tempfile) which in v0.7 will use getrandom

`std`'s dependency on the external `tempfile` crate seems to be a core problem itself.
Note that even earlier this month there were serious bugs found in the `getrandom` crate's implementation.
I'd personally like to see some sort of first-class API in Rust which solves this particular problem. That said, I think `getrandom` contains a lot of good work which is at the very least a stepping stone towards getting us there. I'm not sure what the proper interface boundary is to make this work "first class": a lang item?
To answer @briansmith's concerns: should rust-random be made more official? Is rust-lang-nursery more official? Is anything "official" besides the code that resides in https://github.com/rust-lang/rust?
As I look at recent rust-internals threads and hot-button items that perpetually come up, I definitely wonder: why isn't secure randomness in `core`? I think it probably should be.

The overwhelming majority of CPUs and microcontrollers are dedicating silicon to this problem. Based on that, and the amount of time I spend pontificating on this particular problem with both hardware designers and cryptographers, I think the problem is a `core` concern.
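As a concrete illustration of that silicon: x86_64 already exposes its hardware RNG through `core::arch`. A minimal sketch, assuming the binary is built with the `rdrand` target feature (e.g. `-C target-feature=+rdrand`); real code should runtime-detect the feature first:

```rust
#[cfg(target_arch = "x86_64")]
fn hw_random_u64() -> Option<u64> {
    use core::arch::x86_64::_rdrand64_step;
    let mut v = 0u64;
    // RDRAND can transiently fail; the intrinsic returns 1 on success.
    match unsafe { _rdrand64_step(&mut v) } {
        1 => Some(v),
        _ => None,
    }
}
```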
@briansmith We have no objections to transferring `getrandom` to the rust-lang org, and of course we will be happy to receive extensive reviews from the community.
> What concrete problem is this actually solving?

Having a single code base for securely retrieving system entropy, not three or more. Considering the non-triviality of this problem on some targets, I believe it's a problem worth solving.
One of the reasons cited for why this is needed is that `HashMap` depends on randomness. In the course of implementing aHash I found a much faster way to do this.

For its randomization, aHash uses a combination of Const-Random and the address of the HashBuilder. This essentially eliminates any runtime overhead and makes creating small hashmaps on the stack much cheaper.

I am hoping aHash becomes the default hasher, but even if it doesn't, this approach is worth considering.
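A hypothetical sketch of the keying scheme described above (not aHash's actual implementation; `AddressKeyedState` and the constant are stand-ins):

```rust
struct AddressKeyedState {
    _pad: u8, // non-zero-sized, so instances have distinct addresses
}

impl AddressKeyedState {
    // The `const-random` crate would inject a fresh per-build constant via
    // `const_random!(u64)`; a fixed literal stands in for it here.
    const PER_BINARY_KEY: u64 = 0x243F_6A88_85A3_08D3;

    fn key(&self) -> u64 {
        // Mix the ASLR-dependent address of `self` into the per-binary
        // constant: no syscall and no RNG state, just one XOR at runtime.
        Self::PER_BINARY_KEY ^ (self as *const Self as u64)
    }
}
```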
So you combine a high-quality per-binary key and a low-quality (limited address space) per-run key with a hash function optimised for small input, whose security relies on no one seeing the output? Certainly intriguing.

I guess the important question is whether this is good enough for everyone's hashing needs (because if not, it doesn't remove the need for a system RNG in `std`). This approach does have more potential attack vectors (e.g. leaking hashmap keys in an application might be enough to launch a DoS attack on that application, possibly even on a new run of the same binary).
@tkaitchuck That approach seems thoroughly unsuitable as a default, as its security relies on both:

1. compile-time randomness (via `const-random`), which breaks reproducible builds;
2. the hash output never being visible to attackers, which cannot be assumed of `std::hash::{Hash, Hasher}` in general, i.e. it would silently break current users of those APIs.

The current, default `Hasher` is AFAIK SipHash-1-3, which is designed to be a short-input PRF; in particular, that removes all issues related to (2): the knowledge of some hash values doesn't provide a meaningful advantage for guessing (properties of) other hash values.
Moreover, cryptography is littered with broken systems that were designed under assumptions that weren't precisely the ones that are guaranteed by the actual primitives used; having the default hash implement a random oracle (i.e. a PRF) is the strongest requirement which we can make of it, it matches the intuition of what a hash function is, and many other desirable properties (MAC, ...) follow from it. Switching to a function that only upholds weaker properties means that we are moving cognitive and cryptographic burden onto the users of this API, which is very undesirable as it's a very common API meant for use by virtually all Rust developers (as opposed to a specialist API that assumes in-depth knowledge of cryptography).
On the topic of aHash itself, it doesn't seem to make any well-defined security claims, let alone claims w.r.t. the standard models used in cryptography (being a PRF, a XOF, a MAC, ...); as such, it's unclear to me how anyone is supposed to use it appropriately (i.e. only relying on properties it claims to provide).
> For its randomization, aHash uses a combination of Const-Random and the address of the HashBuilder.
As @KellerFuchs noted this breaks reproducible builds, which have been a major goal of the Rust ecosystem and an area of concern for the Rust Secure Code WG. I personally consider anything which breaks reproducible builds to be highly undesirable.
> I am hoping aHash becomes the default hasher
I strongly hope anything selected as the default hasher has undergone a considerable amount of peer review, particularly by those involved in previous hashDoS attacks (e.g. Jean-Philippe Aumasson, Martin Boßlet, djb). I do not think aHash meets that bar.
What aHash does seems not too far off from Golang's aeshash, which was made the default hasher on x86. aeshash had numerous problems, including but not limited to key recovery attacks, and I think it serves as a great example of why hash functions like this need considerable scrutiny before being made the default.
@KellerFuchs

> The current, default Hasher is AFAIK SipHash-1-3, which is designed to be a short-input PRF; in particular, that removes all issues related to (2): the knowledge of some hash values doesn't provide a meaningful advantage for guessing (properties of) other hash values.

That simply is not true. If an attacker can induce values to be hashed and see the resulting hashes, they can simply brute-force partial collisions. No hashing algorithm can protect against this.
> > The current, default Hasher is AFAIK SipHash-1-3, which is designed to be a short-input PRF; in particular, that removes all issues related to (2): the knowledge of some hash values doesn't provide a meaningful advantage for guessing (properties of) other hash values.
>
> That simply is not true. [...] They can simply brute force partial collisions.
@tkaitchuck Yes, that's what “doesn't provide a meaningful advantage” means: the adversary cannot distinguish it (up to some negligible advantage, here it's IIRC 2⁻⁶⁴) from a random oracle, and so cannot do better than a generic brute-force attack (i.e. they don't gain a meaningful advantage from us using the concrete SipHash function, compared to the idealised model).
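For reference, the standard notion being invoked here is the PRF distinguishing advantage of an adversary $A$ against a keyed function $F$, relative to a uniformly random function $\rho$:

$$\mathrm{Adv}^{\mathrm{prf}}_{F}(A) \;=\; \left|\,\Pr\!\left[A^{F_K(\cdot)} = 1\right] \;-\; \Pr\!\left[A^{\rho(\cdot)} = 1\right]\,\right|$$

SipHash's design goal is that this quantity stays negligible, which is exactly the "no meaningful advantage" claim above.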
That's the essence of what security claims in cryptography are, and you might want to familiarise yourself with that field before pushing for the inclusion of your ego-project in the stdlib?
@KellerFuchs I am quite familiar with security claims. There is more going on here. It is useless to prove that an adversary cannot distinguish the function from random if you reveal the output to them. Even the most secure hash function with a perfectly random seed will leave you open to DoS.

Consider a (bad) hash table that decides to rehash after N consecutive collisions. If an attacker can just observe the iteration order of the map, they can very rapidly exhaust the system's memory: if the lower bits of a group of keys are all the same, those keys will go into the same hash bucket, so brute-forcing N bits can force the allocation of 2^N worth of RAM. If collisions are chained in a linked list instead, a CPU-usage attack becomes trivial.
Instead, a HashMap must inline the collisions into the array, should keep additional bits (as hashbrown does), and should change the hash function upon resize.
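A toy sketch of that last mitigation, rekeying on resize (illustrative only, not hashbrown's actual code):

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

struct Table<K> {
    state: RandomState,   // per-table (and per-resize) random key
    buckets: Vec<Vec<K>>,
}

impl<K: Hash> Table<K> {
    fn bucket_of(&self, key: &K) -> usize {
        let mut h = self.state.build_hasher();
        key.hash(&mut h);
        (h.finish() as usize) % self.buckets.len()
    }

    fn grow(&mut self) {
        // Fresh key: any collision set an attacker discovered under the
        // old key is invalidated by the resize.
        self.state = RandomState::new();
        let old: Vec<K> = self.buckets.drain(..).flatten().collect();
        self.buckets = (0..old.len().max(1) * 2).map(|_| Vec::new()).collect();
        for k in old {
            let b = self.bucket_of(&k);
            self.buckets[b].push(k);
        }
    }
}
```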
Attempting to use a "secure" hash function with a bad collision mitigation mechanism in the HashMap is pure security theater.
Once all of those requirements on the HashMap are in place, the requirements on the hash function are actually weak but very specific: it must meet the strict avalanche criterion and the bit independence criterion for all the bits of the input and of the key.

That's it. It doesn't have to be a strong PRF, be non-reversible, or be indistinguishable from random output. Asserting that these properties hold is in fact counterproductive, because it leads people to the incorrect belief that these are relevant or sufficient when they are not, which in turn leads to the very common mistake of installing a 'strong' hash function and assuming the problem is solved, leaving the system wide open to attack.
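Strict avalanche is also cheap to sanity-check empirically. A sketch (hypothetical helper, not from any of the libraries discussed) that estimates how far a 64-bit hash `h` deviates from the ideal "flipping any input bit flips each output bit with probability 1/2":

```rust
fn avalanche_bias(h: impl Fn(u64) -> u64, samples: u32) -> f64 {
    let mut worst: f64 = 0.0;
    for bit in 0..64 {
        let mut flips = [0u32; 64];
        for s in 0..samples {
            // Cheap structured inputs; a real test would use a proper RNG.
            let x = u64::from(s).wrapping_mul(0x9E37_79B9_7F4A_7C15);
            let diff = h(x) ^ h(x ^ (1u64 << bit));
            for out in 0..64 {
                flips[out] += ((diff >> out) & 1) as u32;
            }
        }
        for out in 0..64 {
            let p = f64::from(flips[out]) / f64::from(samples);
            worst = worst.max((p - 0.5).abs());
        }
    }
    worst // ~0.0 means good avalanche; large values indicate bias
}
```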
@tkaitchuck this will be my last response on this issue as I feel like your hash function is a distraction from the real issue at hand, and for that reason I'll keep it brief.
The core idea of hashDoS is applying traditional cryptanalysis techniques to non-cryptographic hash functions, and then leveraging those attacks to generate multicollisions. If we look at how the attacks on MurmurHash2 and CityHash (i.e. hashDoS Reloaded as presented at 29c3) actually worked, they were both based on differential cryptanalysis (see Martin Boßlet's excellent writeup on these attacks).
Reduced-round AES is known to be vulnerable to differential cryptanalysis, including but not limited to practical impossible differential cryptanalysis up to 5 rounds, as well as practical key recovery attacks up to 5 rounds.
Constructions which purport to address hashDoS require rigorous security requirements and analysis, ideally published in the form of a peer-reviewed research paper. Your construction does not meet these requirements, and does quite the opposite in terms of using AES in ways which, for me at least, set off alarm bells.
Yes. This has gotten off topic. Please post any comments in terms of what you would like to see here: https://github.com/tkaitchuck/aHash/issues/10
> Even the most secure hash function with a perfectly random seed will leave you open to DoS. [...] Instead a HashMap must inline the collisions into the array, and should keep additional bits (as hashbrown does) and change the hash function upon resize. [...] Attempting to use a "secure" hash function with a bad collision mitigation mechanism in the HashMap is pure security theater.

Yes, when rehashing a table, one should select a new random key (hence the term rehashing...). In general, it is of course entirely possible to use a perfectly fine hash function insecurely (as your example shows).
Nobody in this discussion claimed the opposite, or that `std`'s `HashMap` should use a bad collision mitigation mechanism, so I'm not sure who or what you believe to be answering.
> Once all of those requirements on the HashMap are in place, the requirements on the hash function are actually weak but very specific: It must meet the strict avalanche criterion and bit independence criterion for all the bits of the input and of the key

This is incorrect for several reasons, not least that `HashMap` isn't the only user of `std::hash` (a public API).

> It doesn't have to be a strong PRF, be non-reversible, or be indistinguishable from random output. Asserting that these properties hold is in fact counterproductive because it leads people to the incorrect belief that these are relevant or sufficient, when they are not. Which in turn leads to the very common mistake of installing a 'strong' hash function and assuming the problem is solved.
I'm not sure why you are asserting that weakening the properties provided by a public API is good ergonomics and will lead to less insecure code being written; I'm not usually in the habit of repeating myself, but here you go:
> cryptography is littered with broken systems that were designed under assumptions that weren't precisely the ones that are guaranteed by the actual primitives used; having the default hash implement a random oracle (i.e. a PRF) is the strongest requirement which we can make of it, it matches the intuition of what a hash function is, and many other desirable properties (MAC, ...) follow from it. Switching to a function that only upholds weaker properties means that we are moving cognitive and cryptographic burden onto the users of this API
In any case, this isn't something I will debate: providing misuse-resistant constructions is the best tool we have to help people avoid using them insecurely; this is a widely acknowledged fact, recognised for instance by cryptographer Phillip Rogaway (who was instrumental in developing the notion of misuse resistance in cryptography) and by @tarcieri (who is a Rust Secure Coding WG member, created the cryptography WG, and wrote the Miscreant misuse-resistant cryptography library)...
This is the last message I will answer, considering that I do not believe you are arguing in good faith: those are characteristic features of sealioning, a common trolling and harassment tactic, and neither of those behaviours (harassment and trolling) is compatible with the Rust Code of Conduct.
The last comment is from 2019. Since that time, I think it's clear that the needs of libstd are different from what a typical `getrandom` user needs. Indeed, the implementation within libstd is now highly specialized, minimized, and performance-optimized for libstd's specific usages. Now the implementations look nothing like each other. I think we should close this.
Right now `std` and `getrandom` essentially duplicate each other. And Rust already depends on `rand` (via `tempfile`), which in v0.7 will use `getrandom`. So I think it makes sense to use a single implementation, to keep work on correctly retrieving system entropy focused in one place.

Note that right now I do not propose to expose the `getrandom` API as part of `std` or to introduce a lang item; that's a separate discussion, see rust-random/getrandom#21. Also, it will probably be better to wait until rust-random/getrandom#13 is closed, which should happen relatively soon.

cc @dhardy @tarcieri
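For context, the `getrandom` API referred to throughout the thread is essentially a single function; a minimal sketch of its use (against the 0.1-era signature `getrandom(&mut [u8]) -> Result<(), Error>`):

```rust
fn random_u64() -> Result<u64, getrandom::Error> {
    let mut buf = [0u8; 8];
    // Fills the buffer from the OS entropy source, or returns an error
    // (e.g. `Error::UNAVAILABLE`) on unsupported targets.
    getrandom::getrandom(&mut buf)?;
    Ok(u64::from_ne_bytes(buf))
}
```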