Gocryptfs Use of Scrypt Introduces a Vulnerability That Renders the Entire Cryptographic Scheme Effectively Compromised

Librechain commented 3 years ago

Preface: To anyone reading, this is a continuation of a former (now closed) issue that I opened a few days ago, inquiring about the inclusion of Argon2 / outright replacement of Scrypt. For some reason (probably general fatigue), my side of that interaction was more informal than I would have preferred. That is not the case here. Additionally, this 'issue' was not opened for the sake of slamming Scrypt as a KDF, but rather to convey to the developer of this program, specifically, that insisting on keeping Scrypt is not just a matter of foregoing a 'better solution'. Replacing it protects users, in addition to swapping the current implementation for one without any known exploits / compromises.

Scrypt is Exploitable in a Practical Way Via Cache Timing Attacks

Just to give some background (sure you already know, but for the sake of any follow-up readers):

"MHMix [function used by Scrypt for memory-hardness] takes the input block, hashes it many times while saving the hash results, and computes an output derived from some of the hash results that are chosen by interpreting certain hash values as indices. Since the hash values will be unique to each password input to scrypt, the indices of the hash values that are accessed and used in the output will also be unique to each password. Since the hash values that are needed to compute the true scrypt output are password-dependent, an adversary conducting a brute force dictionary attack against crypt will be unable to predict which values will contribute to the final output and will be forced to store all values to potentially be used, making array A in MHMix use large amounts of memory."

Quoted from a recent paper published at Stanford titled 'Attacking Scrypt via Cache Timing Side-Channel' (link = https://crypto.stanford.edu/cs359c/17sp/projects/MarkAnderson.pdf)
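To make the quoted description concrete, here is a minimal sketch of a ROMix-style loop. It is a toy, not scrypt: SHA-256 stands in for scrypt's BlockMix, and the name `toy_romix` and the returned access trace are my own additions for illustration. It only shows that the order in which the array is read depends on the input.

```python
import hashlib


def toy_romix(block: bytes, n: int) -> tuple[bytes, list[int]]:
    """Toy ROMix-style loop. SHA-256 is a stand-in for scrypt's BlockMix."""
    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    # Phase 1: fill the array with a hash chain derived from the input.
    x = h(block)
    v = []
    for _ in range(n):
        v.append(x)
        x = h(x)

    # Phase 2: read the array at indices taken from the running state.
    # These indices are the input-dependent memory accesses.
    trace = []
    for _ in range(n):
        j = int.from_bytes(x[:8], "little") % n
        trace.append(j)
        x = h(bytes(a ^ b for a, b in zip(x, v[j])))
    return x, trace
```

Calling `toy_romix(b"password-a", 16)` and `toy_romix(b"password-b", 16)` yields two different index traces, which is the property the side channel relies on.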

The paper goes on to outline the 'PRIME+PROBE' method, which the paper states is the "best technique to use against scrypt".

I won't waste too much time restating what is already contained in the paper, but in essence, an attacker is able to manipulate which data sits in the cache in a way that allows subsequent observation of the victim's accesses to its own data.

Specifically:

"First, the attacker flushes the victim's data from cache (the PRIME stage) by accessing data that it knows will evict the victim's data from cache. Then, after waiting to give the victim an opportunity to access its data, the attacker attempts to access the data it put in cache (the PROBE stage). If the time to retrieve the data is relatively short, that means that the victim did not access its data. If the time to retrieve the data is relatively long, the victim accessed the target data, which caused the attacker's data to be evicted from cache, forcing the attacker to wait for its data to be retrieved from a lower-level memory."

Insignificant Threshold For Would-Be Attackers

All that was written above possesses no significance without information on how viable such an attack would be.

Fortunately, the study outlines this for us as well.

It states:

1. The 'PRIME+PROBE' attack can be used at any time the attacker shares some level of processor cache with the victim.
2. Point one above is satisfied in an instance where the attacker is able to run another process on the same machine that the user is running scrypt on.
3. This can even occur if the attacker and victim are "working on separate virtual machines that are hosted on the same machine".
4. It is not necessary that the victim & attacker be on the same core of a multi-core processor either (as I know that most VM solutions involve dedicating some number of cores to the VM while running).
5. The above points are not exhaustive insofar as the attacker just needs to have figured out some means to put themselves in position to be sharing the machine's cache memory.
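The shared-cache condition is what the PRIME and PROBE stages depend on. The sketch below simulates one set of a shared cache with LRU replacement. It is purely a model: there is no real timing here, a probe "miss" stands in for a slow load, and `CacheSet` and `prime_probe` are hypothetical names introduced for illustration.

```python
from collections import OrderedDict


class CacheSet:
    """One set of a set-associative cache with LRU replacement (simulated)."""

    def __init__(self, ways: int):
        self.ways = ways
        self.lines = OrderedDict()

    def access(self, tag: str) -> bool:
        """Touch a line. Returns True on a hit, False on a miss."""
        if tag in self.lines:
            self.lines.move_to_end(tag)
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[tag] = None
        return False


def prime_probe(victim_touches_set: bool, ways: int = 4) -> int:
    """Return how many attacker lines miss during the PROBE stage."""
    cache = CacheSet(ways)
    cache.access("victim")  # the victim's data starts out cached
    attacker = [f"attacker-{i}" for i in range(ways)]
    for tag in attacker:  # PRIME: fill the set, evicting the victim's line
        cache.access(tag)
    if victim_touches_set:  # the victim runs between the two stages
        cache.access("victim")
    # PROBE: re-access the attacker's lines and count the misses
    return sum(not cache.access(tag) for tag in attacker)
```

In this model `prime_probe(False)` reports no misses, while `prime_probe(True)` reports at least one, which is the signal the attacker reads as "the victim touched this set".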

Stanford Paper Outlines this Vulnerability in Scrypt That Allows For its Exploitation

Essentially, what is stated in the passage above is that the PBKDF2 execution (within Scrypt) is what is effectively compromised in this timing attack.

Both the Stanford study and the informational RFC (7914) are very clear in their documentation of the KDF's operation that a compromise of this portion of the Scrypt scheme will allow an attacker to effectively compromise the entire algorithm:

Also, from the Stanford study:

"Learning enough information about the PBKDF2 hash of the victim's password allows an adversary to reduce an attack on scrypt to an attack on PBKDF2, thereby bypassing the memory-hardness of scrypt."

This then leads us to:

"More specifically, once the memory access pattern of scrypt is observed, the attacker can construct a dictionary of PBKDF2 hashes of potential passwords, compute what their access patterns will be, and compare the observed memory access patterns to the access patterns of the hashes in the dictionary."

At this point, we've now established that Scrypt can be exploited in a way that makes its inclusion in gocryptfs (beyond this point) questionable from many different perspectives.
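The dictionary comparison described in the second quote can be sketched as follows. This is a toy model under stated assumptions: the single-iteration PBKDF2-HMAC-SHA256 call matches what RFC 7914 specifies as scrypt's first step, but the chained SHA-256 is only a stand-in for the real mixing function, and `first_index`, `filter_candidates`, and the idea of a cleanly observed index are hypothetical simplifications (a real side channel would yield noisy cache-set information at best).

```python
import hashlib


def first_index(password: bytes, salt: bytes, n: int) -> int:
    """Toy model of the first array index read for a given password."""
    # scrypt starts with a single-iteration PBKDF2-HMAC-SHA256 (RFC 7914).
    x = hashlib.pbkdf2_hmac("sha256", password, salt, 1, dklen=32)
    # Stand-in for the fill loop: n chained hashes, with no array kept,
    # so the attacker needs only constant memory per candidate.
    for _ in range(n):
        x = hashlib.sha256(x).digest()
    return int.from_bytes(x[:8], "little") % n


def filter_candidates(candidates, salt: bytes, n: int, observed_index: int):
    """Keep only the candidates whose predicted first index matches."""
    return [pw for pw in candidates if first_index(pw, salt, n) == observed_index]
```

With `n = 1024`, a single observed index already discards most wrong candidates in this model; each further observed access would narrow the list again.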

Addressing the Language Used in My Previous Issue

The apology that I made remains. However, if this post is published, then I feel that those words will be justified, because one would have to stand in the face of well-established cryptanalysis, research and - to a fair extent - cryptographic principle itself in order to wave off the concerns that have been raised about Scrypt up to this point.

To be clear, that is not a threat, as this is now a matter of ethics (and, no, that's not hyperbole).

Going Back to the Litecoin Miners That We Mentioned Earlier

If we're referring to an outputted hash from a successful Scrypt operation (that was not executed in a compromised environment), then sure, it is entirely implausible to suggest that even the entirety of the Litecoin network (hash rate-wise) would have the ability to brute force the password.

However, in light of what I've covered from the Stanford study regarding the trivial nature of the compromise of Scrypt's memory-hardness (which is supposed to be its key feature), we must now seriously consider the threat of this burgeoning network.

Why?

Per specifications, the output of the PBKDF2 function is piped into HMAC256 (via the two 32-bit output strings created by splitting the 64-bit output of the PBKDF2 function).

This is critical to note because, when considering the information included above about timing attacks, it appears fair to assume that there is no 'random oracle' quality to the data extracted / observed by the attacker during the Scrypt hashing process.

It Won't Be Litecoin - it Will Be BITCOIN Miners

Perhaps in hindsight my point about the Litecoin mining network was a bit irrational since they're mining Scrypt.

But by reducing the difficulty of 'cracking' the Scrypt output to essentially brute forcing a PBKDF2(-SHA256) password (no salt) with default specified iterations, this seems like a job that would require trivial resources.

Available Commercial Hardware

Below is a photo depicting Bitcoin mining hardware (top of the class, admittedly):

source: https://www.microbtwhatsminerd1.com/

The picture above shows us that this miner is capable of providing >400 TH/s.

For reference, 400 TH/s corresponds to 4 x 10¹⁴ hash operations PER second.
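As a quick check on the units: one terahash is 10¹² hashes, so the conversion is a single multiplication. The sketch below also estimates how long a candidate list would take at a given rate, assuming (as this post does) that testing one candidate costs exactly one hash operation. The function names are mine, and the 400 TH/s figure is simply taken at face value from the reading above.

```python
def hashes_per_second(terahashes: int) -> int:
    """Convert a rate in TH/s to hash operations per second."""
    return terahashes * 10**12


def seconds_to_exhaust(candidates: int, rate: int) -> float:
    """Time to test every candidate, assuming one hash operation per candidate."""
    return candidates / rate


rate = hashes_per_second(400)  # 400 TH/s -> 4 * 10**14 hashes per second
```

For example, `seconds_to_exhaust(26**8, rate)` gives the time to try every 8-character lowercase password under that one-hash-per-guess assumption.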

This Machine Only Costs $2,500

That's 'Joe Blow can buy this' prices.

I'm sure either you or I could go ahead and purchase this machine right now if we really wanted / needed.

So could many other folks in this world fortunate enough to be a normally functioning adult, responsible enough to maintain a stable job in a 1st-world economy for longer than... 6 months, perhaps.

Scrypt Vulnerability is Equivalent to a Vulnerability Overall for Gocryptfs

Going back to Gocryptfs, it's worth considering the role that 'Scrypt' plays in the larger picture of the cryptographic scheme that Gocryptfs provides for users.

Scrypt is used as the KDF for the password that must be input into the program in order to mount a gocryptfs container.
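For readers unfamiliar with what that step looks like, here is a minimal sketch of password-based key derivation with scrypt, using Python's standard `hashlib.scrypt` rather than gocryptfs's own Go code. The default parameters (logN = 16, r = 8, p = 1, 32-byte key) are my assumption about gocryptfs's defaults, not something confirmed in this thread, and `derive_key` is a hypothetical helper name.

```python
import hashlib


def derive_key(password: bytes, salt: bytes, log_n: int = 16,
               r: int = 8, p: int = 1, key_len: int = 32) -> bytes:
    """Derive a key from a password with scrypt (parameters are assumed defaults)."""
    n = 1 << log_n
    # scrypt needs roughly 128 * n * r bytes of memory; allow 2x headroom so
    # the call is not rejected by OpenSSL's default memory limit.
    maxmem = 128 * n * r * 2
    return hashlib.scrypt(password, salt=salt, n=n, r=r, p=p,
                          maxmem=maxmem, dklen=key_len)
```

The same password and salt always produce the same key, so anything that leaks information about the derivation narrows the search for the password itself.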

Thus, it is no stretch in logic to say that this timing attack on the cache renders the entirety of the gocryptfs scheme extremely vulnerable.

This is amplified by the fact that gocryptfs as a program executes itself as a bash script, meaning that any attacker that has access to the bash_history of a user's machine will be able to find out the:

Location of the 'gocryptfs' container (which defeats the general practice of initializing a '.'-prefixed [hidden] file)
Parameters for the scrypt KDF
Whether or not there is a password file that's used

And countless other command line specifications that the user may elect to use for their implementation

Concluding On Why Scrypt KDF Should Be Replaced

The solution is to replace it with Argon2 (which was literally designed to address all of the issues with Scrypt like the ones that I named above).

This write-up here does not outline the characteristics of Argon2 or what about its construction, specifically, makes it a better alternative to Scrypt.

That will be appended in a subsequent entry to this 'issue' in a few hours, when I'm able to find a quick 30 minutes to break that down (for the edification of any that may stumble upon this issue at some point in the future out of curiosity for how the [hopeful] evolution past Scrypt for Gocryptfs occurred).

bexelbie commented 3 years ago

@rfjakob has done an amazing job with Gocryptfs and it meets my needs effortlessly and I am proud to help package it for Fedora.

@Librechain you seem to have exposed a problem for your particular threat model. This doesn't, for example, impact my threat model. In situations such as this I would expect @rfjakob to accept patches, which he has offered to do. I would expect you, @Librechain, to do exactly what you promised to do: "I'll do all the heavy lifting if necessary. It won't take me (or you if you choose) any longer than 20-30 minutes, I imagine."

Please provide the patch you offered and not walls of text. Spreading FUD is not useful. If you have a threat model issue, state it as such. However, you offered a patch and now appear to be mad that everyone said, "please" instead of offering to do your work for you.

bexelbie commented 3 years ago

FWIW, I encourage this issue to be closed as we don't need it to process the patch.

Librechain commented 3 years ago

@bexelbie Not sure who you are or why you're looking to be an antagonist here, but I'm not interested in engaging.

Whether I provide the PR or @rfjakob does is aside from the point.

Everything above serves as the furthest engagement that you'll have with me. Please refrain from interjecting from this point forward so that we can commence with a clean thread of discussion on implementing a solution.

Thank you.

lechner commented 3 years ago

@rfjakob I also recommend closing this issue. Thank you for your hard work on gocryptfs!

LicoMonch commented 3 years ago

Just my 201 cents here as a simple 'end-user' (and no, I don't need any answers to this :-) ):

1. The issue is overloaded with useless information -> just stating 'feature request for replacing scrypt with a safer algorithm' would be sufficient, and then maybe also giving a remote reference for its weakness
2. Your explanations are far away from being understandable - more like: confusing, distracting, attention grabbing
3. The referenced 'study' could not offer any kind of real world attack nor POC (as stated in its conclusion), which means the statement 'Scrypt is Exploitable in a Practical Way Via Cache Timing Attacks' is a lie and the whole issue exposed as .. yeah, to what exactly? Trolling? Dunno .. at least: useless, imho
4. I'm no crypto pro, but while searching for information about the mentioned threat, I did not find anything that would raise any concerns on my side regarding the safety of my encrypted data
5. Even if the threat would be real and an attacker for example could get a VM on the same VM-hoster where I encrypt my personal data with gocryptfs. Then he gets (he should have played in the lottery, because this whole example is a jackpot) the password to my encrypted data with a side-channel attack supported by an ASIC - which the VM-hoster kindly connected to his VM - just in that millisecond after I entered it - wow, lucky bastard - and then? How does he now decrypt my data? For myself, I could imagine a few more 'real world' scenarios to get the data I want, especially if I can do things like in the example.
6. No.5 is maybe non-sense as I probably do not understand the whole problem - BUT: this is an open issue tracker, even crawled by search engines, and so you shout out to the whole world 'gocryptfs is compromised' and that's - as far as I could dig into it - not true and more a lie

(PS: I don't mind if this post gets deleted, but holy shit, it could not be unseen)

rfjakob commented 3 years ago

Looking at this paper the author admits that he could not make the attack work in practice:

4 Conclusion / 4.1 Future Work: "At the time of writing this paper, the author’s efforts to implement the attack described by modifying the operation of existing tools have not been successful."

So, the paper describes a theoretical attack on scrypt that can only work (if at all) during the time window where gocryptfs checks the password, and the attacker needs control over a CPU on your system.

I don't see any urgency here. Nothing is compromised, and I'm closing the ticket.

Still, adding Argon2 sounds like a good idea; maybe I will add it one day.

impredicative commented 3 years ago

@rfjakob Should there be an open issue for the desired higher-security option (perhaps Argon2 #520 )? As a user I will feel better knowing that the highest security always remains on your radar. Perhaps all such open security issues should also be labeled with a common label. Thanks.

Security requires paranoia!

rfjakob commented 3 years ago

Yes, sure

rfjakob / gocryptfs