Closed mflaxman closed 10 years ago
Thanks for the PR and the kind words!
On the surface this looks like a reasonable addition. I'm not a crypto expert, so I'm going to spend some time reading about mitigating compromised PRNGs as well as pitfalls and common mistakes devs run in to when trying to add more entropy. I will post a follow up after I feel like I've gained a good understanding of the correct approaches. I'm hoping to learn, at a minimum:
If you have any good resources to read about this, please post them!
In the meantime, can you add some tests for this behavior? I'm particularly concerned about possibly overflowing the random seed beyond 512 bits (the user may enter a very large number, and we're just adding the user entropy to the seed) and the effect that might have on the from_master_secret
derivation, as well as verifying that two different user_entropy values result in differing results when starting with a seeded PRNG.
Can you also add some usage instructions to the README? The snippets you posted in this PR are suitable.
Unfortunately, I don't have great reading resources! I was hoping you did :(
As far as the addition part, the value is converted to hex and then hashed with sha3 after, so you should be able to add nearly infinite value to it without a problem, right? In other words, if the random seed was previously the hex representation of some big number (say 2**250) and then you added a bunch more text (say the first page of the US constitution), that would only add entropy and have no downside because it would take slightly longer to calculate the SHA3 of this new value). It's not a good example because now everyone knows you used the first page of the constitution, but that's a separate topic and you haven't made the underlying private keys any less secure by doing so. It's only additive.
On that note, I don't understand why use long_to_hex
to convert random_seed
to hex? I don't see any vulnerability here, but it adds complexity and I don't see what the gain is. Perhaps I'm misunderstanding though? You could just pass random_seed
(plus any user_entropy
) to from_master_secret
and it'd be easier to follow. On second thought, I'm pretty sure this the best approach. As a bonus, it also means user_entropy
doesn't have to be an int/long, which is much better for anyone using the library.
I wanted to respond to one thing you said that I disagree with: "I'm particularly concerned about... two different user_entropy values result in differing results when starting with a seeded PRNG"
To be clear, calling new_random_wallet()
repeatedly with the same user_entropy
should result in a different output each time. The user_entropy
adds extra randomness, but is not the only source of randomness. To see a deterministic result, your from_master_secret
function is perfect. Does that make sense? I get the feeling it isn't and that this could cause people great ruin if they mistakenly think that saving user_entropy
is enough to recover a private key that was generated with new_random_wallet()
. Hmmm...
As far as the addition part, the value is converted to hex and then hashed with sha3 after, so you should be able to add nearly infinite value to it without a problem, right? In other words, if the random seed was previously the hex representation of some big number (say 2**250) and then you added a bunch more text (say the first page of the US constitution), that would only add entropy and have no downside because it would take slightly longer to calculate the SHA3 of this new value).
Yes, I believe that is correct. If I understand correctly, even adding a constant to every random value from the PRNG would increase the entropy (though perhaps not enough to defeat a compromised generator), as long as that constant stayed secret. (Also, SHA-512 is used, but that's just a nit).
On that note, I don't understand why use long_to_hex to convert random_seed to hex?
Only because the from_master_secret
method expects its input to be bytes, and Crypto.Random.random.getrandbits
returns a long.
I don't see any vulnerability here, but it adds complexity and I don't see what the gain is. Perhaps I'm misunderstanding though? You could just pass random_seed (plus any user_entropy) to from_master_secret and it'd be easier to follow. On second thought, I'm pretty sure this the best approach. As a bonus, it also means user_entropy doesn't have to be an int/long, which is much better for anyone using the library.
I do like the change from long_to_hex(random.get_rand_bits(512))
to get_random_bytes(512/8)
. I must have overlooked that method the first time around.
I wanted to respond to one thing you said that I disagree with: "I'm particularly concerned about... two different user_entropy values result in differing results when starting with a seeded PRNG"
To be clear, calling new_random_wallet() repeatedly with the same user_entropy should result in a different output each time. The user_entropy adds extra randomness, but is not the only source of randomness. To see a deterministic result, your from_master_secret function is perfect.
Sorry, my wording could have been better. I'm very much a crypto beginner, so I err on the side of extreme caution. What I meant to convey was that, while I do generally trust the crypto primitives and that things are well behaved, I just want tests to verify that the real behavior matches the expected behavior. I didn't mean to imply that I expected the same results with the same user entropy, that would certainly be broken!
I get the feeling [...] that this could cause people great ruin if they mistakenly think that saving user_entropy is enough to recover a private key that was generated with new_random_wallet(). Hmmm...
Agreed, and I like the changes you made to the docstring and the readme to explicitly show that just knowing the user entropy is not enough to recover your keys.
As for adding the user entropy to the random seed, I'm still a little uncomfortable with a simple addition. I have no good reason to be uncomfortable, though. It's just a niggling feeling that addition might not be a suitable operation. I spoke with one of my friends who's a bit more familiar with crypto and he recommended either xor-ing the user entropy with the random seed, or using a hash function which uses a different source of entropy or different hashing family altogether, and then xoring those results before passing to from_master_secret
.
xor-ing the user entropy with the random seed
XORing would be fine as well, but I think it's adding complexity for no security benefit. Here's how I would probably implement it: http://stackoverflow.com/questions/2612720/how-to-do-bitwise-exclusive-or-of-two-strings-in-python
I think this could spook anyone who is auditing the code, it's not obvious what's going on.
In practice, I see no security difference between XOR and simple addition of two strings, what's the concern that XOR would solve/mitigate? Anyone who had two seed
values could trivially reverse engineer user_entropy
in either case.
using a hash function which uses a different source of entropy
What does this mean? Hash functions are by definition deterministic, meaning they have no entropy. The only new entropy here is what we pass in with user_entropy
.
using a hash function which uses a... different hashing family altogether, and then xoring those results before passing to
from_master_secret
I suppose we could hash the user_entropy
a few times with other functions first and then add (or XOR) that to the random_seed
. I still don't see what problem that's solving though? We explicitly allow this to be anything (because it is additive), so I don't understand what benefit we get from hashing it.
For reference, here is Vitalik's non-BIP32 code for generating a new private key:
https://github.com/vbuterin/pybitcointools/blob/4b125b5785bdc984bf367d1f8843a6ce8ea2bda2/bitcoin/main.py#L316
(note that I disagree with his usage of random.randrange
here, but that's another topic. It just complicates things without adding any security benefit, I don't think it does any harm)
Another point of reference is that in pycoin they use a bytearray
as an entropy source, which I believe is just a fancy way to concatenate:
https://github.com/richardkiss/pycoin/blob/e949a618223dff300a3cec1115a2463aa34d07ff/pycoin/scripts/genwallet.py#L63
I've never seen this data structure before, so I'm not 100% sure.
using a hash function which uses a different source of entropy
What does this mean? Hash functions are by definition deterministic, meaning they have no entropy. The only new entropy here is what we pass in with user_entropy.
Haha, oops. I was going to say two different things and ended up making one nonsense statement. What I meant to describe were these two options: 1) generate a random value with pycrypto and another random value from another source (possibly seeded by the user entropy input) and combine the result in some way, or 2) sha-512 the user entropy input and the combine that with the random 512 bit value before calling from_master_password
.
The first option sounds very straightforward, I think the only question is whether to concatenate or XOR. I think concatenation makes more sense for the reasons above, but I'd love to hear a counter-argument.
XORing would be fine as well, but I think it's adding complexity for no security benefit...I think this could spook anyone who is auditing the code, it's not obvious what's going on.
Fair point. Simple readable code is certainly preferred.
For reference, here is Vitalik's non-BIP32 code for generating a new private key... Another point of reference is that in pycoin...
Thanks for the links. It certainly alleviates my concerns around addition to see it being used by those two libraries.
I think concatenation makes more sense for the reasons above, but I'd love to hear a counter-argument.
I haven't been able to find any reasons to not use concatenation, and that certainly is simpler.
What do you think of this?
seed = get_rand_bytes(512/8)
seed += os.urandom(512/8) # Even without user-provided entropy we get a bit more
if user_entropy:
seed += sha512(user_entropy).digest()
return self.from_master_password(seed)
Why seed from both get_random_bytes
and os.urandom
? It appears that get_random_bytes
uses os.urandom
under the hood:
https://github.com/dlitz/pycrypto/tree/master/lib/Crypto/Random
(kind of complicated to track, see https://github.com/dlitz/pycrypto/blob/2d1aecd731f91f7367ef2a4069d305e2a1b488c0/lib/Crypto/Random/OSRNG/posix.py#L39 and https://github.com/dlitz/pycrypto/blob/2d1aecd731f91f7367ef2a4069d305e2a1b488c0/lib/Crypto/Random/OSRNG/fallback.py#L37)
Also note that 512/8 will return a float (64.0) on python 3 and break the code. It's easier to just hardcode 64 instead.
I also like what Vitalik did with the time. It's a weak protection to be sure, but it's simple and straightforward.
I disagree with hashing user_entropy
before concatenating (I don't see a benefit), but I see no downside except that it's slightly harder to read and slightly slower. Both seem minor.
I just updated my PR now with that.
Oops, I just deleted my comment. Doh! I don't see a way to recover it :(
Oops, I just deleted my comment. Doh! I don't see a way to recover it :(
It was emailed to me, so here it is for posterity:
This is what I meant to do:
In [1]: import time
In [2]: example = time.time()
In [3]: example
Out[3]: 1399162963.192513
In [4]: example*10**6
Out[4]: 1399162963192513.0
In [5]: int(example*10**6)
Out[5]: 1399162963192513
The point is to capture all the "entropy" that time.time() provides before rounding so that (assuming a compromised RNG) one can't just brute force a range of time during which the private keys might've been created. My first thought was that Vitalik's implementation was off and so I calculated the appropriate #.
I just copy-pasted his code and changed the #, but now I realized his code is broken here. Raising to the power of 7 provides no brute-force protection, as that's a trivial calculation on any machine. He should do that before taking the int, correct? Haha, maybe I should file a pull request on his library.
Anyway, I'll update on my local and push to bitmerchant now.
Thanks for recovering it. I was trying to edit it to clarify that I don't think Vitalik's code is broken in an insecure way, just that he was probably raising to the power of 7 to get (most of) the "entropy" from time.time()
and that by doing that outside of the int
parenthesis the raising to the power of 7 is broken. There's still value in the rest of the entropy, but the code I pushed to bitmerchant is strictly superior.
Alright, this looks good, but before I merge this can you add a test that mocks out time and urandom and verifies that user_entropy is being used to alter the result of from_master_password
?
I just added a basic unit test and pushed it, but I don't know how to mock out function calls in this framework. Do you have an example you could point me to?
Thanks for the contribution, @mflaxman
Hey @sbuss, awesome library you've built! Super clean and easy to follow.
What do you think of this change?
Example usage:
Of course, it works as it always has without
user_entropy
:I restricted
user_entropy
to ints/longs for simplicity, but perhaps it would be better to allow any type of input to make banging on the keyboard easier? Another benefit is that a larger character space means less banging needed to get the same amount of entropy.