ChinoCribioli opened this pull request 2 months ago
Thanks @ChinoCribioli and @artwyman!
Perhaps @cedoor is willing to do so, or can suggest someone to double-check here.
I'm also not competent enough to check the logic out here. I'll ask other people to review it.
@ChinoCribioli Is this a breaking change?
@chancehudson I just pushed most of the changes you requested. My only doubt now regarding any possible remaining timing vulnerability lies in the lines `e %= order` and `if (e & mask)`. Mostly the latter, since I don't know whether the evaluation of an if statement on an integer is implemented in constant time.
Comments on the latest changes. I'm still curious about relative performance of this implementation vs. the previous one.
I've run the tests of the baby-jubjub and eddsa-poseidon packages (which are the main packages affected by this PR) with both implementations and these are the results:
> on the lines `e %= order` and `if (e & mask)`. Mostly the latter, since I don't know whether the evaluation of an if statement on an integer is implemented in constant time.
The if statement should be fine as long as `e` is always the same size. This branch is specifically designed so that each codepath is equal time (this is the Montgomery ladder strategy). We just need to make sure the condition evaluation is constant time as well, and I suggested a change to try and ensure that.

Again though, this is best effort. Regardless, we're going to have some timing leakage from the `BigInt` implementation.
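As an illustration of that idea (this is a sketch, not necessarily the exact change suggested in the review), the current bit of `e` can be extracted as a `0n`/`1n` value and compared against a constant, so the value being branched on always has the same size; the hypothetical helper `scalarBit` is not part of the package:

```typescript
// Illustrative only: keep the branch condition operating on a value of fixed size.
function scalarBit(e: bigint, i: number): bigint {
    // Shift the desired bit into the lowest position and mask it, so the
    // result is always 0n or 1n regardless of the rest of the scalar.
    return (e >> BigInt(i)) & BigInt(1)
}

// Inside the ladder loop, both branches then perform one point addition and one
// doubling, so the per-iteration work is the same for a 0 bit and a 1 bit:
//
//   if (scalarBit(e, i) === BigInt(1)) {
//       R0 = addPoint(R0, R1) // R0 <- R0 + R1
//       R1 = addPoint(R1, R1) // R1 <- 2 * R1
//   } else {
//       R1 = addPoint(R0, R1) // R1 <- R0 + R1
//       R0 = addPoint(R0, R0) // R0 <- 2 * R0
//   }
```

Even then, BigInt shifts and comparisons are not guaranteed to run in constant time, so this narrows the leakage rather than eliminating it.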
> I've run the tests of the baby-jubjub and eddsa-poseidon packages (which are the main packages affected by this PR) with both implementations and these are the results:
I think this needs a more direct test of the affected operations (repeated in a loop to avoid fixed overhead and caching effects) to make the differences more clear. It does look to me like the test times are uniformly slower, though, which isn't very pleasing. I submitted an issue requesting to make it faster (which looks like it may be met by a WASM implementation implemented in Rust). The unit test which targets small values in particular is expected to get slower for a constant-time algorithm, but I assume the eddsa-poseidon tests are operating on more typical values.
I'm not entirely sure of the right trade-off here. Speaking for the Zupass use case, I think we probably care more about performance than timing attacks right now, but that's not a universal choice I expect to apply to everyone.
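Something along these lines could serve as that more direct comparison when run against both branches (a rough sketch; it assumes the package's `Base8` and `mulPointScalar` exports, and the iteration count and scalar are arbitrary):

```typescript
import { Base8, mulPointScalar } from "@zk-kit/baby-jubjub"

// Rough micro-benchmark sketch: repeat the affected operation in a loop so
// fixed overhead and caching effects are amortized over many iterations.
const iterations = 5_000
const scalar = BigInt("123456789012345678901234567890123456789") // arbitrary "typical" scalar

const start = performance.now()
for (let i = 0; i < iterations; i += 1) {
    mulPointScalar(Base8, scalar)
}
const elapsed = performance.now() - start

console.log(`${iterations} iterations: ${elapsed.toFixed(1)} ms total, ${(elapsed / iterations).toFixed(3)} ms per call`)
```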
> I'm not entirely sure of the right trade-off here. Speaking for the Zupass use case, I think we probably care more about performance than timing attacks right now, but that's not a universal choice I expect to apply to everyone.
I get that for some use cases performance is more important than safety; I honestly don't know which main projects use this library. However, I think that any implementation of a rather complex cryptographic protocol must have a strong focus on security and, as you said, the performant variant may not be the best option for a lot of uses this library might have.
I think security should have priority over performance, but I would wait for a full review by the audit team.
One solution might be to release a new major if the security benefits are substantial and continue to maintain version 1 anyway.
@chancehudson @artwyman any thoughts?
> One solution might be to release a new major if the security benefits are substantial and continue to maintain version 1 anyway. @chancehudson @artwyman any thoughts?
I have thoughts in two directions. First general thoughts on maintaining 2 versions:
It's workable, but definitely has a cost in maintenance burden and confusion. Either both versions need to be kept up to date with future patches, or else v1 will slowly become less useful, and potentially even insecure. I historically have a preference for "develop on main" workflows, since the code everyone looks at and uses is the code which will remain up-to-date and workable. I realize this is a preference, though, not a clear-cut definition of "best".
If you're going to support multiple algorithms for a significant time period, it may be better to keep them both in the code separately. They could be separate packages, separate functions within the package, or configurable via an "options" parameter. That keeps them both "on main" so they should benefit from ongoing work, unit testing, etc.
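For the "options parameter" flavor of that idea, a minimal sketch could look like the following; the names, the `constantTime` flag, and the placeholder bodies are purely illustrative and not the package's current API:

```typescript
// Illustrative only: both algorithms live side by side and callers opt in.
type Point = [bigint, bigint]

function mulLadder(base: Point, e: bigint): Point {
    // constant-time-oriented Montgomery ladder would go here
    return base // placeholder
}

function mulSquareAndMultiply(base: Point, e: bigint): Point {
    // current variable-time square-and-multiply would go here
    return base // placeholder
}

export function mulPointScalar(
    base: Point,
    e: bigint,
    options: { constantTime?: boolean } = {}
): Point {
    return options.constantTime ? mulLadder(base, e) : mulSquareAndMultiply(base, e)
}
```

A single entry point like this keeps both codepaths exercised by the same test suite, at the cost of a slightly larger API surface.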
2nd, thoughts on the specifics of the security vs. performance tradeoff here:
Given bigint arithmetic isn't constant-time anyway, this algorithm isn't a complete solution to the security problem. That makes me personally de-prioritize it vs. performance, but that's a call reasonable people can make differently. I would defer to someone with more security expertise, and who has specific threat models in mind to make the call on whether a constant-time algorithm built on variable-time bigint math reduces timing attack risk by enough of a margin to be worth prioritizing.
Calling the current implementation the "performant variant" is admittedly a bit of a stretch, since any pure-JS implementation of cryptography isn't going to be very performant. A preferable alternative would be to solve this problem fully as part of a more optimized implementation (probably wasm) which is fast enough to be an improvement to performance even with a constant-time algorithm. Such an implementation could improve both performance and security at once. I'm not sure how soon that's likely to happen, though, so we may need to make trade-offs or provide multiple options in the near term.
Description
The `mulPointScalar` method is implemented with the regular square-and-multiply algorithm, which is prone to timing attacks because the number of EC point additions depends on the number of 1's in the binary representation of the scalar. With this implementation (the Montgomery ladder), that is avoided: the number of additions depends only on the length of the binary representation of the scalar.
As documented in my implementation of the method, the algorithm works because of the following invariant: at each step, `R0` will be `r_0*base`, where `r_0` is the prefix of `e` written in binary, and `R1` will be `(r_0+1)*base`. In other words: at iteration `i` of the loop, `r_0`'s binary representation will be the first `i+1` most significant bits of `e`. If the upcoming bit is a 0, we just have to double `R0` and add `R0` to `R1` to maintain the invariant. If it is a 1, we have to double `R0` and add `1*base` (or add `R1`, which is the same as `(r_0+1)*base`), and double `R1` to maintain the invariant.
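For reference, a stripped-down ladder loop maintaining this invariant might look like the sketch below. It is illustrative only, not the exact code in this PR: it assumes the package's `addPoint` export, uses the hypothetical name `mulPointScalarLadder`, and expects `e` to be already reduced and `bits` to be a fixed bit width.

```typescript
import { addPoint } from "@zk-kit/baby-jubjub"

type Point = [bigint, bigint]

// Illustrative Montgomery ladder sketch (not the exact PR implementation).
// Invariant after processing the top i+1 bits r_0 of e:
//   R0 = r_0 * base and R1 = (r_0 + 1) * base.
function mulPointScalarLadder(base: Point, e: bigint, bits: number): Point {
    const identity: Point = [BigInt(0), BigInt(1)] // neutral element of Baby Jubjub
    let R0: Point = identity
    let R1: Point = base

    for (let i = bits - 1; i >= 0; i -= 1) {
        if (((e >> BigInt(i)) & BigInt(1)) === BigInt(1)) {
            // upcoming bit is 1: R0 <- R0 + R1, R1 <- 2 * R1
            R0 = addPoint(R0, R1)
            R1 = addPoint(R1, R1)
        } else {
            // upcoming bit is 0: R1 <- R0 + R1, R0 <- 2 * R0
            R1 = addPoint(R0, R1)
            R0 = addPoint(R0, R0)
        }
    }

    return R0
}
```

Scanning a fixed number of bits, rather than stopping at the scalar's most significant 1, is what keeps the number of point additions independent of the scalar's value.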
Related Issue(s)
Fixes #324
Other information
Aside from the new implementation of the `mulPointScalar` method, I added some tests to automatically check basic behaviors such as EC point additions, EC point multiplications, and the orders of the generating/base points used in the protocol. These tests are independent of the new implementation of the multiply method and are intended to make the test base more robust.
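As a rough illustration of the kind of checks meant here (assuming the package's `Base8`, `addPoint`, `mulPointScalar`, and `inCurve` exports; these are not the exact tests added in this PR):

```typescript
import { Base8, addPoint, mulPointScalar, inCurve } from "@zk-kit/baby-jubjub"

// Multiplying by 2 should agree with adding the point to itself.
const doubled = mulPointScalar(Base8, BigInt(2))
const added = addPoint(Base8, Base8)
console.assert(doubled[0] === added[0] && doubled[1] === added[1], "2 * Base8 should equal Base8 + Base8")

// Any scalar multiple of the base point should still lie on the curve.
console.assert(inCurve(mulPointScalar(Base8, BigInt(123456789))), "result should be on the curve")
```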
Checklist
- I have run `yarn style` without getting any errors