novitski / bitcoinj

Automatically exported from code.google.com/p/bitcoinj
Apache License 2.0
0 stars 0 forks source link

Use custom secp256k1 implementation in (beta) Bouncy Castle 1.51 #509

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
We (Bouncy Castle) have just published a beta release 
(http://downloads.bouncycastle.org/betas/) of the upcoming 1.51 (projected 
release date around March/April). Significantly for bitcoinj we have 
implemented a customized version of the secp256k1 curve that runs a lot faster 
than the generic curve implementation (4-5 times 1.50 speeds, already much 
improved on 1.49).

I'm hoping to see some bitcoinj integration testing of this and future beta 
releases well in advance of 1.51 final. I'm looking for feedback on the 
performance improvements, and of course to flush out any bugs or 
incompatibilities introduced.

To switch to using the custom curve, all that is required is to replace 
SECNamedCurves.getByName("secp256k1") with 
(org.bouncycastle.crypto.ec.)CustomNamedCurves.getByName("secp256k1"). Of 
course the bouncycastle/spongycastle distinction has to be dealt with one way 
or the other too.

Original issue reported on code.google.com by peter.de...@gmail.com on 5 Jan 2014 at 1:11

GoogleCodeExporter commented 9 years ago
Excellent, thanks.

I will prepare a branch that uses regular Bouncy Castle 1.51 when I next get a 
bit of spare time for this stuff. If necessary we can always do our own 
SpongyCastle equivalent for the Android app, though on Android we don't do a 
whole lot of EC math so it's not very sensitive (yet - that might change with 
HD wallets).

Original comment by hearn@google.com on 5 Jan 2014 at 2:25

GoogleCodeExporter commented 9 years ago
Someone tried this and reported a 20% performance improvement. Is that the sort 
of figure you're expecting to hear?

Original comment by hearn@google.com on 30 Jan 2014 at 12:58

GoogleCodeExporter commented 9 years ago
I would need to know more about the particular benchmark, of course. As I 
mentioned above, the custom curves show a raw improvement vs 1.50 (when 
measured over thousands of point multiplications) of at least %300 (i.e. 4 
times the mults-per-second). Single-threaded on my i7-2640M @2.80GHz (Win7 
x64), (custom) secp256k1 can do about 3300 random-point multiplications per 
second, but this is in an intensive loop and I am not sure what to expect 
embedded in an actual application.

If the %20 improvement is to on-the-whole application performance, that would 
in my view be consistent with the current version of bitcoinj spending about 
%25 of its time on EC calcs vs only ~%6 with the custom curve, but this is only 
speculation unless we can agree on a benchmark that I can subject to profiling. 
As I have no experience using bitcoinj I would greatly appreciate if anyone 
could put forward such a benchmark.

Original comment by peter.de...@gmail.com on 31 Jan 2014 at 1:41

GoogleCodeExporter commented 9 years ago
The 20% came from generating new addresses, so random number then scalar mult. 
But I didn't profile it myself. This came from someone else. I haven't had a 
chance to try myself with sig checking yet.

Original comment by hearn@google.com on 31 Jan 2014 at 9:41

GoogleCodeExporter commented 9 years ago
The latest numbers for the custom secp256k1 have improved to around 4000 random 
point-mults/second (same machine specs as above). Sqrt has been optimized, 
greatly improving point-decompression speed (x~4). Additionally we now have a 
fixed-point optimization which improves signing speed (x~2) in ECDSA, and some 
improvements in verification speed vs 1.50 (x~1.5).

This is all available in the current beta at 
http://downloads.bouncycastle.org/betas/ , which should be a drop-in 
replacement for 1.50 once 
https://code.google.com/p/bitcoinj/issues/detail?id=497 is resolved.

Original comment by peter.de...@gmail.com on 24 Feb 2014 at 3:11

GoogleCodeExporter commented 9 years ago
I have made a branch that uses BC 1.51 beta 10. I just imported the code and 
sedded the import lines to be back to org.bouncycastle.*

This is in the bc151 branch.

I also rebased the hdw-alpha (hd wallets) branch onto bc151, creating a new 
branch for that, hdw-alpha-bc151.  What I see is about a 5x speedup when 
creating new keys, vs 1.50 - nice work! I haven't tried signature checking yet, 
but it sounds like this is in the realm of what is expected.

If you want a goal to stretch for, libsecp256k1 is capable of about 20,000 
mults/sec on a 3.2Ghz core.

Original comment by mh.in.en...@gmail.com on 8 Apr 2014 at 4:55

GoogleCodeExporter commented 9 years ago
Good to see those sorts of numbers, Mike.

The latest performance for random mults with custom secp256k1 is now >6000/s on 
the machine described above (thanks to implementing GLV technique). I am not 
optimistic about closing the gap towards 20k/sec much further. We are presently 
stuck with representing field values as int[], i.e. 8 32-bit words. This is (in 
my current understanding) a limitation of Java, since we have no explicit 
access to 64*64->128 multiplication, nor to vectorization, and I'm not sure how 
much magic HotSpot can work on the generated code within these constraints. A 
lot could be done with a few well-chosen primitives exposed by the JDK, so I 
may have to take it up with them (though others have tried without success).

Probably more relevant to the key generation and signing cases is the 
fixed-point optimization. We'll have a new beta tomorrow-ish which makes it 
easy to override the default window width (5 for secp56k1) used in the 
fixed-point multiplier (trading off some extra memory usage). Since you use 
just the one curve, I think it's a no-brainer to increase the width used in 
bitcoinj for the base-point. It will be a simple one-liner following the 
CustomNamedCurves.getByName call; I'll post a patch when the beta is up.

Original comment by peter.de...@gmail.com on 10 Apr 2014 at 5:37

GoogleCodeExporter commented 9 years ago
BC 151b11 is now up at http://downloads.bouncycastle.org/betas/ . I suggest the 
following change to bitcoinj (ECKey.java):

        X9ECParameters params = CustomNamedCurves.getByName("secp256k1");
+       FixedPointUtil.precompute(params.getG(), 8);

This will have the effect of using a width of 8 for fixed-point-comb 
multiplications, instead of the default width of 5. The comb uses a cache of 
pre-computed points to greatly speed up point calculations, and is applicable 
whenever the same point is multiplied by many different scalars, as is 
particularly the case for the curve generator point.

Memory usage is approximately (150 << width) bytes. So <5kB at w=5, <40kB at 
w=8. I think a width as high as 12 could be reasonable (640kB), but there are 
diminishing returns of course. Perhaps there could be a configuration property 
for it.

ECDSA key generation and/or signing should give a direct measure of the 
effects. Here are some rough results of a 10000 signature test (w=width, 
ms=milliseconds):

 w | ms
---+-----
 5 | 1755
 6 | 1640
 7 | 1502
 8 | 1403
 9 | 1333
10 | 1286
11 | 1257
12 | 1176

Original comment by peter.de...@gmail.com on 11 Apr 2014 at 4:59

GoogleCodeExporter commented 9 years ago
I think 640kb should be enough for anyone ;)

Thanks. Will try this soon.

Original comment by mh.in.en...@gmail.com on 11 Apr 2014 at 5:49

GoogleCodeExporter commented 9 years ago
Interesting - seems like key generation gets faster the more we do it. I guess 
HotSpot went in and optimized here. This is really fast now though, I picked 
w=12:

02:47:32 1 DeterministicKeyChain.maybeLookAhead: Pre-generating 100 keys for 
M/0'/1
02:47:32 1 DeterministicKeyChain.maybeLookAhead: Took 121 msec
02:47:32 1 WalletTool.addKey: Setting keychain lookahead size to 2000
02:47:32 1 DeterministicKeyChain.maybeLookAhead: Pre-generating 1901 keys for 
M/0'/0
02:47:33 1 DeterministicKeyChain.maybeLookAhead: Took 934 msec

Nice! Now let's try it on Android - Andreas, could you post perf stats here 
when you get a chance to try it?

Peter - any idea when BC 1.51 will leave beta? I don't really want to merge 
this complete checkin to git master given that BC should hopefully release soon 
and SpongyCastle will catch up soon after.

Original comment by mh.in.en...@gmail.com on 15 Apr 2014 at 1:00

GoogleCodeExporter commented 9 years ago
It's 2.1s on a Kindle Fire and 5.3s on a Nokia X for generating 101 keys! 
That's a great improvement, down from ~1 minute for BC 1.47 if I remember 
right. Thanks a lot for optimizing this.

Original comment by andreas....@gmail.com on 15 Apr 2014 at 3:19

GoogleCodeExporter commented 9 years ago
Thanks for the feedback; I'm especially glad to hear the improvements carry 
over to the smaller devices (though perhaps a smaller 'w' is warranted there?).

I think the plan is to release 1.51 by the end of April, but it's not a hard 
date.

Original comment by peter.de...@gmail.com on 16 Apr 2014 at 5:43

GoogleCodeExporter commented 9 years ago
Actually, I now understand the release is shaping up for May 10th.

Original comment by peter.de...@gmail.com on 22 Apr 2014 at 12:50

GoogleCodeExporter commented 9 years ago
Done, thanks!

Original comment by mh.in.en...@gmail.com on 3 Aug 2014 at 7:08

GoogleCodeExporter commented 9 years ago
Yeah, sorry about the repeatedly slipped release; it's been busy! Anyway, the 
final release included improvements to EC validation that should be welcome for 
bitcoinj's security, as well as further performance improvements since my last 
report (somewhat offset by the validation costs).

I've been contributing some speedups to libsecp256k1 of late also, so the gap 
there may actually be widening!

Original comment by peter.de...@gmail.com on 5 Aug 2014 at 3:50