Closed GoogleCodeExporter closed 9 years ago
Excellent, thanks.
I will prepare a branch that uses regular Bouncy Castle 1.51 when I next get a
bit of spare time for this stuff. If necessary we can always do our own
SpongyCastle equivalent for the Android app, though on Android we don't do a
whole lot of EC math so it's not very sensitive (yet - that might change with
HD wallets).
Original comment by hearn@google.com
on 5 Jan 2014 at 2:25
Someone tried this and reported a 20% performance improvement. Is that the sort
of figure you're expecting to hear?
Original comment by hearn@google.com
on 30 Jan 2014 at 12:58
I would need to know more about the particular benchmark, of course. As I
mentioned above, the custom curves show a raw improvement vs 1.50 (when
measured over thousands of point multiplications) of at least %300 (i.e. 4
times the mults-per-second). Single-threaded on my i7-2640M @2.80GHz (Win7
x64), (custom) secp256k1 can do about 3300 random-point multiplications per
second, but this is in an intensive loop and I am not sure what to expect
embedded in an actual application.
If the %20 improvement is to on-the-whole application performance, that would
in my view be consistent with the current version of bitcoinj spending about
%25 of its time on EC calcs vs only ~%6 with the custom curve, but this is only
speculation unless we can agree on a benchmark that I can subject to profiling.
As I have no experience using bitcoinj I would greatly appreciate if anyone
could put forward such a benchmark.
Original comment by peter.de...@gmail.com
on 31 Jan 2014 at 1:41
The 20% came from generating new addresses, so random number then scalar mult.
But I didn't profile it myself. This came from someone else. I haven't had a
chance to try myself with sig checking yet.
Original comment by hearn@google.com
on 31 Jan 2014 at 9:41
The latest numbers for the custom secp256k1 have improved to around 4000 random
point-mults/second (same machine specs as above). Sqrt has been optimized,
greatly improving point-decompression speed (x~4). Additionally we now have a
fixed-point optimization which improves signing speed (x~2) in ECDSA, and some
improvements in verification speed vs 1.50 (x~1.5).
This is all available in the current beta at
http://downloads.bouncycastle.org/betas/ , which should be a drop-in
replacement for 1.50 once
https://code.google.com/p/bitcoinj/issues/detail?id=497 is resolved.
Original comment by peter.de...@gmail.com
on 24 Feb 2014 at 3:11
I have made a branch that uses BC 1.51 beta 10. I just imported the code and
sedded the import lines to be back to org.bouncycastle.*
This is in the bc151 branch.
I also rebased the hdw-alpha (hd wallets) branch onto bc151, creating a new
branch for that, hdw-alpha-bc151. What I see is about a 5x speedup when
creating new keys, vs 1.50 - nice work! I haven't tried signature checking yet,
but it sounds like this is in the realm of what is expected.
If you want a goal to stretch for, libsecp256k1 is capable of about 20,000
mults/sec on a 3.2Ghz core.
Original comment by mh.in.en...@gmail.com
on 8 Apr 2014 at 4:55
Good to see those sorts of numbers, Mike.
The latest performance for random mults with custom secp256k1 is now >6000/s on
the machine described above (thanks to implementing GLV technique). I am not
optimistic about closing the gap towards 20k/sec much further. We are presently
stuck with representing field values as int[], i.e. 8 32-bit words. This is (in
my current understanding) a limitation of Java, since we have no explicit
access to 64*64->128 multiplication, nor to vectorization, and I'm not sure how
much magic HotSpot can work on the generated code within these constraints. A
lot could be done with a few well-chosen primitives exposed by the JDK, so I
may have to take it up with them (though others have tried without success).
Probably more relevant to the key generation and signing cases is the
fixed-point optimization. We'll have a new beta tomorrow-ish which makes it
easy to override the default window width (5 for secp56k1) used in the
fixed-point multiplier (trading off some extra memory usage). Since you use
just the one curve, I think it's a no-brainer to increase the width used in
bitcoinj for the base-point. It will be a simple one-liner following the
CustomNamedCurves.getByName call; I'll post a patch when the beta is up.
Original comment by peter.de...@gmail.com
on 10 Apr 2014 at 5:37
BC 151b11 is now up at http://downloads.bouncycastle.org/betas/ . I suggest the
following change to bitcoinj (ECKey.java):
X9ECParameters params = CustomNamedCurves.getByName("secp256k1");
+ FixedPointUtil.precompute(params.getG(), 8);
This will have the effect of using a width of 8 for fixed-point-comb
multiplications, instead of the default width of 5. The comb uses a cache of
pre-computed points to greatly speed up point calculations, and is applicable
whenever the same point is multiplied by many different scalars, as is
particularly the case for the curve generator point.
Memory usage is approximately (150 << width) bytes. So <5kB at w=5, <40kB at
w=8. I think a width as high as 12 could be reasonable (640kB), but there are
diminishing returns of course. Perhaps there could be a configuration property
for it.
ECDSA key generation and/or signing should give a direct measure of the
effects. Here are some rough results of a 10000 signature test (w=width,
ms=milliseconds):
w | ms
---+-----
5 | 1755
6 | 1640
7 | 1502
8 | 1403
9 | 1333
10 | 1286
11 | 1257
12 | 1176
Original comment by peter.de...@gmail.com
on 11 Apr 2014 at 4:59
I think 640kb should be enough for anyone ;)
Thanks. Will try this soon.
Original comment by mh.in.en...@gmail.com
on 11 Apr 2014 at 5:49
Interesting - seems like key generation gets faster the more we do it. I guess
HotSpot went in and optimized here. This is really fast now though, I picked
w=12:
02:47:32 1 DeterministicKeyChain.maybeLookAhead: Pre-generating 100 keys for
M/0'/1
02:47:32 1 DeterministicKeyChain.maybeLookAhead: Took 121 msec
02:47:32 1 WalletTool.addKey: Setting keychain lookahead size to 2000
02:47:32 1 DeterministicKeyChain.maybeLookAhead: Pre-generating 1901 keys for
M/0'/0
02:47:33 1 DeterministicKeyChain.maybeLookAhead: Took 934 msec
Nice! Now let's try it on Android - Andreas, could you post perf stats here
when you get a chance to try it?
Peter - any idea when BC 1.51 will leave beta? I don't really want to merge
this complete checkin to git master given that BC should hopefully release soon
and SpongyCastle will catch up soon after.
Original comment by mh.in.en...@gmail.com
on 15 Apr 2014 at 1:00
It's 2.1s on a Kindle Fire and 5.3s on a Nokia X for generating 101 keys!
That's a great improvement, down from ~1 minute for BC 1.47 if I remember
right. Thanks a lot for optimizing this.
Original comment by andreas....@gmail.com
on 15 Apr 2014 at 3:19
Thanks for the feedback; I'm especially glad to hear the improvements carry
over to the smaller devices (though perhaps a smaller 'w' is warranted there?).
I think the plan is to release 1.51 by the end of April, but it's not a hard
date.
Original comment by peter.de...@gmail.com
on 16 Apr 2014 at 5:43
Actually, I now understand the release is shaping up for May 10th.
Original comment by peter.de...@gmail.com
on 22 Apr 2014 at 12:50
Done, thanks!
Original comment by mh.in.en...@gmail.com
on 3 Aug 2014 at 7:08
Yeah, sorry about the repeatedly slipped release; it's been busy! Anyway, the
final release included improvements to EC validation that should be welcome for
bitcoinj's security, as well as further performance improvements since my last
report (somewhat offset by the validation costs).
I've been contributing some speedups to libsecp256k1 of late also, so the gap
there may actually be widening!
Original comment by peter.de...@gmail.com
on 5 Aug 2014 at 3:50
Original issue reported on code.google.com by
peter.de...@gmail.com
on 5 Jan 2014 at 1:11