monero-ecosystem / monero-python

A comprehensive Python module for handling Monero cryptocurrency
BSD 3-Clause "New" or "Revised" License
246 stars 80 forks source link

Poor performance of public_address() and public_view_key() #42

Closed moneroexamples closed 4 years ago

moneroexamples commented 5 years ago

I'm using your project (good work by the way!) to generate hundreds/thousands of addresses and get their public address and private keys. But their generation is, unexpectedly, very slow. Some basic tests in python using timeit on Intel Core i7-3820 @ 3.8GHz:

In [1]: from monero import seed

In [2]: s = seed.Seed()

In [3]: %timeit s.public_address()
1.08 s ± 9.99 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [4]: %timeit s.public_view_key()
544 ms ± 9.18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [5]: %timeit s.public_spend_key()
540 ms ± 3.53 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Any reason for this poor performance and any way/fix to make it faster? Generation of thousands addresses and keys on regular basis with that speed (basically one every 1 to 2 seconds) is slow.

lalanza808 commented 5 years ago

Can't offer much insight in the speed of the specific generation, but can you tell me how are you generating the hundreds/thousands of addresses? A simple for loop?

moneroexamples commented 5 years ago

Hi. Very basic loop (for now):

    n = 1000
    for i in range(n):
        print(f"{i+1}/{n}")

        s = monero.seed.Seed()
        s_address = s.public_address(net_type)
        s_viewkey = s.public_view_key()

        # other processing of the generated address, viewkey, e.g.
        print(s_address, s_viewkey) 

The bottleneck are these two lines:

        s_address = s.public_address(net_type)
        s_viewkey = s.public_view_key()
lalanza808 commented 5 years ago

Even if the address generation took half the time, doing it hundreds or thousands of times per X will be time consuming without parallelism. This could be a good next step:

import concurrent.futures
from monero import seed
from time import time

# Times to loop
loop_count = 100

def generate_seed():
    s = seed.Seed()
    s_address = s.public_address()
    print(f"Generated new address: {s_address}")
    return s_address

def main():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = []
        for i in range(loop_count):
            future = executor.submit(generate_seed)
            futures.append(future)

if __name__ == "__main__":
    t_start = time()
    main()
    t_end = time()
    t_total = t_end - t_start
    print(f"\n\n[+] {t_total} seconds elapsed generating {loop_count} addresses.")
$ python -V
Python 3.6.0
lalanza808 commented 5 years ago

Output of script:

... snip ...
[+] 30.121598958969116 seconds elapsed generating 100 addresses.
moneroexamples commented 5 years ago

Thanks. I will look into it. Nevertheless, it still would be intresting to know why it is so slow. If not now, then then in the long term, it would be valuable.

emesik commented 5 years ago

It's slow because neither our ed25519 implementation is optimal, nor handling of its outputs is.

It's pure Python and it was designed to serve the most common use cases in the first row, namely having a working wallet with full features, not optimized for horizontal scaling. For example, the keys are internally kept as string hex representations because it's human-readable. However, for computations it would be much faster to keep the int (or even a point representation for public keys) and not convert (hexlify/unhexlify) each time.

The best solution would be to make some optimizations in Python code but also switch over to a low-level implementation of ed25519 arithmetic, like libsodium or orlp's. However, that requires creating Python/C bindings and dealing with all new set of problems typical to low-level solutions (building on different platforms, falling back to pure Python when lib is unavaliable, etc.)

Any help from a person experienced in this kind of work would be very welcome.

moneroexamples commented 5 years ago

I looked into the code. I see the problem is that ed25519.public_from_secret_hex() takes 0.5 seconds, due to scalarmult taking 0.5 seconds (its a recursive function). At the moment I can't look into making it better, assuming there is a way of doing it better?. Could be good exercise for future.

But, I also found that ed25519.public_from_secret_hex() are execute twice, for no apparent reason. In this code example

 s = seed.Seed()
s.public_address()
s.public_spend_key()
s.public_view_key()

ed25519.public_from_secret_hex() is going to be executed 4 times, adding up to 2 seconds of execution time! The quick fix that I implemented for myself now, is to calculate it only when necessary and store the calculated value for future use. One could do this:

    def public_spend_key(self):
        if self.ed_pub_spend_key:       
            return self.ed_pub_spend_key    
        self.ed_pub_spend_key = ed25519.public_from_secret_hex(self.secret_spend_key())
        return self.ed_pub_spend_key    

    def public_view_key(self):
        if self.ed_pub_view_key:        
            return self.ed_pub_view_key     
        self.ed_pub_view_key =  ed25519.public_from_secret_hex(self.secret_view_key())
        return self.ed_pub_view_key 

where self.ed_pub_spend_key and self.ed_pub_view_key are initialized to None in the __init__. This will cut the execution number of ed25519.public_from_secret_hex to 2 in the code snippet from before, and reduce time by 50%.

emesik commented 5 years ago

Could you create a PR for that value caching? Just please prepend underscore to the attribute names (self._ed_*_key) so they won't tempt to be accessed directly. The existing tests should cover that code part already.

PS: I'm pretty sure that recursion in scalarmult can be unrolled. We should look at existing implementations and check how to do it, or just use one if possible.

moneroexamples commented 5 years ago

@emesik Sure. I will submit PR once ready.

Regarding scalarmult, I also checked how generation of addresses and keys perform on pypy3. Often its much faster then cpython when loops (recursions as well?) are involved. But it was worse. Guess is that some parts (pysha3) use C, and pypy does not perform well when C extensions are involved.

emesik commented 5 years ago

Thanks!

moneroexamples commented 5 years ago

No problem.

moneroexamples commented 5 years ago

Little updated.

Ended up using the address generator in multiprocessing.Pool to speed it up and also wrapped it in asyncio as all my code was asyncio. With these I could efficiently work with address generation.

As a side note. The issue originated from development of short asyncio program for stress-testing openmonero. The script utilizes monero-python and it is here is someone is interested in it: https://github.com/moneroexamples/openmonero/blob/devel/scripts/batch_import_test.py

sanderfoobar commented 5 years ago

I managed to go from seed->public address in 0.091552 seconds using wallet_api2 directly on a single thread and making sure major:minor lookahead was set to 1:1.

This is offtopic for monero-python however.

emesik commented 4 years ago

With 9aee6564e74862f041fa8baa1b7702f4d9cb3c70 I have changed the ed25519 implementation to one that cut the test suite running time by two orders of magnitude.

Integrating low-level C/C++ code from monero or libsodium could potentially cut the times further, however we still stay in pure Python world, so I stopped looking for other solutions ATM.