Closed by thewhaleking 5 months ago
Because the decoding is done by a third-party library (scalecodec), we have to monkey-patch in a `functools.cache` call like so:
```python
import functools

from scalecodec import base as scalecodec_base

import bittensor as bt

# Keep a reference to the original method, then wrap it in a cache.
original_get_decoder_class = scalecodec_base.RuntimeConfiguration.get_decoder_class


@functools.cache
def patched_get_decoder_class(self, type_string):
    return original_get_decoder_class(self, type_string)


scalecodec_base.RuntimeConfiguration.get_decoder_class = patched_get_decoder_class

sub = bt.subtensor("finney")
sub.get_delegates()
```
With this, calls to the `get_decoder_class` method (for `get_delegates`) drop from 94,542 to 332.
In real-world performance, the entire execution time for this script improves by ~48% with the patch implemented. Note that this involves only five runs each, so results may vary with time of day, ping, etc.:
| Original (s) | Patched (s) | Run |
|---|---|---|
| 3.754099130630493 | 2.3359689712524414 | 0 |
| 4.5155346393585205 | 2.0710458755493164 | 1 |
| 4.193195104598999 | 2.0840811729431152 | 2 |
| 4.192205905914307 | 2.0497679710388184 | 3 |
| 3.9943289756774902 | 2.2274067401885986 | 4 |
| 4.129872751235962 | 2.153654146194458 | average |
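The per-run numbers above can be gathered with a small timing harness; here is a minimal sketch using `time.perf_counter` with a stand-in workload (a real measurement would time `sub.get_delegates()` before and after applying the patch):

```python
import statistics
import time


def benchmark(fn, runs=5):
    """Call fn `runs` times and return the per-run durations plus their mean."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations, statistics.mean(durations)


# Stand-in workload; substitute sub.get_delegates in a real measurement.
durations, average = benchmark(lambda: sum(i * i for i in range(100_000)))
```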
I believe implementing this in the code base will drastically reduce overall decode time.
@RomanCh-OT had some concerns about a potential memory leak caused by caching, so I investigated.
*(screenshots: memory profiles, uncached vs. cached)*
Given the way `functools.lru_cache` works, we should never run into a situation where this would be a problem. The memory sizes are nearly identical with and without the cache. Note that the times shown in these images are slower than those stated above due to my own network latency.
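To illustrate why the cache stays bounded, here is a minimal sketch with a hypothetical stub class standing in for scalecodec's `RuntimeConfiguration`: the cache holds one entry per distinct `(instance, type_string)` pair, and the set of SCALE type strings is small and fixed. One caveat worth noting: because `self` is part of the cache key, cached instances are kept alive for the lifetime of the cache.

```python
import functools


class RuntimeConfigurationStub:
    """Hypothetical stand-in for scalecodec's RuntimeConfiguration."""

    calls = 0  # counts how often the underlying lookup actually runs

    @functools.cache
    def get_decoder_class(self, type_string):
        RuntimeConfigurationStub.calls += 1
        return f"decoder:{type_string}"


cfg = RuntimeConfigurationStub()
for _ in range(1000):
    cfg.get_decoder_class("u32")
    cfg.get_decoder_class("AccountId")

# Only two underlying lookups happen; every other call is a cache hit,
# so the cache never grows beyond the number of distinct type strings.
print(RuntimeConfigurationStub.calls)  # 2
print(RuntimeConfigurationStub.get_decoder_class.cache_info().hits)  # 1998
```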
Opened https://github.com/polkascan/py-scale-codec/pull/117 to add the caching ability and functionality to the scalecodec library.
Currently, a large portion of RPC call time (29,000+ calls for `get_delegates`) is taken up by decoding. Profiling shows that the vast majority of this decoding time is caused by calls to the `scalecodec.base.RuntimeConfiguration.get_decoder_class` method. Because of the fairly limited number of decoder classes, we should be able to cache this with `functools.cache` to see large speed improvements.
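Call counts like those quoted above can be reproduced with `cProfile`; a minimal sketch using a stand-in function (a real profile would target `scalecodec.base.RuntimeConfiguration.get_decoder_class` while running `sub.get_delegates()`):

```python
import cProfile
import pstats


# Hypothetical stand-in for the real decoder lookup.
def get_decoder_class(type_string):
    return type_string.upper()


def workload():
    # Simulate a decode path that resolves the same type repeatedly.
    for _ in range(100):
        get_decoder_class("u32")


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

stats = pstats.Stats(profiler)
# Pull the call count (nc) for the function of interest out of the stats dict.
ncalls = next(
    nc
    for (filename, lineno, name), (cc, nc, tt, ct, callers) in stats.stats.items()
    if name == "get_decoder_class"
)
print(ncalls)  # 100 calls before any caching is applied
```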