opentensor / bittensor

Internet-scale Neural Networks
https://www.bittensor.com/
MIT License

Cache get_decoder_class #1833

Closed · thewhaleking closed this 5 months ago

thewhaleking commented 6 months ago

Currently, a large portion of the time taken by RPC calls is spent decoding (29,000+ calls for get_delegates). Profiling shows that the vast majority of this decoding time is actually caused by calls to the scalecodec.base.RuntimeConfiguration.get_decoder_class method.

Because there is a fairly limited number of decoder classes, we should be able to cache these lookups with functools.cache and see large speed improvements.
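
As a rough illustration (not necessarily the exact profiling setup used here), call counts like the ones above can be observed with the standard cProfile/pstats modules:

import cProfile
import pstats

import bittensor as bt

sub = bt.subtensor("finney")

# Profile a single get_delegates call and report how often
# scalecodec's get_decoder_class is hit and how much time it accounts for.
with cProfile.Profile() as profiler:
    sub.get_delegates()

pstats.Stats(profiler).sort_stats("cumulative").print_stats("get_decoder_class")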

thewhaleking commented 6 months ago

Because the decoding is being done by a third-party library (scalecodec), we will have to monkey-patch in a functools.cache call like so:


import functools
from scalecodec import base as scalecodec_base
import bittensor as bt

# Keep a reference to the original (uncached) method so the patched
# version can delegate to it.
original_get_decoder_class = scalecodec_base.RuntimeConfiguration.get_decoder_class

# Cache results keyed on (runtime configuration instance, type string).
@functools.cache
def patched_get_decoder_class(self, type_string):
    return original_get_decoder_class(self, type_string)

# Swap the cached version in on the class so every RuntimeConfiguration
# instance picks it up.
scalecodec_base.RuntimeConfiguration.get_decoder_class = patched_get_decoder_class

sub = bt.subtensor("finney")
sub.get_delegates()

With this, calls to the get_decoder_class method (for get_delegates) drop from 94,542 to 332.

In real-world performance, the entire execution time for this script improves by ~48% with the patch applied. Note that this involves only five runs each, so results may vary with time of day, ping, etc.:

Run        Original (s)           Patched (s)
0          3.754099130630493      2.3359689712524414
1          4.5155346393585205     2.0710458755493164
2          4.193195104598999      2.0840811729431152
3          4.192205905914307      2.0497679710388184
4          3.9943289756774902     2.2274067401885986
average    4.129872751235962      2.153654146194458
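
For reference, a minimal sketch of the kind of wall-clock comparison described above, assuming each run simply times a get_delegates call (this is not necessarily the exact script used):

import time

import bittensor as bt

sub = bt.subtensor("finney")

# Time five get_delegates calls and report the average, with or without
# the monkey-patch from the previous comment applied.
timings = []
for run in range(5):
    start = time.perf_counter()
    sub.get_delegates()
    timings.append(time.perf_counter() - start)
    print(run, timings[-1])

print("average", sum(timings) / len(timings))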

I believe implementing this in the code base will drastically reduce overall decode time.

thewhaleking commented 5 months ago

@RomanCh-OT had some concerns about a potential memory leak caused by caching, so I investigated.

[Screenshot: memory profile, uncached]

[Screenshot: memory profile, cached]

Given the way functools.lru_cache works, we should never run into a situation where this becomes a problem: the memory footprint is nearly identical with and without the cache. Note that the times shown in these images are slower than those stated above due to my own network latency.
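
For anyone who wants to reproduce the comparison without the screenshots, one way to measure it is with the standard tracemalloc module (a sketch; the images above may have been produced with a different profiler):

import tracemalloc

import bittensor as bt

# Optionally apply the monkey-patch from the earlier comment first to measure
# the cached case; run as-is for the uncached baseline.
tracemalloc.start()
sub = bt.subtensor("finney")
sub.get_delegates()
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"current: {current / 1_000_000:.1f} MB, peak: {peak / 1_000_000:.1f} MB")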

thewhaleking commented 5 months ago

Opened https://github.com/polkascan/py-scale-codec/pull/117 to add the caching functionality to the scalecodec library itself.