Open zemahran opened 1 year ago
Could you provide a bit more context, like logs or example code to reproduce? The type Compact<u32> is so common that this is probably the result of another problem.
Here's the error when trying to retrieve a block from Polkadot's relay chain mainnet:
Any ideas on how to get past it, or on where to find an implementation of this decoder class elsewhere? Thanks for trying to help @arjanz 🙏
This is odd… Retrieving a block from Polkadot is such a basic task; this shouldn't go wrong under normal circumstances. Can you tell me a bit more about your setup (OS, Python version, etc.)?
Are you using threads or some other kind of async configuration? If so, this sounds like the issue described in #246.
I wasn't able to reproduce that scenario yet, but I'll run some more tests.
I'm also seeing the same error, along with this one:
'NoneType' object has no attribute 'portable_registry'
My use case is querying blocks from a Substrate-based chain in parallel using threads. I use a concurrent.futures.ThreadPoolExecutor
to spin up 25 threads. Each thread creates its own instance of SubstrateInterface and loops through a range of block numbers, calling get_block
for each one. Only a few threads show errors in each run; the others work normally.
I can try slimming my code down to a minimal example if that would be helpful.
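In the meantime, here is a rough sketch of the parallel-fetch pattern I'm describing. The `Client`, `fetch_block`, and `fetch_range` names are hypothetical stand-ins for `SubstrateInterface` and `get_block` so the pattern runs without a network connection; the real code connects to a node instead:

```python
import concurrent.futures
import threading

class Client:
    """Hypothetical stub standing in for SubstrateInterface."""
    def __init__(self):
        # Record which thread constructed this client instance.
        self.owner = threading.get_ident()

    def fetch_block(self, block_number):
        # Stand-in for substrate.get_block(block_number=...)
        return {"number": block_number, "fetched_by": self.owner}

def fetch_range(start, stop):
    # Each worker creates its own client, as in my real code.
    client = Client()
    return [client.fetch_block(n) for n in range(start, stop)]

# 25 workers, each handed a contiguous chunk of block numbers.
with concurrent.futures.ThreadPoolExecutor(max_workers=25) as pool:
    chunks = [(n, n + 100) for n in range(0, 1000, 100)]
    futures = [pool.submit(fetch_range, a, b) for a, b in chunks]
    blocks = [blk for f in futures for blk in f.result()]

print(len(blocks))  # 1000
```

In the failing runs, it is the real equivalent of `fetch_block` that raises inside a few of the worker threads.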
What I noticed is that this only occurs in a fresh environment. Previously I did not have the issue running the same code using older package versions. I tested on both Python 3.10 and 3.11.
Here's the working package list:
base58==2.1.1
certifi==2022.9.24
cffi==1.15.1
charset-normalizer==2.1.1
cytoolz==0.12.0
ecdsa==0.18.0
eth-hash==0.3.3
eth-keys==0.4.0
eth-typing==3.2.0
eth-utils==2.0.0
idna==3.4
more-itertools==9.0.0
py-bip39-bindings==0.1.10
py-ed25519-zebra-bindings==1.0.1
py-sr25519-bindings==0.1.5
pycparser==2.21
pycryptodome==3.15.0
PyNaCl==1.5.0
requests==2.28.1
scalecodec==1.0.45
six==1.16.0
substrate-interface==1.3.2
toolz==0.12.0
urllib3==1.26.12
websocket-client==1.4.1
xxhash==3.1.0
Versus the fresh install that produces the error:
ansible==2.9.12
argcomplete==3.0.8
base58==2.1.1
certifi==2023.5.7
cffi==1.15.1
charset-normalizer==3.2.0
cytoolz==0.12.1
dotty-dict==1.3.1
ecdsa==0.18.0
eth-hash==0.5.2
eth-keys==0.4.0
eth-typing==3.4.0
eth-utils==2.2.0
halo==0.0.31
hid==1.0.5
hjson==3.1.0
idna==3.4
Jinja2==3.0.3
log-symbols==0.0.14
milc==1.6.6
more-itertools==9.1.0
multidict==6.0.4
netaddr==0.8.0
passlib==1.7.2
py-bip39-bindings==0.1.11
py-ed25519-zebra-bindings==1.0.1
py-sr25519-bindings==0.2.0
pycparser==2.21
pycryptodome==3.18.0
PyNaCl==1.5.0
qmk==1.1.2
requests==2.31.0
scalecodec==1.2.6
six==1.16.0
spinners==0.0.24
substrate-interface==1.7.3
toolz==0.12.0
urllib3==2.0.4
websocket-client==1.6.1
xxhash==3.2.0
yarl==1.8.2
The versions in my working set are more recent than those reported in #246, which suggests there is no common root cause introduced by the updates between my two package sets: the issue reported there was already occurring in versions earlier than the ones that worked for me.
We are getting the same error, only while multi-threading.
It seems that downgrading substrate-interface to 1.3.2 and scalecodec to 1.0.45 didn't solve the problem.
I'm still seeing this issue with scalecodec==1.2.8 and substrate-interface==1.7.7. Seems my hunch about a certain version combination working was just due to confusion or luck.
Here is a minimal script that reproduces the issue on the majority of runs. This is on Python 3.11.8 on Linux.
import threading, queue, time
from substrateinterface import SubstrateInterface

WORKERS = 25
BLOCKS = 10_000

def worker(block_queue):
    # Each worker gets its own connection, as in my real code.
    substrate = SubstrateInterface(url="wss://rpc.polkadot.io")
    while True:
        block_number = block_queue.get()
        substrate.get_block(block_number=block_number)
        # Do something fun with the block
        block_queue.task_done()

substrate = SubstrateInterface(url="wss://rpc.polkadot.io")
head_number = substrate.get_block_header()['header']['number']

block_queue = queue.Queue()
for i in range(head_number - BLOCKS, head_number + 1):
    block_queue.put(i)

threads = []
for i in range(WORKERS):
    # daemon=True so the script can exit once the queue is drained
    thread = threading.Thread(target=worker, args=[block_queue], daemon=True)
    thread.start()
    threads.append(thread)

while block_queue.qsize() > 10:
    time.sleep(30)
    print(block_queue.qsize(), 'blocks remaining,', len([t for t in threads if t.is_alive()]), 'threads alive')

block_queue.join()
In my experience, it's often only a single thread that hits the error, and always toward the beginning of the run. My workaround is to just let the failed thread(s) die and then spawn new ones to take their place. The replacement threads seem to never fail. So the failure condition appears to be something about launching many threads simultaneously in a fresh environment.
Hope this can help you to reproduce and get to the bottom of this one, @arjanz. Thanks!
Are there any updates regarding this error? Is there a way to overcome or work around it for the time being?