Open zancas opened 4 years ago
My tests show an increase in intermittent failures in both Python3 ports, relative to the Python2 version (1% -> 10%; n = 100).
A plausible explanation: The Racey Tests Hypothesis:
Observations reported by @mdr0id: the observed failure rate drops when the node is run on a well-resourced system, and the test code contains constant wait times in polling loops.
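To illustrate why constant wait times in polling loops can make a test racy, here is a minimal sketch of a deadline-based polling helper. The name `wait_until` and its parameters are hypothetical and are not taken from the zcash test framework. A fixed number of fixed-length sleeps passes or fails depending on how fast the host is, whereas polling against a deadline only fails when the condition truly never becomes true in the allotted time.

```python
import time


def wait_until(predicate, timeout=60.0, initial_delay=0.05, max_delay=2.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll `predicate` until it returns a truthy value or `timeout` elapses.

    Hypothetical helper, for illustration only. The delay between polls
    doubles up to `max_delay`, so fast hosts return quickly and slow hosts
    are given the full timeout instead of a fixed number of attempts.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        result = predicate()
        if result:
            return result
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        # Never sleep past the deadline.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```

In a test, the predicate would be whatever check the loop currently performs after each constant sleep, for example "has the operation reached a terminal state yet".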
Rather than publish experimental results now, I will focus on understanding the architecture of the state transitions that produce and consume operationids. I hypothesize that these transitions map to asynchronous transitions.
If anyone wishes to analyze/compare data (perhaps because they are skeptical of the hypothesis that the failures are due to inconsistencies within zcashd nodes), I'm happy to curate and publish. For now I assume that this model/hypothesis is understood and agreed upon.
It would be of value to also note if this was on the native host or within a container/VM environment.
From str4d via rocketchat:
Yes
async ops were introduced by us, because creating Sprout transactions was slow
They are only used for shielded transaction creation
They are not generic asynchronous logic within the node
It's solely a mechanism for implementing an asynchronous RPC method
Describe the issue
Several RPC tests intermittently fail including, at least:
Can you reliably reproduce the issue?
If so, please list the steps to reproduce below:
qa/pull-tester/rpc-tests.sh wallet_listnotes
This file, detect_race_run_log.txt, shows the result of a loop across ~45 iterations of WalletListNotes.
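A loop like the one that produced the log can be sketched as follows. This is an illustrative harness, not the script actually used. The function name `count_failures` is hypothetical, and it assumes the test command signals failure with a nonzero exit code.

```python
import subprocess


def count_failures(cmd, runs):
    """Run `cmd` `runs` times and return the indices of the failing runs.

    Hypothetical helper. Assumes a nonzero exit code means the test failed.
    Output is captured so that it does not flood the terminal.
    """
    failed = []
    for i in range(runs):
        proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT)
        if proc.returncode != 0:
            failed.append(i)
    return failed
```

For the reproduction step above, the call would be something like `count_failures(["qa/pull-tester/rpc-tests.sh", "wallet_listnotes"], 45)`, and `len(result) / 45` gives the observed failure rate.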
Expected behaviour
Each test should pass each time it is run.
Actual behaviour + errors
See the attached file above, where detect_race.py fails in ~5/44 runs. Note: detect_race.py is just wallet_listnotes.py with lines deleted.
The version of Zcash you were using:
Machine specs:
GNU ld (GNU Binutils for Debian) 2.32.51.20190909
Any extra information that might be useful in the debugging process.
This development/build/test environment is a docker container
Do you have a backup of the ~/.zcash directory and/or take a VM snapshot?
A copy of the ~/.zcash directory might help make the problem reproducible. Please redact appropriately.