Closed wwestgarth closed 2 years ago
This has just happened a second time: https://jenkins.ops.vega.xyz/blue/organizations/jenkins/common%2Fsystem-tests/detail/system-tests/3422/pipeline/
The latest on this is:
I am out of ideas, other than hardcoding core to write its logs to a file in the home directory and circumventing VC's log collection entirely.
From planning today:
@jgsbennett @MuthuVega - have we seen data-node crash on the full runs during this sprint?
The full runs have been passing green, closing this issue for now
Problem encountered
A suspected data-node crash may have occurred on this test run: https://jenkins.ops.vega.xyz/blue/organizations/jenkins/common%2Fsystem-tests/detail/system-tests/3404/pipeline/
Any further details are limited because after the crash vegacapsule restarted the job, the new core/data-node caught up, and everything was fine. But the logs from the initial task were lost.
The only reason I know something went wrong with that node is because when grepping the tendermint logs we see:
Tendermint for testnet-nodeset-full-2-full started 2 hours after the other nodes, with 6180 blocks in its block-store.
I'm not expecting much hope on this one, but thought maybe running the event-file through the data-node on loop might show up some instability?
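Since the only sign of the restart was one node starting much later than its peers, that check could be automated after each run. The following is a minimal sketch, assuming the per-node start times have already been extracted from the tendermint logs by some other means. The function name, the 30-minute threshold, the other node names and all timestamps are hypothetical and are not taken from the actual run.

```python
from datetime import datetime, timedelta


def find_late_starters(start_times, threshold=timedelta(minutes=30)):
    """Return the names of nodes that started more than `threshold`
    after the earliest node, sorted alphabetically.

    `start_times` maps a node name to the datetime at which its
    tendermint process started (hypothetical input format).
    """
    if not start_times:
        return []
    earliest = min(start_times.values())
    return sorted(
        name
        for name, started in start_times.items()
        if started - earliest > threshold
    )


# Illustrative data only: one node starts 2 hours after the others.
example = {
    "testnet-nodeset-validators-0-validator": datetime(2023, 1, 1, 10, 0),
    "testnet-nodeset-validators-1-validator": datetime(2023, 1, 1, 10, 1),
    "testnet-nodeset-full-2-full": datetime(2023, 1, 1, 12, 0),
}
print(find_late_starters(example))
```

A late starter flagged this way would only indicate that a task was restarted, not why. The logs from the original task would still be needed to confirm a data-node crash.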
Update: I've seen this three times now, and it always falls over in the test test_funding_reward_accounts_oneoff with an internal error when getting a party account balance:
Observed behaviour
Data-node may or may not have crashed.
Expected behaviour
Data-node did not crash, maybe?
Automation
Link to automation and explanation on how to run it to reproduce the problem/bug
Evidence
Logs
If applicable, add logs and/or screenshots to help explain your problem.
Additional context
Add any other context about the problem here, including system version numbers and components affected.
Definition of Done
Before Merging
After Merging
Done
if there is NO requirement for new system-tests