Closed by xlc 4 years ago
This is not related to Grandpa/Finality.
Looks like your unsigned validation code is not deterministic.
Are you saying this panic is unrelated to grandpa? In that case I have two issues...
How is it possible to have non-deterministic code? The offchain worker code does use a random value from the offchain worker API, but the unsigned transaction itself should be deterministic.
Also, the panic happens constantly on a sentry node on our cloud server, but I can't reproduce it with a local setup...
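To make the determinism concern concrete, here is a minimal sketch in plain Rust. It is not the Acala or Substrate code: `ChainState`, `UnsignedCall`, `Validity`, and both functions are hypothetical names invented for illustration. The idea it shows is that validation of an unsigned transaction should depend only on the call and on chain state that every node sees identically, never on a value that can differ between nodes.

```rust
/// Hypothetical on-chain state that every node sees identically at a given block.
#[derive(Debug, Clone, PartialEq)]
struct ChainState {
    block_number: u64,
    debit_limit: u128,
}

/// Hypothetical payload of an unsigned transaction submitted by an offchain worker.
#[derive(Debug, Clone, PartialEq)]
struct UnsignedCall {
    target_block: u64,
    amount: u128,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Validity {
    Valid,
    Invalid,
}

/// Deterministic: the result depends only on chain state and the call itself,
/// so every node importing the block reaches the same verdict.
fn validate_deterministic(state: &ChainState, call: &UnsignedCall) -> Validity {
    if call.target_block == state.block_number && call.amount <= state.debit_limit {
        Validity::Valid
    } else {
        Validity::Invalid
    }
}

/// Anti-pattern: `node_local_value` stands in for anything that can differ
/// between nodes (a random value, the local clock, offchain storage).
/// Two nodes can disagree about the same transaction in the same block.
fn validate_with_local_input(
    state: &ChainState,
    call: &UnsignedCall,
    node_local_value: u64,
) -> Validity {
    if node_local_value % 2 == 0 {
        validate_deterministic(state, call)
    } else {
        Validity::Invalid
    }
}
```

With the second function, the block author may accept a transaction that an importing node then rejects, which is one way a block import can fail on some nodes only. Whether that is what happens here is not established in this thread.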
The panic is on block import. That is not related to grandpa.
Is that only happening for one sentry? Or all?
I did not check your code and your validation function.
I only have log access to one sentry, which constantly has this issue. I managed to reproduce it locally once but not again. The validator connected to the troublesome sentry occasionally has this error as well.
I will ask our validating partners to check their logs.
OK. After hacking some extra logging into the code in the cargo cache, I found out that 5 out of 8 validators had prevoted and the threshold is 6. Now I know which 5 have voted and where to check for issues.
It would be good if that information could be exposed somehow (logging or RPC).
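For reference, the numbers above (5 prevotes out of 8 voters against a threshold of 6) match the usual GRANDPA supermajority rule: with n voters, up to f = floor((n - 1) / 3) may be faulty, and a round needs n - f votes. The sketch below assumes every voter has equal weight. The function names are made up for this example and are not an existing logging or RPC API.

```rust
/// Maximum number of voters that may be faulty: floor((n - 1) / 3).
/// Assumes all voters carry equal weight.
fn max_faulty(total_voters: u64) -> u64 {
    total_voters.saturating_sub(1) / 3
}

/// Number of votes needed for a supermajority: n - f.
fn threshold(total_voters: u64) -> u64 {
    total_voters - max_faulty(total_voters)
}

/// How many more votes are needed before the round can complete.
fn missing_votes(total_voters: u64, received: u64) -> u64 {
    threshold(total_voters).saturating_sub(received)
}
```

For 8 voters this gives `threshold(8) == 6`, so with 5 prevotes received `missing_votes(8, 5) == 1`: a single additional prevote would have been enough for the round to make progress.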
Turns out to be a p2p issue. Two of the validators are synced, but their grandpa votes were never received by the others. I extracted the key and put it into a locally run validator, and that fixed the finality issue.
I think we need better diagnostics for grandpa. Will open another issue for that.
Still no idea about the panic. Will investigate a bit more and post a new issue when we have a more solid question.
If you cannot import a block, you cannot vote for it.
https://telemetry.polkadot.io/#list/Acala%20Mandala%20TC2
Running latest Acala, which uses 29cee59229626644c549e36e7921fca42bed68da
I believe the bad signature tx is related to our offchain worker, which sends an unsigned transaction:
https://github.com/AcalaNetwork/Acala/blob/83c13dc2843603958e63f3230645e66220ea7844/modules/cdp_engine/src/lib.rs#L358
This happens on both sentry and validator nodes.
Log with -l afg=trace:
https://gist.github.com/xlc/6b52a0dc4686e704fe5b4f7783a6c83c
Related: https://github.com/AcalaNetwork/Acala/issues/135