mustermeiszer closed this issue 3 months ago
> But after submitting a certain transaction the block production stopped.
What kind of transaction? Can you please link the code?
We think it is some recursion issue, but I would have thought that this would break the node differently... ^^
> We think it is some recursion issue
Can you be more specific?
Looking at your code, you don't deposit any digest item anywhere? Ahh, but you are using pallet-evm that is putting a digest into the header AFAIR?

> Looking at your code, you don't deposit any digest item anywhere? Ahh, but you are using pallet-evm that is putting a digest into the header AFAIR?
That is correct. We are using that. The weird thing is, that it fails, even when no evm transactions are present...
> Can you be more specific?
Not really. My current approach is to comment out certain code-paths, restart the chain, submit and see if it hangs.
Hmm, really weird. Looking over the code, I don't really get what is going on. It looks good. Especially weird that this is triggered by a transaction.
If you comment out [this line](https://github.com/centrifuge/centrifuge-chain/blob/4363f8bc3afd0ed04ff562587ef069de3fb7eb4e/pallets/oracle-feed/src/lib.rs#L128) it works?
Yes. It is also just happening on the relay-chain. The parachain happily produces the block without problems.
> The parachain happily produces the block without problems.
The block author is not re-running its own block. Are other parachain nodes able to import the block?
Will they try without the relay-chain approving it? Haven't checked yet. Can try in the dev environments. Any logs to look out for?
I boiled it down to this code-path. Works without it, fails with it. It is rather a portion that is running there.
So if you comment out the withdraw, it works?
> Will they try without the relay-chain approving it? Haven't checked yet. Can try in the dev environments. Any logs to look out for?
With Aura yes, the other nodes should import it. Just check if the other nodes manage to import the block.
How can I reproduce this? Can you tell me exactly what to launch etc? Then I could also look at it.
> So if you comment out the withdraw, it works?
No, just the fee.value::<Self>(), and replace it with a BalanceOf::<T>::default().
> How can I reproduce this? Can you tell me exactly what to launch etc? Then I could also look at it.
Give me 5 minutes, I will adapt the issue for reproduction. Need a new branch as our launch script is out of date.
@bkchr updated. The above will build our node locally. I then simply comment out certain lines and re-do the procedure. We have integration tests for that code-path, so it is really weird.
> No, just the fee.value::<Self>(), and replace it with a BalanceOf::<T>::default().
I have to correct myself. The withdraw seems to be the real issue. Looking at the implementation, it is short-circuiting for 0 values, which is what the BalanceOf::<T>::default() provides. Providing non-zero values results in the error.
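To illustrate the short-circuit behaviour described above — this is a hypothetical sketch, not the actual Substrate withdraw implementation: balance operations commonly treat a zero amount as a no-op and return early, so substituting a default (zero) value skips the code path that actually triggers the bug.

```rust
// Hypothetical sketch of a withdraw that short-circuits for zero values.
// Only a non-zero amount reaches the balance check and the mutation, which
// is why replacing the fee with BalanceOf::<T>::default() (i.e. 0) made the
// problem disappear.
fn withdraw(balance: &mut u64, amount: u64) -> Result<(), &'static str> {
    if amount == 0 {
        // Short-circuit: a zero withdraw does nothing.
        return Ok(());
    }
    if *balance < amount {
        return Err("insufficient balance");
    }
    *balance -= amount;
    Ok(())
}

fn main() {
    let mut balance = 100u64;
    assert!(withdraw(&mut balance, 0).is_ok()); // no-op path
    assert_eq!(balance, 100);
    assert!(withdraw(&mut balance, 40).is_ok()); // full path
    assert_eq!(balance, 60);
}
```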
In the non-local test networks the behaviour is a bit different.
At least this is my current understanding. Here are the log files. Both as csv and as json. The CSVs side by side can be seen here
- dev-centrifuge-collator.csv / dev-centrifuge-collator.json
- dev-centrifuge-fullnode.csv / dev-centrifuge-fullnode.json
- dev-polkadot-validator-0.csv / dev-polkadot-validator-0.json
- dev-polkadot-validator-1.csv / dev-polkadot-validator-1.json
@mustermeiszer just to keep you updated: I could reproduce this locally with your instructions. I already found out that it is related to the Frontier digest. Looks like the state root is different. I'm continuing to investigate what is going on.
Thanks for the update!! Let me know if you need anything else.
https://github.com/centrifuge/centrifuge-chain/pull/1881 is the fix for your issue.
```rust
pub const fn size_of_feed<T: Config>() -> u32 {
    sp_std::mem::size_of::<(T::OracleKey, T::OracleValue, MomentOf<T>)>() as u32
}
```
The function is using mem::size_of to calculate the size of some types. The problem is that the size is different in native versus wasm:

- native: 96 bytes
- wasm: 88 bytes
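A small sketch of why mem::size_of is target-dependent — the tuple below is a hypothetical stand-in, not the actual Centrifuge oracle types, and the 96-vs-88-byte numbers above come from the real types: the compiler inserts padding to satisfy each field's alignment, and alignment rules differ between native and wasm32 targets, so the same type can report different sizes.

```rust
use std::mem::size_of;

// Hypothetical stand-in for the (OracleKey, OracleValue, MomentOf<T>) tuple;
// not the actual Centrifuge types. mem::size_of includes padding, and the
// padding depends on the target's alignment rules, so the result can differ
// between a native build and a wasm build of the same code.
type Feed = (u32, u128, u64);

fn main() {
    let sum_of_fields = size_of::<u32>() + size_of::<u128>() + size_of::<u64>();
    // Padding can only grow the total, never shrink it.
    assert!(size_of::<Feed>() >= sum_of_fields);
    println!("size_of::<Feed>() = {} bytes on this target", size_of::<Feed>());
}
```

This is also why a runtime constant derived from mem::size_of is risky: the value baked in at native block production need not match the one the wasm runtime computes during validation.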
The problem with Centrifuge was that block production was running in native while the validation runs in wasm. The validation was finally failing because the storage root calculated by Frontier was different from the one calculated at block production. As Frontier is putting the ETH block hash into a digest, the validation failed at comparing the digests.
CC @mustermeiszer
> The validation was finally failing because the storage root calculated by Frontier was different than the one calculated at block production. As Frontier is putting the ETH block hash into a digest, the validation failed at comparing the digests.
Does that mean that without Frontier the block production would be able to work just fine? How would that be possible - shouldn't the storage root then still be different between the PoV and what the relay computes?
> Does that mean that without Frontier the block production would be able to work just fine?
No. Then it would have failed at comparing the storage_root. However, as the digests are compared first, it failed there first.
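The comparison order can be sketched like this — simplified, hypothetical types, not the real Substrate block-execution code: validation re-executes the block in wasm and compares the header it computed against the authored one, checking digests before the storage root, so the Frontier digest mismatch is what gets reported.

```rust
// Simplified sketch (hypothetical types) of why the failure surfaced at the
// digest comparison rather than at the storage root.
#[derive(Debug, PartialEq)]
struct Header {
    digests: Vec<Vec<u8>>,   // e.g. Frontier's ETH block hash digest
    storage_root: [u8; 32],
}

fn check(authored: &Header, re_executed: &Header) -> Result<(), &'static str> {
    if authored.digests != re_executed.digests {
        return Err("digest mismatch"); // compared first, so reported first
    }
    if authored.storage_root != re_executed.storage_root {
        return Err("storage root mismatch"); // would fail here without Frontier
    }
    Ok(())
}

fn main() {
    let authored = Header { digests: vec![vec![1]], storage_root: [0u8; 32] };
    let re_executed = Header { digests: vec![vec![2]], storage_root: [9u8; 32] };
    // Both fields differ, but only the digest mismatch is reported.
    assert_eq!(check(&authored, &re_executed), Err("digest mismatch"));
}
```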
Is there an existing issue?
Experiencing problems? Have you tried our Stack Exchange first?
Description of bug
Testnet chains are halting and not producing any blocks anymore.
We recently upgraded the development environments to release-polkadot-v1.7.2. Asynchronous backing is NOT activated. At first the chains were producing blocks just fine, but after submitting a certain transaction the block production stopped.

Relevant Logs
Relay-chain
Collator