paritytech / cumulus

Write Parachains on Substrate
GNU General Public License v3.0
618 stars 378 forks source link

`parachain-template-node` does not work with `polkadot-v0.9.36` #2043

Closed RomarQ closed 1 year ago

RomarQ commented 1 year ago

I have been trying to update the parachain dependencies from v0.9.29 to v0.9.36, but unfortunately, I was not able to successfully do it. After a few failed attempts I decided to try the parachain-template-node, which seems to not be working as well.

After 8 blocks I reach the following condition: VRF verification failed: EquationFalse.

I have also tried version v0.9.33 without success.

Does anyone know the root of the issue?

I have tried tried it against a local rococo instance.

Logs:

2022-12-26 15:11:42.048 DEBUG tokio-runtime-worker runtime::mmr::offchain: [Relaychain] offchain db set: pos 55 parent_hash 0x7eccb7950d91f6aed8f1f89a2942bb8ffad9ae3947f873d03b308676e5c63476 key [12, 109, 109, 114, 55, 0, 0, 0, 0, 0, 0, 0, 126, 204, 183, 149, 13, 145, 246, 174, 216, 241, 248, 154, 41, 66, 187, 143, 250, 217, 174, 57, 71, 248, 115, 208, 59, 48, 134, 118, 229, 198, 52, 118]    
2022-12-26 15:11:42.048 DEBUG tokio-runtime-worker runtime::inclusion-inherent: [Relaychain] [enter_inner] parent_header=0x7eccb7950d91f6aed8f1f89a2942bb8ffad9ae3947f873d03b308676e5c63476 bitfields.len(): 4, backed_candidates.len(): 1, disputes.len(): 0    
2022-12-26 15:11:42.052 DEBUG tokio-runtime-worker runtime::system: [Relaychain] [30] 0 extrinsics, length: 1179 (normal 0%, op: 0%, mandatory 0%) / normal weight:Weight(ref_time: 0, proof_size: 0) (0%) op weight Weight(ref_time: 0, proof_size: 0) (0%) / mandatory weight Weight(ref_time: 14392096822, proof_size: 0) (0%)    
2022-12-26 15:11:42.055  INFO tokio-runtime-worker substrate: [Relaychain] ✨ Imported #30 (0x593a…6105)    
2022-12-26 15:11:42.056 DEBUG tokio-runtime-worker parachain::collator-protocol: [Relaychain] Removing relay parent because our view changed. relay_parent=0x7eccb7950d91f6aed8f1f89a2942bb8ffad9ae3947f873d03b308676e5c63476
2022-12-26 15:11:42.056 DEBUG tokio-runtime-worker parachain::collator-protocol: [Relaychain] Collation was requested. candidate_hash=0x912d7cd4aa5c84877ca55d783146f6f04e8bed85bdce9bc916b52d468cd2cdb3 pov_hash=0xcfa30f1c3fe44025f6ccb8161fee9c719663572965b1cea7cc6e1d7f64779157 traceID=192974244615332473342209119479849744112
2022-12-26 15:11:42.058 DEBUG tokio-runtime-worker parachain::availability-store: [Relaychain] Candidate backed candidate_hash=0x912d7cd4aa5c84877ca55d783146f6f04e8bed85bdce9bc916b52d468cd2cdb3 traceID=192974244615332473342209119479849744112
2022-12-26 15:11:42.059 DEBUG tokio-runtime-worker parachain::approval-voting: [Relaychain] Imported new block. block_number=30 block_hash=0x593a21c764a04bf1751daaf0e86cae0a19a7b344b686bd48a8ba44f0b4d96105 num_candidates=0
2022-12-26 15:11:42.059 DEBUG tokio-runtime-worker parachain::approval-distribution: [Relaychain] Got new blocks [(0x593a21c764a04bf1751daaf0e86cae0a19a7b344b686bd48a8ba44f0b4d96105, 30)]
2022-12-26 15:11:44.909 DEBUG tokio-runtime-worker parachain::chain-selection: [Relaychain] Prepared 0 stagnant entries for pruning up_to=1671977504 min_ts=0 max_ts=0
2022-12-26 15:11:44.915  INFO tokio-runtime-worker substrate: [Relaychain] 💤 Idle (4 peers), best: #30 (0x593a…6105), finalized #27 (0xae82…2fae), ⬇ 1.9kiB/s ⬆ 1.8kiB/s    
2022-12-26 15:11:44.922  INFO tokio-runtime-worker substrate: [Parachain] 💤 Idle (0 peers), best: #9 (0x8df1…8d29), finalized #8 (0x4bce…04d4), ⬇ 0 ⬆ 0    
2022-12-26 15:11:47.477 DEBUG tokio-runtime-worker parachain::approval-voting: [Relaychain] Block finalized block_hash=0x10b4d09077d8de9b94fb7a41cbe72ebeed6da2eeee5297a28c7344b3cf7e8e7b block_number=28
2022-12-26 15:11:47.477 DEBUG tokio-runtime-worker parachain::collator-protocol: [Relaychain] Removing relay parent because our view changed. relay_parent=0xbf9581cb09c67e63c55afad492724befa3f131e964613f28655a404a6bccc7b5
2022-12-26 15:11:48.044  WARN tokio-runtime-worker sync: [Relaychain] 💔 Verification failed for block 0xff74f3ac6cc368557297f06f0fdde5398693d342a796a30ae4d2152135d9fd2f received from peer: 12D3KooWPKzmmE2uYgF3z13xjpbFTp63g9dZFag8pG6MgnpSLF4S, "VRF verification failed: EquationFalse"    
2022-12-26 15:11:48.060  WARN tokio-runtime-worker sync: [Relaychain] 💔 Verification failed for block 0xff74f3ac6cc368557297f06f0fdde5398693d342a796a30ae4d2152135d9fd2f received from peer: 12D3KooWKM1HeSv61ryZwAiBTZnqmass5pYM48k9Z7obzhTbnphm, "VRF verification failed: EquationFalse"    
2022-12-26 15:11:48.064  WARN tokio-runtime-worker sync: [Relaychain] 💔 Verification failed for block 0xff74f3ac6cc368557297f06f0fdde5398693d342a796a30ae4d2152135d9fd2f received from peer: 12D3KooWQCkBm1BYtkHpocxCwMgR8yjitEeHGx8spzcDLGt2gkBm, "VRF verification failed: EquationFalse"    
2022-12-26 15:11:48.937 DEBUG tokio-runtime-worker parachain::collator-protocol: [Relaychain] Timeout hit, sent a connection request. Disconnected from all validators = true timeout=12s
2022-12-26 15:11:48.937 DEBUG tokio-runtime-worker parachain::validator-discovery: [Relaychain] New ConnectToValidators request peer_set=Collation requested=0 failed_to_resolve=0
2022-12-26 15:11:48.937 DEBUG tokio-runtime-worker parachain::validator-discovery: [Relaychain] New ConnectToValidators resolved request peer_set=Collation num_peers=0 removed=1
2022-12-26 15:11:48.937 DEBUG tokio-runtime-worker parachain::network-bridge-rx: [Relaychain] action="PeerDisconnected" peer_set=Collation peer=PeerId("12D3KooWKM1HeSv61ryZwAiBTZnqmass5pYM48k9Z7obzhTbnphm")
2022-12-26 15:11:49.909 DEBUG tokio-runtime-worker parachain::chain-selection: [Relaychain] Prepared 0 stagnant entries for pruning up_to=1671977509 min_ts=0 max_ts=0
2022-12-26 15:11:49.915  INFO tokio-runtime-worker substrate: [Relaychain] 💤 Idle (0 peers), best: #30 (0x593a…6105), finalized #28 (0x10b4…8e7b), ⬇ 5.0kiB/s ⬆ 2.3kiB/s    
2022-12-26 15:11:49.923  INFO tokio-runtime-worker substrate: [Parachain] 💤 Idle (0 peers), best: #9 (0x8df1…8d29), finalized #8 (0x4bce…04d4), ⬇ 0 ⬆ 0    
2022-12-26 15:11:53.048  WARN tokio-runtime-worker sync: [Relaychain] 💔 Verification failed for block 0xff74f3ac6cc368557297f06f0fdde5398693d342a796a30ae4d2152135d9fd2f received from peer: 12D3KooWPKzmmE2uYgF3z13xjpbFTp63g9dZFag8pG6MgnpSLF4S, "VRF verification failed: EquationFalse"    
2022-12-26 15:11:53.063  WARN tokio-runtime-worker sync: [Relaychain] 💔 Verification failed for block 0xff74f3ac6cc368557297f06f0fdde5398693d342a796a30ae4d2152135d9fd2f received from peer: 12D3KooWKM1HeSv61ryZwAiBTZnqmass5pYM48k9Z7obzhTbnphm, "VRF verification failed: EquationFalse"    
2022-12-26 15:11:54.909 DEBUG tokio-runtime-worker parachain::chain-selection: [Relaychain] Prepared 0 stagnant entries for pruning up_to=1671977514 min_ts=0 max_ts=0
bkchr commented 1 year ago

https://github.com/paritytech/substrate/issues/12752

We have fixed this some time ago 🤔🤔

Are you really sure that you are using 0.9.36 for the relay chain and the parachain?

bkchr commented 1 year ago

Cc @davxy

RomarQ commented 1 year ago

paritytech/substrate#12752

We have fixed this some time ago thinkingthinking

Are you really sure that you are using 0.9.36 for the relay chain and the parachain?

Yes, I downloaded the artifact from https://github.com/paritytech/polkadot/releases/tag/v0.9.36

davxy commented 1 year ago

Ok you're running polkadot 0.9.36 for your relay chain nodes. But the fault that you see comes from the relay chain node embedded within the parachain collator. You also have to update the relay chain dependencies in your cumulus project.

Try to checkout the tag v0.9.360 in cumulus

In your parachain-template/node/Cargo.toml you should see something like...

...
sc-consensus = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
...                                                                ^^^^^^^^^^^^
RomarQ commented 1 year ago

I am not using an embedded relay chain on the collator. The setup has 4 relay validators and it works well on v0.9.29.

The only differences between release-v0.9.360 and polkadot-v0.9.36 are the following:

image

davxy commented 1 year ago

The logs that you've pasted are coming from the collator binary.

The lines with [RelayChain] comes from the relay chain "embedded" node that ships with the collator. Is not something that you choose to run or not, is always there to gather informations from the relay chain.

Can you please print the lines in parachain-template/node/Cargo.toml? For example the one for sc-consensus

RomarQ commented 1 year ago

Got it, by embedded I was assuming something like substrate-node-template.

[dependencies]
clap = { version = "4.0.29", features = ["derive"] }
log = "0.4.17"
codec = { package = "parity-scale-codec", version = "3.0.0" }
serde = { version = "1.0.151", features = ["derive"] }
jsonrpsee = { version = "0.16.2", features = ["server"] }

# runtime stuff

# Substrate
frame-benchmarking = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
frame-benchmarking-cli = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
pallet-transaction-payment-rpc = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
sc-basic-authorship = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
sc-chain-spec = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
sc-cli = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
sc-client-api = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
sc-consensus = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
sc-executor = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
sc-network = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
sc-network-common = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
sc-rpc = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
sc-service = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
sc-sysinfo = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
sc-telemetry = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
sc-tracing = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
sc-transaction-pool = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
sc-transaction-pool-api = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
sp-api = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
sp-block-builder = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
sp-blockchain = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
sp-consensus-aura = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
sp-core = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
sp-keystore = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
sp-offchain = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
sp-runtime = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
sp-session = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
sp-timestamp = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
sp-transaction-pool = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
substrate-frame-rpc-system = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
substrate-prometheus-endpoint = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
try-runtime-cli = { git = "https://github.com/paritytech/substrate", optional = true, branch = "polkadot-v0.9.36" }

# Polkadot
polkadot-cli = { git = "https://github.com/paritytech/polkadot", branch = "release-v0.9.36" }
polkadot-primitives = { git = "https://github.com/paritytech/polkadot", branch = "release-v0.9.36" }
polkadot-service = { git = "https://github.com/paritytech/polkadot", branch = "release-v0.9.36" }
xcm = { git = "https://github.com/paritytech/polkadot", default-features = false, branch = "release-v0.9.36" }

# Cumulus
cumulus-client-cli = { git = "https://github.com/paritytech/cumulus", branch = "polkadot-v0.9.36" }
cumulus-client-consensus-aura = { git = "https://github.com/paritytech/cumulus", branch = "polkadot-v0.9.36" }
cumulus-client-consensus-common = { git = "https://github.com/paritytech/cumulus", branch = "polkadot-v0.9.36" }
cumulus-client-network = { git = "https://github.com/paritytech/cumulus", branch = "polkadot-v0.9.36" }
cumulus-client-service = { git = "https://github.com/paritytech/cumulus", branch = "polkadot-v0.9.36" }
cumulus-primitives-core = { git = "https://github.com/paritytech/cumulus", branch = "polkadot-v0.9.36" }
cumulus-primitives-parachain-inherent = { git = "https://github.com/paritytech/cumulus", branch = "polkadot-v0.9.36" }
cumulus-relay-chain-inprocess-interface = { git = "https://github.com/paritytech/cumulus", branch = "polkadot-v0.9.36" }
cumulus-relay-chain-interface = { git = "https://github.com/paritytech/cumulus", branch = "polkadot-v0.9.36" }
cumulus-relay-chain-rpc-interface = { git = "https://github.com/paritytech/cumulus", branch = "polkadot-v0.9.36" }
cumulus-relay-chain-minimal-node = { git = "https://github.com/paritytech/cumulus", branch = "polkadot-v0.9.36" }

[build-dependencies]
substrate-build-script-utils = { git = "https://github.com/paritytech/substrate", branch = "polkadot-v0.9.36" }
RomarQ commented 1 year ago

You can use https://github.com/paritytech/cumulus/tree/polkadot-v0.9.36/parachain-template to reproduce the issue. It gives the same error.

davxy commented 1 year ago

I'm trying it right now. But for me works just fine.

Can you give me more details about your setup?

  1. Have you started your network from scratch?
  2. How you started the network?
davxy commented 1 year ago

Can you try to remove the DB from the collator and restart it from scratch?

In some of the old version there was a bug that ended up saving incorrectly some internal structures used by babe. Thus if it was already corrupted, it can't be recovered just by restarting the network with the new version, should be regenerated.

RomarQ commented 1 year ago

The issue only appears after block 9.

  1. Download the relay chain binary from https://github.com/paritytech/polkadot/releases/tag/v0.9.36
  2. Clone branch polkadot-v0.9.36 at https://github.com/paritytech/cumulus
  3. Compile it with cargo build --release -p parachain-template-node
  4. Then move all these binaries to a zombinet folder
  5. And spawn the network with ./zombienet-linux spawn config.toml

config.toml

[settings]
provider="native"

[relaychain]
default_command = "polkadot"
default_args = [ "-lparachain=debug" ]
chain = "rococo-local"

  [[relaychain.nodes]]
  name = "alice"
  validator = true
  command = "polkadot"
  ws_port = 9944

  [[relaychain.nodes]]
  name = "bob"
  validator = true
  command = "polkadot"

  [[relaychain.nodes]]
  name = "charlie"
  validator = true
  command = "polkadot"

  [[relaychain.nodes]]
  name = "dave"
  validator = true
  command = "polkadot"

[[parachains]]
id = 2001
chain = "local"
cumulus_based = true

  [parachains.collator]
  name = "some-collator"
  command = "parachain-template-node"
  ws_port = 8802
  rpc_port = 8082
  args = [ "-lparachain=debug,runtime=debug" ]
RomarQ commented 1 year ago

Can you try to remove the DB from the collator and restart it from scratch?

In some of the old version there was a bug that ended up saving incorrectly some internal structures used by babe. Thus if it was already corrupted, it can't be recovered just by restarting the network with the new version, should be regenerated.

The example above does a fresh deployment.

I will try to dig a bit more on my side, since you are saying that it works fine on your setup. Will try without zombinet.

davxy commented 1 year ago

Going to try your zombienet setup. Feedback soon

davxy commented 1 year ago

Done exactly the same steps you've described and works fine.

Please also double check that you are using the right binaries and that are updated.

In the config.toml that you've pasted you are using polkadot and parachain-template-node from a folder that should be in your PATH env var, maybe the binaries there are not the ones that you've just built? Just an hypothesis

The error that you are experiencing is 100% equal to the one we've fixed and doesn't trigger here. (i.e. was triggering when trying to import the [RelayChain] block 31, the one that signals that epoch 2 is starting - again, in the relay chain). That's why looks suspicious.

Please double check everything :-)

RomarQ commented 1 year ago

Did it produce block number 10?

davxy commented 1 year ago

yes

RomarQ commented 1 year ago

yes

Ok, I will have another look at my setup. Thanks.

davxy commented 1 year ago

https://pastebin.com/5fKE2Qj7

bkchr commented 1 year ago

@RomarQ just for posterity, could you tell us what the issue was on your side or what you have done to fix it?

RomarQ commented 1 year ago

I have not been able to fix it. My issue seems to come from the relay chain, it cannot produce more than 30 blocks when using rococo-local. Works fine when using polkadot-local and used to work fine with rococo-local on v0.9.29.

Later today I will have a better look at this and create an issue on polkadot repository if I am not able to make it work.

bkchr commented 1 year ago

it cannot produce more than 30 blocks when using rococo-local

With a parachain registered or without?

For rococo local we have a really short session time of 1 minute afair, but this shouldn't influence this 🤔

RomarQ commented 1 year ago

Even without.

I will paste the log in few minutes.

RomarQ commented 1 year ago

In the logs bellow I am using v0.9.33, but it is the same for v0.9.36.

alice.log bob.log

Error with block built on 0x6ee54ea32eab69cfd847b2ba65d73b0187e3f7eaf0e7d37533e928648e4d67ee: Import failed: Unexpected epoch change

The logs stop after block 30.

davxy commented 1 year ago

Strange that given the same zombienet config and same binaries you experience this fault and doesn't reproduce for me.

I saw you precendently posted a log, now removed, where there was a panic for what looked like an OOM error or something like that.

Can I ask your test machine specs? (Memory)

RomarQ commented 1 year ago

Strange that given the same zombienet config and same binaries you experience this fault and doesn't reproduce for me.

I saw you precendently posted a log, now removed, where there was a panic for what looked like an OOM error or something like that.

Can I ask your test machine specs? (Memory)

32GB of memory 16 CPU cores

davxy commented 1 year ago

Few more questions. Because reproduction is resolution 😀

  1. is that systematic? I mean, does it happens in every run?
  2. can you paste exactly the minimal zombienet config that triggers the fault?
  3. Does it triggers if you don't include the parachains in the zombienet network config? (I.e. if you only run relay chain) Thank you
RomarQ commented 1 year ago

Few more questions. Because reproduction is resolution grinning

  1. is that systematic? I mean, does it happens in every run?
  2. can you paste exactly the minimal zombienet config that triggers the fault?
  3. Does it triggers if you don't include the parachains in the zombienet network config? (I.e. if you only run relay chain) Thank you
  1. Yes, it is systematic.
  2. Sure
[settings]
provider="native"

[relaychain]
default_command = "polkadot"
default_args = [ "-lparachain=debug" ]
chain = "rococo-local"

  [[relaychain.nodes]]
  name = "alice"
  validator = true

  [[relaychain.nodes]]
  name = "bob"
  validator = true

[[parachains]]
id = 2001
chain = "local"
cumulus_based = true

  [parachains.collator]
  name = "some-collator"
  command = "parachain-template-node"
  ws_port = 8802
  rpc_port = 8082
  args = [ "-lparachain=debug,runtime=debug" ]
  1. Yes, the relay chain stops at block 30 if I do not include a parachain in zombienet config. When including a parachain, the parachain stops on block 9.

I tried running a v0.9.29 parachain on a v0.9.36 relaychain and got the following double free or corruption (out).

parachain.log relaychain.log

RomarQ commented 1 year ago

If it helps, you can find the linux binaries bellow:

https://files.visualtez.com/parachain-template-node https://files.visualtez.com/polkadot

bkchr commented 1 year ago

I tried running a v0.9.29 parachain on a v0.9.36 relaychain and got the following double free or corruption (out).

Where do you test this? On your local machine? Can you run some memtest? Or give some more context on this error, maybe some backtrace using gdb. However, I have the assumption that your memory is maybe faulty?

bkchr commented 1 year ago

One thing about the error before 30 blocks. 30 blocks should be the end of the third epoch in rococo-local.

davxy commented 1 year ago

@RomarQ I run the binaries that you've shared.

I can confirm that there is definitely something wrong with your parachain-template-node binary. That build generates the fault you've described.

Interestingly my locally generated build (that works) has exactly the same build hash as your:

./parachain-template-node --version
parachain-template-node 0.1.0-afe528af891

Is possible that the binary has been generated with some cached artifact from an old version that has not been updated/discarded for some reason by cargo? Can you try to completely remove the target dir and rebuild. Or alternatively that it contains some modifications with respect to the vanilla one that ships with cumulus?

I just throwing out some hypothesis to justify this different behavior between your bin and mine that are apparently from the same commit...

For example this should work out of the box.

The errors that I get (again only with your parachain-template-node binary) are of two types and are triggered on different runs (without any apparent link between each other)


First error kind

Something that messed up the expected validator id

2022-12-28 01:04:36.018  INFO tokio-runtime-worker substrate: [Relaychain] ✨ Imported #30 (0x9cce…54f7)    
2022-12-28 01:04:36.019 DEBUG tokio-runtime-worker parachain::collator-protocol: [Relaychain] Removing relay parent because our view changed. relay_parent=0x4c53c5e4e59394dd0214f6bc5b80c33b9ce38f1c3ed116aabc8566a9c8acab84
2022-12-28 01:04:36.019 DEBUG tokio-runtime-worker parachain::collator-protocol: [Relaychain] Collation was requested. candidate_hash=0x67e4bf19cdb3145e82d797316e4c79c3e1e76cda539517c49f44363abb82100a pov_hash=0xc9b90fae911a53a3590bab18e4ee246e59782ad7de30f4c55637fb928adc7443 traceID=138098203234191556859895022751469828547
2022-12-28 01:04:36.019 DEBUG tokio-runtime-worker parachain::availability-store: [Relaychain] Candidate backed candidate_hash=0x67e4bf19cdb3145e82d797316e4c79c3e1e76cda539517c49f44363abb82100a traceID=138098203234191556859895022751469828547
2022-12-28 01:04:36.020 DEBUG tokio-runtime-worker parachain::approval-voting: [Relaychain] Imported new block. block_number=30 block_hash=0x9cce4a52ec383090141c8baa4201526a03a74abb0fa6500ba09ad0e09b8d54f7 num_candidates=0
2022-12-28 01:04:36.020 DEBUG tokio-runtime-worker parachain::approval-distribution: [Relaychain] Got new blocks [(0x9cce4a52ec383090141c8baa4201526a03a74abb0fa6500ba09ad0e09b8d54f7, 30)]
2022-12-28 01:04:37.024 DEBUG tokio-runtime-worker parachain::chain-selection: [Relaychain] Prepared 0 stagnant entries for pruning up_to=1672095877 min_ts=0 max_ts=0
2022-12-28 01:04:37.024  INFO tokio-runtime-worker substrate: [Relaychain] 💤 Idle (2 peers), best: #30 (0x9cce…54f7), finalized #27 (0x0f71…d55f), ⬇ 1.1kiB/s ⬆ 0.7kiB/s    
2022-12-28 01:04:37.076  INFO tokio-runtime-worker substrate: [Parachain] 💤 Idle (0 peers), best: #9 (0x314a…be5e), finalized #8 (0x8ad6…c704), ⬇ 0 ⬆ 0    
2022-12-28 01:04:38.153 DEBUG tokio-runtime-worker parachain::approval-voting: [Relaychain] Block finalized block_hash=0x8239b9277ae5079b8c072d2e04392713760916ed2437320854ac65dd732bf80d block_number=28
2022-12-28 01:04:42.016  WARN tokio-runtime-worker sync: [Relaychain] 💔 Verification failed for block 0xd86e5413d03cfad084939c39dcfd6c73f1b788b8cd570aee2d758487f422c96a received from peer: 12D3KooWRkZhiRhsqmrQ28rt73K7V3aCBpqKrLGSXmZ99PTcTZby, "Invalid author: Expected secondary author: Public(d43593c715fdd31c61141abd04a99fd6822c8558854ccde39a5684e7a56da27d (5GrwvaEF...)), got: Public(8eaf04151687736326c9fea17e25fc5287613693c912909cb226aa4794f26a48 (5FHneW46...))."    
2022-12-28 01:04:42.021  WARN tokio-runtime-worker sync: [Relaychain] 💔 Verification failed for block 0xd86e5413d03cfad084939c39dcfd6c73f1b788b8cd570aee2d758487f422c96a received from peer: 12D3KooWQCkBm1BYtkHpocxCwMgR8yjitEeHGx8spzcDLGt2gkBm, "Invalid author: Expected secondary author: Public(d43593c715fdd31c61141abd04a99fd6822c8558854ccde39a5684e7a56da27d (5GrwvaEF...)), got: Public(8eaf04151687736326c9fea17e25fc5287613693c912909cb226aa4794f26a48 (5FHneW46...))."  
...

Second error type

Something that at some point mess with the memory

2022-12-28 01:00:06.024  INFO tokio-runtime-worker substrate: [Parachain] ✨ Imported #6 (0x660b…33d9)    
2022-12-28 01:00:06.024  INFO tokio-runtime-worker cumulus-collator: [Parachain] PoV size { header: 0.1787109375kb, extrinsics: 2.9423828125kb, storage_proof: 2.56640625kb }
2022-12-28 01:00:06.024  INFO tokio-runtime-worker cumulus-collator: [Parachain] Compressed PoV size: 5.029296875kb
2022-12-28 01:00:06.025  INFO tokio-runtime-worker cumulus-collator: [Parachain] Produced proof-of-validity candidate. block_hash=0x660b88310b58d30a469dff2df8ce3b3354033da99947104df2ec28b2ef4c33d9
2022-12-28 01:00:06.025 DEBUG tokio-runtime-worker parachain::collation-generation: [Relaychain] candidate is generated candidate_hash=0xb9fca18c9fa8caf635ec64f5e685d520ccd5b1bf52dfde2376d7672b69f00329 pov_hash=0x89a013b20af85cca7d52b8dec73eb31ec58fd1880a075c37a7c348cd9ffacce1 relay_parent=0x3ded6ef4f58681ef0119876355b556b417243a2a6d4bbb8caf78bd437480644d para_id=2001 traceID=247218914637861327524642806845722187040
2022-12-28 01:00:06.025 DEBUG tokio-runtime-worker parachain::collator-protocol: [Relaychain] Received session info session_index=2
2022-12-28 01:00:06.025 DEBUG tokio-runtime-worker parachain::collator-protocol: [Relaychain] Accepted collation, connecting to validators. para_id=2001 relay_parent=0x3ded…644d candidate_hash=0xb9fca18c9fa8caf635ec64f5e685d520ccd5b1bf52dfde2376d7672b69f00329 pov_hash=0x89a013b20af85cca7d52b8dec73eb31ec58fd1880a075c37a7c348cd9ffacce1 core=CoreIndex(0) current_validators=[Public(d43593c715fdd31c61141abd04a99fd6822c8558854ccde39a5684e7a56da27d (5GrwvaEF...))] traceID=247218914637861327524642806845722187040
2022-12-28 01:00:06.025 DEBUG tokio-runtime-worker parachain::collator-protocol: [Relaychain] Advertising collation. relay_parent=0x3ded6ef4f58681ef0119876355b556b417243a2a6d4bbb8caf78bd437480644d peer_id=12D3KooWQCkBm1BYtkHpocxCwMgR8yjitEeHGx8spzcDLGt2gkBm
2022-12-28 01:00:06.025 DEBUG tokio-runtime-worker parachain::validator-discovery: [Relaychain] New ConnectToValidators request peer_set=Collation requested=1 failed_to_resolve=0
2022-12-28 01:00:06.025 DEBUG tokio-runtime-worker parachain::validator-discovery: [Relaychain] New ConnectToValidators resolved request peer_set=Collation num_peers=1 removed=0

====================

Version: 0.1.0-afe528af891

   0: sp_panic_handler::set::{{closure}}
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/d0dc9efff14ac0a1eeceffd1e605e37eeb8362a0/library/alloc/src/boxed.rs:2032:9
      std::panicking::rust_panic_with_hook
             at /rustc/d0dc9efff14ac0a1eeceffd1e605e37eeb8362a0/library/std/src/panicking.rs:692:13
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/d0dc9efff14ac0a1eeceffd1e605e37eeb8362a0/library/std/src/panicking.rs:577:13
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/d0dc9efff14ac0a1eeceffd1e605e37eeb8362a0/library/std/src/sys_common/backtrace.rs:137:18
   4: rust_begin_unwind
             at /rustc/d0dc9efff14ac0a1eeceffd1e605e37eeb8362a0/library/std/src/panicking.rs:575:5
   5: core::panicking::panic_fmt
             at /rustc/d0dc9efff14ac0a1eeceffd1e605e37eeb8362a0/library/core/src/panicking.rs:64:14
   6: alloc::raw_vec::capacity_overflow
             at /rustc/d0dc9efff14ac0a1eeceffd1e605e37eeb8362a0/library/alloc/src/raw_vec.rs:518:5
   7: <alloc::vec::Vec<T,A> as core::clone::Clone>::clone
   8: <alloc::vec::Vec<T,A> as core::clone::Clone>::clone
   9: <alloc::vec::Vec<T,A> as core::clone::Clone>::clone
  10: <alloc::vec::Vec<T,A> as core::clone::Clone>::clone
  11: <sc_consensus_epochs::EpochChanges<Hash,Number,E> as core::clone::Clone>::clone
  12: <sc_consensus_babe::BabeBlockImport<Block,Client,Inner> as sc_consensus::block_import::BlockImport<Block>>::import_block::{{closure}}
  13: <alloc::boxed::Box<dyn sc_consensus::block_import::BlockImport<B>+Transaction = Transaction+Error = sp_consensus::error::Error+core::marker::Send+core::marker::Sync> as sc_consensus::block_import::BlockImport<B>>::import_block::{{closure}}
  14: futures_util::future::future::FutureExt::poll_unpin
  15: sc_consensus::import_queue::basic_queue::BlockImportWorker<B>::new::{{closure}}
  16: <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll
  17: <sc_service::task_manager::prometheus_future::PrometheusFuture<T> as core::future::future::Future>::poll
  18: <futures_util::future::select::Select<A,B> as core::future::future::Future>::poll
  19: <tracing_futures::Instrumented<T> as core::future::future::Future>::poll
  20: tokio::runtime::park::CachedParkThread::block_on
  21: tokio::runtime::handle::Handle::block_on
  22: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
  23: tokio::runtime::task::core::Core<T,S>::poll
  24: tokio::runtime::task::harness::Harness<T,S>::poll
  25: tokio::runtime::blocking::pool::Inner::run
  26: std::sys_common::backtrace::__rust_begin_short_backtrace
  27: core::ops::function::FnOnce::call_once{{vtable.shim}}
  28: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/d0dc9efff14ac0a1eeceffd1e605e37eeb8362a0/library/alloc/src/boxed.rs:2000:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/d0dc9efff14ac0a1eeceffd1e605e37eeb8362a0/library/alloc/src/boxed.rs:2000:9
      std::sys::unix::thread::Thread::new::thread_start
             at /rustc/d0dc9efff14ac0a1eeceffd1e605e37eeb8362a0/library/std/src/sys/unix/thread.rs:108:17
  29: start_thread
             at ./nptl/./nptl/pthread_create.c:442:8
  30: __GI___clone3
             at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 'tokio-runtime-worker' panicked at 'capacity overflow', library/alloc/src/raw_vec.rs:518

This is a bug. Please report it at:

    https://github.com/paritytech/cumulus/issues/new

Again, my suggestion is try a clean build and also try with the binary that ships with our release.

RomarQ commented 1 year ago

The same thing happens on a fresh build. I had tried the polkadot-parachain binaries before and these work as expected.

RomarQ commented 1 year ago

I am using rustc 1.68.0-nightly (d0dc9efff 2022-12-18) if it helps. In which version are you?

bkchr commented 1 year ago

Something that at some point mess with the memory

This is very weird, it seems to have problems allocating on clone?

RomarQ commented 1 year ago

Just to give an update.

I have updated my rust version to rustc 1.68.0-nightly (92c1937a9 2022-12-27) and things seem to be working now. Such a weird bug :worried:

bkchr commented 1 year ago

Just to give an update.

I have updated my rust version to rustc 1.68.0-nightly (92c1937a9 2022-12-27) and things seem to be working now. Such a weird bug worried

Good to know! :) Yeah, sometimes the name "nightly" is correct :D