paritytech / parity-bridges-common

Collection of Useful Bridge Building Tools 🏗️
GNU General Public License v3.0
270 stars 132 forks source link

What's the diferent between `on_demand_headers` and `finality_pipeline` #1143

Closed fewensa closed 2 years ago

fewensa commented 2 years ago

I'm confused about there.

finality_pipeline

https://github.com/paritytech/parity-bridges-common/blob/master/relays/lib-substrate-relay/src/finality_pipeline.rs

on_demand_headers

https://github.com/paritytech/parity-bridges-common/blob/master/relays/lib-substrate-relay/src/on_demand_headers.rs

There run in a loop https://github.com/paritytech/parity-bridges-common/blob/6d0daa8975a57c23b6705a725b18bc4960547d42/relays/lib-substrate-relay/src/on_demand_headers.rs#L146 and finally will call finality_relay::run https://github.com/paritytech/parity-bridges-common/blob/6d0daa8975a57c23b6705a725b18bc4960547d42/relays/lib-substrate-relay/src/on_demand_headers.rs#L228-L242

MessageTarget (in messages relay) will update require_finalized_header https://github.com/paritytech/parity-bridges-common/blob/6d0daa8975a57c23b6705a725b18bc4960547d42/relays/lib-substrate-relay/src/messages_target.rs#L237-L241

But I did debug here, finality_relay::run just called once. https://github.com/paritytech/parity-bridges-common/blob/6d0daa8975a57c23b6705a725b18bc4960547d42/relays/lib-substrate-relay/src/on_demand_headers.rs#L229

So, why need this file, and why need update require_finalized_header, will this affect finality_relay?

In addition, what is the difference between finality_pipeline

svyatonik commented 2 years ago

Finality relay reads finalized headers from the source chain and submits them to the target chain. It does that if source_chain.best_finalized_header.number > target_chain.best_finalized_source_header.number. For connecting to the source/target chain it uses FinalitySource and FinalityTarget clients accordingly.

Normally when you start standalone finality relay (i.e. substrate-relay relay-headers bla-bla-bla), these 'clients' are returning actual finalized blocks from both chains. It means that every time finalized header appears at the source chain, it is submitted to the target chain with its finality proof. Verifying finality proof onchain isn't cheap - e.g. on Kusama/Polkadot there are ~1K validators, which means we'll need to verify ~2/3*1K signatures onchain.

So there's that on-demand relay, which only relay headers when other application (right now - messages relay) needs it. E.g. if message relay sees new message at source block N, then it'll asks on-demand relay to sync that header to the target chain. Once it is synced, message may be relayed too. So we are importing only required headers, meaning that the bridge operational cost is decreased.

Under the hood, it is implemented here. So if there are no required headers, finality source would return 0 as best finalized header number. Meaning that source_chain.best_finalized_header.number > target_chain.best_finalized_source_header.number is not satisfied and finality relay would do nothing. Once message relay reports needs some header, the maximal_header_number is set to this number and finality relay will sync this header.

In addition, what is the difference between finality_pipeline

This seems like incomplete question :) Could you, please, rephrase?

fewensa commented 2 years ago

Thanks for your explanation.

The type of required_header_number I overlooked. It's Mutex, https://github.com/paritytech/parity-bridges-common/blob/6d0daa8975a57c23b6705a725b18bc4960547d42/relays/client-substrate/src/finality_source.rs#L34-L35 So I'm confused about how to affect finality_relay. if the type is Mutex. this does work, and affects finality_relay https://github.com/paritytech/parity-bridges-common/blob/6d0daa8975a57c23b6705a725b18bc4960547d42/relays/lib-substrate-relay/src/on_demand_headers.rs#L336

In addition, what is the difference between finality_pipeline

About this question, I think I know that. https://github.com/paritytech/parity-bridges-common/blob/6d0daa8975a57c23b6705a725b18bc4960547d42/relays/lib-substrate-relay/src/finality_pipeline.rs#L157 In finality_pipeline the maximal_header_number is `None, so this is not an on-demand relay.

fewensa commented 2 years ago

Hi @svyatonik

About on_demand_headers I have some new questions.

Question 1:

https://github.com/paritytech/parity-bridges-common/blob/cf9abc101110b202675cce8226b2b96865195336/relays/bin-substrate/src/cli/relay_headers_and_messages.rs#L461-L462 https://github.com/paritytech/parity-bridges-common/blob/cf9abc101110b202675cce8226b2b96865195336/relays/bin-substrate/src/cli/relay_headers_and_messages.rs#L478-L479

Why do both parties here affect a headers-relay at the same time?

Doing so may update required_header at the same time https://github.com/paritytech/parity-bridges-common/blob/cf9abc101110b202675cce8226b2b96865195336/relays/messages/src/message_race_loop.rs#L320

Question 2:

https://github.com/paritytech/parity-bridges-common/blob/cf9abc101110b202675cce8226b2b96865195336/relays/lib-substrate-relay/src/on_demand_headers.rs#L285-L288

Maybe due to network reasons, messages-relay is executed before headers-relay, then updated required_header_number, Will never pass this condition again (This happened on our bridge ). I'm tried to add sleep to https://github.com/darwinia-network/bridger/blob/96cf8321235bf0a0ed4b69e01a6e95e9dca42d4b/task/task-pangolin-pangoro/src/service/relay.rs#L149 it works.

std::thread::sleep(std::time::Duration::from_secs(5));

So I’m more troubled by this, it’s so designed or there are other reasons?

svyatonik commented 2 years ago

Question 1:

Message relay serves lane in both directions - e.g. when you run substrate-relay relay-headers-and-messages --lane=00000000, it means that it'll deliver messages from chain1 to chain2 and in opposite direction. Plus even one-direction message lane requires two-directional channel - that's because source chain needs to know when message has been dispatched => even unidirectional relay will submit transactions to both source and target nodes => it'll need headers of source at target chain && header of target at source chain.

Question 2:

I'll need more time and some instructions to understand what's happening with your bridge. I didn't quite get it - "Maybe due to network reasons, messages-relay is executed before headers-relay" which relay are you starting - headers relay, messages relay or headers-and-messages relay?

fewensa commented 2 years ago

Question 2:

I'm hacked dependent code in the local registry. added log before line@281

    log::debug!(
        target: "bridge",
        "{} *1 scan range, \
        best_finalized_source_header_at_source: {:?}, \
        best_finalized_source_header_at_target: {:?}, \
        current_headers_difference: {:?}, \
        maximal_headers_difference: {:?}, \
        required_header_number: {:?}",
        C::NAME,
        best_finalized_source_header_at_source,
        best_finalized_source_header_at_target,
        current_headers_difference,
        maximal_headers_difference,
        required_header_number,
    );

The source chain (left) is Pangolin The target chain (right) is Pangoro

Because I just want test pangoro(right) -> pangolin(left), so I removed pangolin(left) -> pangoro(right) messages-relay (I don’t want to be affected by irrelevant logs) (And I'm also tested include this direction bridge).

After doing this, when I start bridger (start messages-relay). The logs output like this

[2021-09-20T06:47:06Z DEBUG substrate_relay_helper::on_demand_headers] Pangoro *1 scan range, best_finalized_source_header_at_source: 287918, best_finalized_source_header_at_target: 267035, current_headers_difference: 20883, maximal_headers_difference: 100, required_header_number: 0
[2021-09-20T06:47:06Z DEBUG substrate_relay_helper::on_demand_headers] Pangolin *1 scan range, best_finalized_source_header_at_source: 555021, best_finalized_source_header_at_target: 488448, current_headers_difference: 66573, maximal_headers_difference: 100, required_header_number: 555021

The first times: pangolin(source) -> pangoro(target) (Pangoro::Develivery) headers-relay required_header_number started from 0

pangoro(source) -> pangolin(target) (Pangolin::ReceivingConfirmationsDelivery) headers-relay required_header_number started from 555021 not 0. So I think the messages-relay is early started and updated required_header_number by query source chain state. then headers-relay will never pass this condition. only when best_finalized_source_header_at_target >= required_header_number will continue to work, and then this is difficult to do, it needs finality_loop to scan one by one and complete it synchronously.

I'm test add std::thread::sleep(std::time::Duration::from_secs(5)); to https://github.com/darwinia-network/bridger/blob/96cf8321235bf0a0ed4b69e01a6e95e9dca42d4b/task/task-pangolin-pangoro/src/service/relay.rs#L149

pangoro(source) -> pangolin(target) (Pangolin::ReceivingConfirmationsDelivery) headers-relay required_header_number will start from 0. and then messages-relay started. will update required_header_number to 555021 (But wait for the next loop to actually use this value, before that, will call find_mandatory_header_in_range https://github.com/darwinia-network/parity-bridges-common/blob/7be2b478b49bc8e87ddfe034076e9e80e2f126f6/relays/lib-substrate-relay/src/on_demand_headers.rs#L390-L412 ).

svyatonik commented 2 years ago

So I think the messages-relay is early started and updated required_header_number by query source chain state. then headers-relay will never pass this condition. only when best_finalized_source_header_at_target >= required_header_number will continue to work, and then this is difficult to do, it needs finality_loop to scan one by one and complete it synchronously.

In addition to syncing headers that messages relay needs, on-demand-relayer also awakes from time to time and syncs all mandatory headers. These are headers that are changing GRANDPA authority sets && bridges GRANDPA pallet cannot proceed without such headers. That's what all the code below https://github.com/darwinia-network/parity-bridges-common/blob/7be2b478b49bc8e87ddfe034076e9e80e2f126f6/relays/lib-substrate-relay/src/on_demand_headers.rs#L180 is about.

So in your case when the code reaches the mandatory_headers_scan_range, messages relay has already requested to sync some descendant of mandatory header => that's why mandatory_headers_scan_range will simply return None. If you add pause, it'll be executing first && it will select this header for submission. But then after pause, messages relay will require descendant of this mandatory header => it'll also be synced. So the outcome is the same - both headers will be synced.