paritytech / parity-bridges-common

Collection of Useful Bridge Building Tools 🏗️
GNU General Public License v3.0
268 stars · 131 forks

Think of more sophisticated algorithm than blocking XCM channel between sending chain and bridge hub #2315

Closed: svyatonik closed this issue 10 months ago

svyatonik commented 11 months ago

paritytech/parity-bridges-common#2312 has backported the pallet-xcm-bridge-hub router 1:1 from the v1 version (#2294). But @bkontur has raised a valid concern here: https://github.com/paritytech/parity-bridges-common/pull/2313#discussion_r1281424022. E.g. if the same chain is bridging with two different consensus systems, is it fine to block both bridges if at least one lane in one of the bridges is blocked? With a single remote consensus, it is maybe fine, because logically they're using a single channel across the pipeline: it is a bridge-specific detail that messages are delivered over multiple lanes. But if we are bridging with another consensus system, then both bridges will stall until we resolve a single-lane issue. It doesn't look good to me.

OTOH for the sending chain, the channel for all outbound messages is the same - it is the channel to the sibling/child bridge hub. So maybe it is fine?

This needs some more thinking, so I'm filing a separate issue.

svyatonik commented 11 months ago

Just thoughts:

1. So we need to implement logical XCM channels over a single physical one. Does it violate the "XCM messages are guaranteed to be delivered and interpreted accurately, in order" guarantee? I mean: if we pause Kusama -> Polkadot messages but keep delivering Kusama -> Ethereum messages, is that fine? In other words, does this "in order" guarantee apply to a logical channel or to a physical channel?
2. Ideally, we'd want to support those logical channels directly in HRMP/DMP/UMP. I.e. in addition to the Resume and Suspend signals, we'd add Resume(<some-id>) and Suspend(<some-id>) and use them when required.
3. Another option is to send XCM messages back to the sending chain. But that has its own drawbacks, so I'd prefer to avoid it.
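A minimal sketch of what point 2 could look like, assuming hypothetical per-bridge logical-channel ids. None of these names (`LogicalChannelId`, `ChannelSignal`, `ChannelState`) exist in HRMP/UMP/DMP today; this only models the intent that ordering is preserved per logical channel rather than across the whole physical channel:

```rust
use std::collections::HashSet;

// Hypothetical id of a logical sub-channel, e.g. one per bridge
// (with-Polkadot, with-Ethereum).
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct LogicalChannelId(u32);

// Extension of the existing Suspend/Resume channel signals:
// `None` targets the whole physical channel, `Some(id)` a logical one.
enum ChannelSignal {
    Suspend(Option<LogicalChannelId>),
    Resume(Option<LogicalChannelId>),
}

#[derive(Default)]
struct ChannelState {
    physically_suspended: bool,
    suspended_logical: HashSet<LogicalChannelId>,
}

impl ChannelState {
    fn on_signal(&mut self, signal: ChannelSignal) {
        match signal {
            ChannelSignal::Suspend(None) => self.physically_suspended = true,
            ChannelSignal::Resume(None) => self.physically_suspended = false,
            ChannelSignal::Suspend(Some(id)) => { self.suspended_logical.insert(id); }
            ChannelSignal::Resume(Some(id)) => { self.suspended_logical.remove(&id); }
        }
    }

    // Kusama -> Ethereum messages may still flow while Kusama -> Polkadot
    // is paused: each logical channel is suspended independently.
    fn can_send(&self, id: LogicalChannelId) -> bool {
        !self.physically_suspended && !self.suspended_logical.contains(&id)
    }
}
```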

bkontur commented 11 months ago

Well, yes. Actually with HRMP, which uses recipient/sender as ParaId, we don't have a way to differentiate, and I don't know about the upcoming pallet_message_queue.

E.g. for handling MessageDispatch we added BridgeInstance (which is the pallet_bridge_messages::Pallet instance id) here.

Our actual Bridges: ExporterFor configuration uses the bridge location as a parachain:

`MultiLocation { parents: 1, interior: X1(Parachain(1002)) }`

So it seems that we should also add the target pallet instance for sending, like:

`MultiLocation { parents: 1, interior: X2(Parachain(1002), PalletInstance(53)) }`

E.g. 53 is the pallet_bridge_messages instance for Kusama -> Polkadot bridging here.

So e.g. for Ethereum we will have a different pallet:

`MultiLocation { parents: 1, interior: X2(Parachain(1002), PalletInstance(123)) }`

But I don't know yet how pallet_message_queue works and whether it works with MultiLocation instead of the HRMP recipient/sender as ParaId.
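To illustrate the idea above: the exporter configuration would key the bridge not just by the bridge-hub para id but also by the messages-pallet instance on it. The types below are simplified stand-ins for `MultiLocation`/`ExporterFor` (not the real XCM types), with the instance numbers 53 and 123 taken from the examples in this thread:

```rust
// Simplified stand-in for a MultiLocation pointing at a bridging pallet.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct BridgeLocation {
    parents: u8,
    parachain: u32,
    // e.g. Some(53) for the Kusama -> Polkadot messages pallet instance.
    pallet_instance: Option<u8>,
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum RemoteNetwork {
    Polkadot,
    Ethereum,
}

// "ExporterFor"-like lookup: which bridge location serves a remote network.
// Both bridges live on the same bridge hub (para 1002) but in different
// pallet instances, so they can be addressed independently.
fn exporter_for(network: RemoteNetwork) -> BridgeLocation {
    match network {
        RemoteNetwork::Polkadot => BridgeLocation {
            parents: 1,
            parachain: 1002,
            pallet_instance: Some(53),
        },
        RemoteNetwork::Ethereum => BridgeLocation {
            parents: 1,
            parachain: 1002,
            pallet_instance: Some(123),
        },
    }
}
```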

svyatonik commented 11 months ago

@bkontur The problem is not in splitting messages stream into several substreams when they reach the Source.BH. It is that in the current dynamic fees implementation, any bridge (aka lane if we omit the Eth bridge) may suspend the channel with the Source.AH (sending chain), effectively halting all other bridges that are using the same XCMP channel:

```
                                            /---> (1) Pallet for EthereumBridge
                                           /
Source.AH <--- XCMP channel ---> Source.BH -----> (2) Lane with Polkadot.AssetHub
                                           \
                                            \---> (3) Lane with Polkadot.Collectives
```

So in the picture above, e.g. if the relayer for Polkadot.Collectives is not working, or if Polkadot.Collectives has some other troubles, eventually the "lane with Polkadot.Collectives" will suspend the "XCMP channel" with Source.AH, and messages to other bridges won't even be delivered to Source.BH until the "lane with Polkadot.Collectives" returns back to operational mode. It doesn't matter who the recipient of those messages is: they are simply not physically delivered to Source.BH.

I've been thinking we may live with that, because logically we may look at the channel between Source.AH and Target.AH as a single channel (the with-Polkadot bridge), which splits at the end. So we can't actually do anything here. I even left a big comment here: https://github.com/paritytech/parity-bridges-common/blob/master/modules/xcm-bridge-hub-router/src/lib.rs#L134-L163

But if we talk about multiple remote consensus systems, this analogy doesn't work anymore. Why should the with-Polkadot bridge affect the with-Ethereum bridge? So we probably need to introduce some multiplexing protocol in v2 instead of physically suspending the channel between the sending chain and its sibling BH.

bkontur commented 11 months ago

> But if we talk about multiple remote consensus systems, this analogy doesn't work anymore. Why should the with-Polkadot bridge affect the with-Ethereum bridge? So we probably need to introduce some multiplexing protocol in v2 instead of physically suspending the channel between the sending chain and its sibling BH.

I guess that's what I was talking about, but probably in the wrong way :D

I was thinking about something like this:

If the with-Polkadot bridge has a congestion problem (whatever the lane to Polkadot), then suspend all pallet_message_queue processing related to the Polkadot bridging pallet specified by `MultiLocation { parents: 1, interior: X2(Parachain(1002), PalletInstance(53)) }`.

If the with-Ethereum bridge has a congestion problem, then suspend all pallet_message_queue processing related to the Ethereum bridging pallet specified by `MultiLocation { parents: 1, interior: X2(Parachain(1002), PalletInstance(123)) }`.

But this assumes that several independent queues exist between AssetHub and BridgeHub:

  1. pallet_message_queue for BridgeHub+with-Polkadot bridge
  2. pallet_message_queue for BridgeHub+with-Ethereum bridge

instead of one XCMP channel as now.


Maybe I have the wrong expectation that the new pallet_message_queue will replace the existing XCMP channel (the HRMP channel between two para IDs), like opening dedicated channels.

I really need to check the pallet_message_queue PR.

svyatonik commented 11 months ago

Yes, the single HRMP channel is a problem

svyatonik commented 11 months ago

Ok, I'm proposing the following solutions.

For v2:

1. Keep the backpressure mechanism for the Target.BH -> Target.AH queue. So we still reject the delivery transaction if Target.BH -> Target.AH is overloaded.
2. When we see many messages in the Source.BH -> Target.BH queue, we send an XCM message to Source.AH instead of physically suspending the Source.AH -> Source.BH channel.
3. The XCM would be `Transact { call: XcmBridgeHubRouter::report_congestion(destination_location, is_congested: true) }`. This call will be part of the pallet-xcm-bridge-hub-router at Source.AH and it will only be callable by the bridge hub.
4. We change pallet-xcm-bridge-hub-router to maintain a separate fee factor for every route. So e.g. Source.AH -> Target.AH and Source.AH -> Target.Collectives will have different fees for the same message.
5. When the Source.BH -> Target.BH queue becomes uncongested, we send a similar message to Source.AH: `Transact { call: XcmBridgeHubRouter::report_congestion(destination_location, is_congested: false) }`.
6. If we have a congestion report (via report_congestion), then instead of sending messages over the HRMP channel immediately, we keep them in the local storage of pallet-xcm-bridge-hub-router. Once we receive the report_congestion(is_congested: false) call, we send all messages from the local storage to Source.BH, i.e. return to normal operational mode.
7. If we keep receiving messages from Source.AH at Source.BH after we have called report_congestion, we close the bridge.

For v1, let's do 1-5. We can trust that the asset hub respects our rules.
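Steps 3-6 above can be sketched as follows. This is a hypothetical model of the per-route state in pallet-xcm-bridge-hub-router, not the pallet's actual API: `Router`, `RouteState`, and the plain-integer `Destination`/`Message` stand-ins are all illustrative, and the 5% fee growth per congested send is an arbitrary placeholder:

```rust
use std::collections::{HashMap, VecDeque};

type Destination = u32; // stand-in for a destination MultiLocation
type Message = Vec<u8>;

#[derive(Default)]
struct RouteState {
    is_congested: bool,
    fee_factor: u64,             // per-route fee multiplier (fixed-point, 1_000 = 1.0)
    buffered: VecDeque<Message>, // messages held back while the route is congested
}

#[derive(Default)]
struct Router {
    routes: HashMap<Destination, RouteState>,
    // messages actually forwarded to the sibling bridge hub over HRMP
    sent_to_bridge_hub: Vec<(Destination, Message)>,
}

impl Router {
    fn route(&mut self, dest: Destination) -> &mut RouteState {
        self.routes.entry(dest).or_insert_with(|| RouteState {
            fee_factor: 1_000,
            ..Default::default()
        })
    }

    // Steps 3 and 5: invoked via `Transact`, callable only by the bridge hub.
    fn report_congestion(&mut self, dest: Destination, is_congested: bool) {
        self.route(dest).is_congested = is_congested;
        if !is_congested {
            // Step 6: flush everything buffered while the route was congested.
            let buffered: Vec<Message> = self.route(dest).buffered.drain(..).collect();
            for msg in buffered {
                self.sent_to_bridge_hub.push((dest, msg));
            }
        }
    }

    // Steps 4 and 6: the fee depends on the route; congested routes buffer
    // locally instead of sending over HRMP. Returns the fee charged.
    fn send(&mut self, dest: Destination, msg: Message) -> u64 {
        let state = self.route(dest);
        let fee = state.fee_factor;
        if state.is_congested {
            state.fee_factor = state.fee_factor * 105 / 100; // grow fee while congested
            state.buffered.push_back(msg);
        } else {
            self.sent_to_bridge_hub.push((dest, msg));
        }
        fee
    }
}
```

The key design point is that congestion on one route (e.g. to Polkadot.Collectives) only raises the fee and buffers messages for that route, while other routes keep flowing over the same physical channel.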

svyatonik commented 11 months ago

Also: can we (easily) enable free execution of just one single call (report_congestion) at the asset hub? I don't want to add more accounts that need to be topped up (in this case: the sovereign account of the bridge hub at the asset hub). For v1 at least it should be fine, because they are system parachains. For v2 we'll probably charge the SA of the sending chain at the BH when sending this message (or else: close the bridge).

serban300 commented 11 months ago

Overall the strategy sounds very good. Just one question:

> 7. If we keep receiving messages from Source.AH at Source.BH after we have called report_congestion, we close the bridge.

I don't think the `Transact { call: XcmBridgeHubRouter::report_congestion(destination_location, is_congested: true) }` will reach Source.AH immediately. It takes at least one more block for the HRMP message to be executed by the Source. If there is some congestion or other problems, maybe it takes more. Can we know when the Source has received the Transact?

bkontur commented 11 months ago

Yes, that's exactly what I wanted to say: we just need the SA of the BH on the AH for `Transact { call: XcmBridgeHubRouter::report_congestion(...) }`, but yes, we could also allow UnpaidExecution for v1 maybe.

I think 1-5 look OK.

And yes, I had prepared the same question for 7.

bkontur commented 11 months ago

> Can we know when the Source has received the Transact?

XCM is fire-and-forget, but we could use ReportStatus or some XCM query feature.

svyatonik commented 11 months ago

> Yes, that's exactly what I wanted to say: we just need the SA of the BH on the AH for `Transact { call: XcmBridgeHubRouter::report_congestion(...) }`, but yes, we could also allow UnpaidExecution for v1 maybe.

@bkontur Ok, cool, thank you. Do you know if it is possible/easy to configure XCM in a way that only such an XCM program will be executed for free? Or do we need some new barrier for that?

> I don't think the `Transact { call: XcmBridgeHubRouter::report_congestion(destination_location, is_congested: true) }` will reach Source.AH immediately. It takes at least one more block for the HRMP message to be executed by the Source. If there is some congestion or other problems, maybe it takes more. Can we know when the Source has received the Transact?

> And yes, I had prepared the same question for 7.

@bkontur @serban300 I was thinking of some window before closing the bridge. We'll work out the details later.

bkontur commented 11 months ago

> @bkontur Ok, cool, thank you. Do you know if it is possible/easy to configure XCM in a way that only such an XCM program will be executed for free? Or do we need some new barrier for that?

AFAIK we need a new barrier. I can do that: on my way to asset transfer I created at least two hacky Barriers :D, so no problem :D :D Or we can allow UnpaidExecution for all BridgeHub XCM calls, but a temporary hacky barrier seems OK for v1.
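The "hacky barrier" idea could look roughly like this: allow unpaid execution only when the message comes from the sibling bridge hub and contains nothing but the report_congestion transact. Everything here is a simplified stand-in (the real barrier would implement XCM's `ShouldExecute` trait over actual `Instruction`s); `allow_free_report_congestion` and the tiny `Instruction` enum are illustrative only:

```rust
// Simplified stand-in for the message origin checked by the barrier.
#[derive(PartialEq, Eq, Clone, Copy, Debug)]
enum Origin {
    SiblingBridgeHub,
    Other,
}

// Simplified stand-in for XCM instructions relevant to this barrier.
#[derive(PartialEq, Eq, Clone, Debug)]
enum Instruction {
    UnpaidExecution,
    ReportCongestionTransact { is_congested: bool },
    OtherTransact,
}

// Returns true if the program may execute for free: it must come from the
// sibling bridge hub and match the exact expected shape (an optional
// UnpaidExecution prefix followed by the single report_congestion transact).
fn allow_free_report_congestion(origin: Origin, program: &[Instruction]) -> bool {
    if origin != Origin::SiblingBridgeHub {
        return false;
    }
    match program {
        [Instruction::UnpaidExecution, Instruction::ReportCongestionTransact { .. }] => true,
        [Instruction::ReportCongestionTransact { .. }] => true,
        _ => false,
    }
}
```

Matching the exact program shape (rather than just the origin) keeps the free-execution surface as small as possible, which is the point of a call-specific barrier over a blanket UnpaidExecution allowance.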

svyatonik commented 11 months ago

> AFAIK we need a new barrier. I can do that: on my way to asset transfer I created at least two hacky Barriers :D, so no problem :D :D Or we can allow UnpaidExecution for all BridgeHub XCM calls, but a temporary hacky barrier seems OK for v1.

No, it's fine, I'll make it myself. I thought that maybe there's already one, and you're our XCM magician who knows about it.

svyatonik commented 10 months ago

For v1 it is resolved by https://github.com/paritytech/parity-bridges-common/pull/2318. For v2 we'll have https://github.com/paritytech/parity-bridges-common/issues/2391. So I'm closing the issue.