Closed serban300 closed 1 year ago
A couple of questions (sorry - I'm yet to read everything in paritytech/parity-bridges-common#2443 carefully):
1) what exactly needs to be delivered back to the source chain? Is it just 1
bit of information (that we have had before) or is it some more complicated structure? If that's just one bit, we can revert removal and if that's something more complicated, then I suggest using regular messaging to report the error - i.e. on_fail: queue-error-report-message-to-the-same-lane-in-reverse-direction
2) very dumb question, just idea. When I read about XCM some time ago, I thought that this QueryResponse
is sent back automatically (if ReportError
is in the XCM program). IIUC on BridgeHubWococo
the problem is that we are not executing anything (i.e. are not using XCM executor). But can we e.g. use XCM execute that will call the XcmBlobMessageDispatch
itself. Like we already do on the source parachain - it executes some XCM that sends message (using XCMP) to sibling parachain. IIUC we can try to do similar thing here. Do you think it is possible?
- what exactly needs to be delivered back to the source chain? Is it just 1 bit of information (that we have had before) or is it some more complicated structure? If that's just one bit, we can revert removal and if that's something more complicated, then I suggest using regular messaging to report the error - i.e. on_fail: queue-error-report-message-to-the-same-lane-in-reverse-direction
Yes, I think we would only need 1 bit to report success/failure.
- very dumb question, just idea. When I read about XCM some time ago, I thought that this QueryResponse is sent back automatically (if ReportError is in the XCM program). IIUC on BridgeHubWococo the problem is that we are not executing anything (i.e. are not using XCM executor). But can we e.g. use XCM execute that will call the XcmBlobMessageDispatch itself. Like we already do on the source parachain - it executes some XCM that sends message (using XCMP) to sibling parachain. IIUC we can try to do similar thing here. Do you think it is possible?
In theory QueryResponse
would be sent back automatically if ReportError
is in the XCM program. But from what I understand it would be sent automatically, only if the message came from a sibling parachain, because it uses send. I think we would need export
to send it over the bridge, or some custom logic. We could probably call execute on the XCM message, but still the fact that ReportError
uses send
is a problem and probably we would need more customization for this. Also if we send it back on the bridge we might run in the same issue, namely that the dispatch might fail on the other side and the message might get lost.
That's a bummer :/ I thought it is all covered by XCM when I've removed that functionality. But it is kinda strange that this stuff (ReportError
+ QueryResponse
) only works within local consensus - maybe we could ask @KiChjang if that's by design or it isn't just implemented yet? Or maybe you have idea why this isn't implemented, @serban300 ?
That said, I'm not against reverting dispatch-result-removal, if you think we need it. However, please note that this will affect the weight of receive_message_delivery_proof
call, so please don't do anything complicated directly from there. Just iterate over results bitvec and save every failure to some other queue, where it'll be processed from the on_idle
/on_initialize
or something like that. Also: we need to impl refund in this case (like we'll be accounting worst case, which is reads_writes(1, 1)
or something like that, but at least some calls will be successful => zero weight).
This doesn't seem like it's a huge issue for me -- you can simply implement SendXcm
on a struct that sends the QueryResponse
over a bridge, prepending ExportMessage
as necessary, and assigning it as an XcmSender
to the XCM executor config. In other words, this is more of an implementation issue than a design problem.
Thanks! Then I wonder why do we need a separate ExportMessage
instruction at all (we could just use send
instead?) + why two traits (SendXcm
and ExportXcm
) instead of just one (if we could send message to other consensus using SendXcm
). But that's probably because I haven't explored XCM in details.
But I guess if it is fine to use send
to send message to other consensus system, then probably that's a way to go @serban300 ? Or is it much more complex than relaying dispatch results using bridge-specific stuff?
I hope I'm not missing something. I'm not very familiar with XCM, so take the following with a grain of salt, but:
In my opinion the biggest issue with ReportError
is that as far as I understand the delivery is not guaranteed, and
ReportError
inside ReportError
, and even if we could, it would be really hard to follow.If we only have a communication between 2 chains let's say A and B and A wants to send some assets to B, probably ReportError
is ok. If something fails on A, the assets will be trapped on A. If something fails on B, the assets will be trapped on B, and also ReportError
will send a QueryResponse
back, which might arrive or not, but still the assets will be trapped on B. As long as the assets are trapped somewhere we are ok-ish even if the claiming process is not automated..
But if we consider a longer distance scenario things get complicated. In our case Rockmine -> BridgeHub Rococo -> BridgeHubWococo -> Wockmint
for example, if something fails on BridgeHubWococo
, even if we execute the message, the assets won't be trapped there (because ReserveAssetDeposited
is not executed there anyway, it should be executed on Wockmint) and the QueryResponse
might not get back to Rockmine
in which case we might lose the message and the assets completely.
Instead if we relay the dispatch result to BridgeHub Rococo
, at least we have a stored proof that the message failed. The assets are still not trapped, but maybe as long as we hold in the storage the fact that the message failed, we can retry to send QueryResponse
s to Rockmine
until one succeeds ? Not sure if this is possible either, but seems safer to have something in the storage at least that can be linked with the original message.
Also relaying the dispatch status would be more efficient. We would only need to send 1 bit back with the receive_message_delivery_proof which we send anyway, and we would avoid the customizations needed to make ReportError
work over the bridge.
But not sure if this is the best solution. Still thinking of alternatives. Couldn't find something trivial yet. Also maybe I'm missing something. But anyway, @svyatonik could you also point me to the commit with the dispatch-result-removal
please, just in case we need to revert it.
if something fails on BridgeHubWococo, even if we execute the message, the assets won't be trapped there (because ReserveAssetDeposited is not executed there anyway, it should be executed on Wockmint) and the QueryResponse might not get back to Rockmine in which case we might lose the message and the assets completely.
Also probably it's highly unlikely, but we could get a decoding error on BridgeHubWococo
, in which case we wouldn't be able to read the message at all, and even if it was a ReportError
instruction inside, we wouldn't be able to execute it. But we would be able to send back dispatch_status: failed.
You maybe right - I have no much expertise in XCM as well. I just wanted to avoid reinventing the wheel here. Because I was expecting that everything XCM need from bridges is to deliver blobs (as it was requested last year) (c). Nothing about delivering "dispatch result" or anything like that. So if we need to get dispatch result bits back - let's do it.
That's the PR that has removed relaying dispatch result: https://github.com/paritytech/parity-bridges-common/pull/1660
You'll probably also need callbacks that are invoked when messages are sent (unlikely - you may do the same from our XcmBlobHaulerAdapter
) and confirmed: https://github.com/paritytech/parity-bridges-common/pull/1649
In my opinion the biggest issue with
ReportError
is that as far as I understand the delivery is not guaranteed
This is by design actually. XCM is designed to be Asynchronous, and so none of the XCM components assume that you can expect a timely delivery. The reason behind this design decision is exactly as you've discovered -- the complexity of designing a timely message delivery system really high, and likely not generalizable either, so by default XCM assumes asynchronicity. If you want to implement a retry mechanism, then you'll have to implement it on top of XCM.
Related to paritytech/parity-bridges-common#2443
I think we might need to report the message dispatch status from target chain to source chain.
For example for the flow
Rockmine -> BridgeHub Rococo -> BridgeHubWococo -> Wockmint
we might have an XCM message that reachedBridgeHubWococo
, but the dispatch failed. In this case the assets won't be trapped anywhere (more details in this comment) . I think it would be good if we could report the failed dispatch toBridgeHub Rococo
so thatBridgeHub Rococo
could then send aQueryResponse
back toRockmine
to somehow unlock the assets there.@svyatonik do you think this would make sense ?