Open franck44 opened 1 week ago
@musitdev
Here is a summary of our discussion on this issue:
[!IMPORTANT] So perhaps it needs some update in the UI so that the user can be informed on how long they have to wait.
To implement this solution we need to retrieve the status of a block (finalised or not).
On the Eth side: the call `eth_getBlockByNumber` with the `finalized` tag returns the last finalized block.
On the Mvnt side: we open a REST API port that only returns finalized blocks, with the same API. We just have to request blocks in the finalized state.
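A minimal sketch of the Eth-side query, assuming a standard Ethereum JSON-RPC endpoint. Only the `eth_getBlockByNumber` method and the `finalized` tag come from the discussion; `rpc_url` and all function names below are hypothetical and not taken from the relayer code base.

```python
import json
from urllib import request


def finalized_block_request(request_id: int = 1) -> dict:
    """Build the JSON-RPC payload asking for the latest finalized block."""
    return {
        "jsonrpc": "2.0",
        "method": "eth_getBlockByNumber",
        # Second parameter False: return transaction hashes only, not full objects.
        "params": ["finalized", False],
        "id": request_id,
    }


def parse_block_number(response: dict) -> int:
    """Extract the block number (hex-encoded quantity) from a JSON-RPC response."""
    return int(response["result"]["number"], 16)


def fetch_finalized_block_number(rpc_url: str) -> int:
    """POST the request to a node (rpc_url is a placeholder) and return the number."""
    body = json.dumps(finalized_block_request()).encode()
    req = request.Request(
        rpc_url, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return parse_block_number(json.load(resp))
```

A block at height `n` can then be treated as finalized when `n <= fetch_finalized_block_number(rpc_url)`.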
The locations of the changes in the code base are:
Agree with that. I don't think we should consider Safe blocks as an option, though. This is going to delay us by quite a bit in terms of development, and it completely changes the expected behavior, and business should weigh in.
Safe blocks are ~25 blocks behind the current block (~5 minutes).
Finalized blocks are ~65 blocks behind the current block (~13 minutes).
Using either safe blocks or finalized blocks won't make much of a difference in terms of UX.
If we do not have time to implement this, I believe we want to increase block.confirmations to 3 instead of 1.
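For reference, the block counts quoted in this comment convert to wait times as follows. This is a rough sketch that assumes one block per 12-second slot and no missed slots; `SLOT_SECONDS` and `wait_minutes` are illustrative names.

```python
SLOT_SECONDS = 12  # assumption: post-merge Ethereum slot time, no missed slots


def wait_minutes(blocks: int, slot_seconds: int = SLOT_SECONDS) -> float:
    """Approximate time, in minutes, for `blocks` blocks to be produced."""
    return blocks * slot_seconds / 60


for label, blocks in [
    ("1 confirmation", 1),
    ("3 confirmations", 3),
    ("safe (~25 blocks)", 25),
    ("finalized (~65 blocks)", 65),
]:
    print(f"{label}: ~{wait_minutes(blocks):.1f} min")
```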
> I don't think we should consider Safe blocks as an option, though. This is going to delay us by quite a bit in terms of development, and it completely changes the expected behavior, and business should weigh in.

If we can implement block.confirmations=3, we can do block.confirmations=32. This does not return the finalization level, but it would be approximately correct.
In blockchain, security has priority, so the focus should be on safety (a high confirmation number initially), optimising UX progressively (decreasing the confirmation number thereafter) if we feel that it is safe to do so, rather than the other way around. After all, we are approaching fees the same way (start conservative and then improve).
tbh this is Ethereum. Everyone is aware that finality on Eth takes O(10min).
@apenzk The issue with 32 blocks is that we have to make significant changes to the UI and UX, and to how transactions are managed in the frontend. The current design handles a single transaction in a single go, and then you can complete it from the transaction history. We might need some design components and be behind schedule. Changing to 3 confirmations keeps the same UI/UX. That's where I'm worried we might be late. I agree that it has to be done though.
From here https://etherscan.io/blocks_forked
It looks like a block is forked, that is, dropped from the blockchain, on average about 10 - 25 times per day.
I set the view to 100 rows and looked at several pages. There were only reorgs of depth 1. Double block reorgs may happen sometimes, but they don't appear to happen often.
So if we know we can expect that many blocks dropped per day, then we can impose certain limits for the vast majority of bridge users, to minimize risk and loss while prioritizing good UX along with security.
We could also have a "white glove" or more private bridge designed for larger amounts, that requires the full 32 blocks.
The attack described in this issue would only result in loss of funds to the bridge. If the bridge:
- limits the size of transfers
- rate-limits users
- starts the bridge with some extra fee to cover reorg losses, and gradually lowers the fee as Movement Foundation feels comfortable doing so
then I think Movement Foundation could safely and predictably cover losses, even in the rare event of a many-block reorg, and we could still have a fast bridge.
Optimizing for good UX is uncommon in the crypto world, and as "Movement" I do feel that for brand alignment, products should try to work fast and smoothly. So when considering security first, I believe we can use economic and probability-oriented measures such as those listed above, to also achieve a fast, enjoyable bridging UX.
There may be other attacks that should be considered in the event of a reorg. For example:
When bridging from L2 to L1, the block containing a `lockBridgeTransfer` or `completeBridgeTransfer` transaction is dropped. Solution: a service is set up to monitor for dropped blocks and bridge transactions within them. If a faulty call is found, it is matched against an `initiate_bridge_transfer` call on L2. After verifying the initiator is correct, they are refunded automatically on L2.
If a refund transaction is in a dropped block, the service monitoring for reorgs will notice and re-attempt the refund. This could be manually approved by an admin.
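A rough sketch of what the detection step of such a monitoring service could look like. Everything here is hypothetical (the `BridgeTx` record, `find_dropped`, and the `canonical_hash_at` lookup); it only illustrates comparing the block hash recorded when a bridge transaction was first seen against the hash currently canonical at that height.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class BridgeTx:
    tx_hash: str
    kind: str          # e.g. "lockBridgeTransfer" or "completeBridgeTransfer"
    block_number: int  # height at which the transaction was first observed
    block_hash: str    # hash of the block it was observed in


def find_dropped(
    txs: Iterable[BridgeTx],
    canonical_hash_at: Callable[[int], Optional[str]],
) -> List[BridgeTx]:
    """Return transactions whose recorded block is no longer canonical.

    A flagged transaction may still have been re-included in another block,
    so it needs a follow-up check before any refund is issued.
    """
    return [tx for tx in txs if canonical_hash_at(tx.block_number) != tx.block_hash]


# Usage with an in-memory stand-in for the node's canonical chain.
canonical = {100: "0xaaa", 101: "0xccc"}
seen = [
    BridgeTx("0x01", "lockBridgeTransfer", 100, "0xaaa"),
    BridgeTx("0x02", "completeBridgeTransfer", 101, "0xbbb"),
]
dropped = find_dropped(seen, canonical.get)
```

In this example only the second transaction is flagged, because the block it was recorded in has been replaced at height 101.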
One factor to consider is whether there would be a token supply increase in the case of the user receiving their tokens on L2 without locking on L1. If the assets are minted on L2, which is the current model, then we would need to consider whether to burn corresponding assets on L1 or some other solution, to balance the supply according to whatever tokenomic model is in place.
> So if we know we can expect that many blocks dropped per day, then ....

The chance of blocks being dropped cannot be known in advance. We cannot predict whether many blocks or none are dropped.

> I set the view to 100 rows and looked at several pages. There were only reorgs of depth 1. Double block reorgs may happen sometimes, but they don't appear to happen often.

We cannot rely on heuristics like this. If you have a report or study, please refer to that; otherwise this kind of guesstimate sounds super unsafe.

> The attack described in this issue would only result in loss of funds to the bridge. If the bridge:
> - limits the size of transfers

I would not recommend this. It just incentivizes splitting bridge transfers into smaller portions.

> We could also have a "white glove" or more private bridge designed for larger amounts, that requires the full 32 blocks.

Could we start with the "larger amount" being 0?

> - rate-limits users

This just incentivizes creating new accounts.

> - starts the bridge with some extra fee to cover reorg losses, and gradually lower the fee as Movement Foundation feels comfortable doing so

Bridge losses would not come expectedly or regularly. You could have no loss for 2 months and then ...

> Movement Foundation could safely and predictably cover losses,

You cannot safely nor predictably cover such losses.

> Optimizing for good UX is uncommon in the crypto world, and as "Movement" I do feel that for brand alignment, products should try to work fast and smoothly.

Yes, but please not at the cost of safety.

> There may be other attacks that should be considered in the event of a reorg

Yes, that is a good point. The relayer could check whether "completes" or "locks" were successful.

> One factor to consider is whether there would be a token supply increase in the case of the user receiving their tokens on L2 without locking on L1. If the assets are minted on L2, which is the current model, then we would need to consider whether to burn corresponding assets on L1 or some other solution, to balance the supply according to whatever tokenomic model is in place.

There is this MIP, which proposes to have a security fund in case of catastrophic failures. But just to remind: this is for catastrophic events. Double spends should never occur. The security fund is proposed with the desire to NOT be used at all.
If there's going to be an insistence on 32 blocks (over 6 minutes on average) per Eth transaction, for this bridge design, then that makes the bridge pretty much unusable in the context of our current UI. No one will wait 6 minutes for their L2 wallet to pop up and complete on L2. It's already some friction when it's fast, but if it's slow it just won't be used.
So in that case I would favor some attestor-based model or using LayerZero. I agree with @0xPrimata that it would be good for the business side to weigh in with company priorities for whether and how we want to continue rolling out this HTLC bridge model.
For testnet, I don't see why it would be so risky to try it with 3 confirmations, and again, if there's a max amount of value bridged per block, then I do think the amount of loss can be capped in a way that is manageable by the Movement Foundation. But again it depends on company priorities so I think it would be best to get input from business on it.
Realistically, there's no such thing as perfect safety; there's just risk management. I think it should be up to the Movement Foundation to determine what risk profile they're willing to tolerate.
> If there's going to be an insistence on 32 blocks (over 6 minutes on average) per Eth transaction, for this bridge design, then that makes the bridge pretty much unusable in the context of our current UI. No one will wait 6 minutes for their L2 wallet to pop up and complete on L2. It's already some friction when it's fast, but if it's slow it just won't be used.

So IMHO, UX should not prevail and guide the backend design. The backend should be secure. On Optimism, it takes 1-3 mins to bridge to Opt and a week to bridge back. On Arbitrum, it takes 15-30 mins to bridge to L2 and a week to bridge back. On zkSync Era, bridging is ~15 mins (time to finalisation on Eth). On Linea, it takes ~20 mins.
[!IMPORTANT] If it takes ~15mins to bridge to Mvnt, we are on par with the main chains.
> Realistically, there's no such thing as perfect safety; there's just risk management. I think it should be up to the Movement Foundation to determine what risk profile they're willing to tolerate.

Yes, that's true, and our role is to provide data to make an informed decision.
One strong point for using finalisation as a criterion for relaying is that it is stable. If we use confirmations (12, 32, 65), the security guarantees may change over time, depending on Ethereum upgrades. For example, the introduction of blobs introduced frequent re-orgs.
[!IMPORTANT] Finalisation is a stable criterion that is safe and does not change over time (the time to finalisation can change and become shorter in the future, that's what is expected).
If we want to be a safe chain (like Arbitrum, zkSync Era, Linea) it seems natural to opt for finalisation on L1. Otherwise, we can take a risk, but hopefully this risk is low (we need to quantify it).
Correct me if I'm wrong but it looks like from a quick read, Optimism may be only using 1 conf? (See their op-batcher code: https://docs.optimism.io/builders/chain-operators/tutorials/create-l2-rollup). And I read elsewhere that they only use 1 conf... but haven't found conclusive proof.
Looking at https://docs.optimism.io/builders/chain-operators/configuration/proposer my first question would be: what is this number of confirmations for? In this link it says it's the time for validators to react to the proposer transaction. So not sure if that relates to the bridge?
(Also in that link the default is 10)
Here more on the batcher https://specs.optimism.io/protocol/batcher.html
> Looking at https://docs.optimism.io/builders/chain-operators/configuration/proposer my first question would be: what is this number of confirmations for? In this link it says it's the time for validators to react to the proposer transaction. Which is fine but also would be unrelated to the bridge.
> (In that link the default is 10)

Okay, 10 probably makes more sense if it's 1 - 3 minutes. I can try to dig deeper into their code if I get time.
Something to consider about the above bridges @franck44 mentions:
Those are not HTLC bridges. They do not require the user to sign a transaction on the L2, if I understand correctly. (I've only given each bridge a cursory glance so please do correct me if I'm wrong about that.)
If we were to have a design where the user is not required to sign on L2 to receive their funds, then I think it would be more reasonable to have many more confirmations tolerated by users.
Regarding the comment "UX should not prevail and guide the backend design", if there is a definition of UX under which it doesn't seem important to prioritize UX, then that should be formalized. Movement's messaging has espoused optimizing for UX, meaning user experience. And rightly so. From my understanding of Movement's priorities, the user experience must always be top priority, with security being included as part of user experience. I will defer to @rolandoesparza to help facilitate priorities in that regard.
From what I can tell, my point still stands: if we limit the financial amount of assets transferred in each block, then that could result in a manageable loss, as a cost-of-doing-business scenario for Movement Foundation. Fees can be adjusted so that Movement Foundation is profitable regardless of refunds.
I'm in the process of trying to get historical Ethereum mainnet reorg data to establish economic models to iterate on.
Regarding how this impacts the UI, if we were to require say 32 confirmations, I guess one solution could be to just show a user's transfers and their states (pending, completed, refunded, etc.) associated with each connected L1 and L2 wallet. So for example in the L1 -> L2 direction, instead of sitting and waiting for the L2 wallet to pop up, they can leave and come back to see whether they're ready to sign the "complete" transaction on L2. They can then click the "Complete" button to prompt the wallet to pop up and sign. Tagging @rolandoesparza @vpallegar as that might be a decent short-term UI fix.
This still doesn't solve the need to sponsor transactions on L2, though. And with Movement Foundation paying for refunds, 1. that could get very costly for Movement Foundation and 2. there's a risk of users forgetting to come back to the computer in a timely manner and finish the transfer. They can't complete on L2 on a different device than the one they initiated the transfer with, because the pre-image is stored locally. Maybe through some use of accounts that could change, like, users can log into their Movement account and have more of a multi-device experience, but that functionality is not built yet, nor is the mechanism to sponsor transactions.
If we could do 3 confirmations, then I think it would make sense to require users to sit at the screen and wait for their L2 wallet to pop up. But with 32 confirmations, there would need to be more of an async UI experience.
On a related note, I think the Simplified Bridge Design https://github.com/movementlabsxyz/MIP/pull/58/ is worth serious consideration because it would satisfy the finality asks in this issue, and it would remove the need for refunds and the need for users to have funds on L2 to cover the completion transaction fees.
To clarify: the simplified bridge design is a standard lock/mint bridge design, it is not new.
The previous confusion originated from RFC-40 (Atomic bridge) which used an atomic swap to design a bridge transaction.
In an atomic swap, there are two users, one on chain A, and one on chain B. They want to swap assets atomically and they don't trust each other. That's why there are:
You can use the atomic swap mechanism to implement a bridge transaction but a bridge transaction is fundamentally different: there is one user with two accounts, one on chain A and one on chain B. The user cannot accept the deal on chain A (lock their asset) and reject the deal on chain B (mint the equivalent representation of the asset on chain B).
I raised this point a few times "a bridge is not a swap".
Problem
Our bridge design relays events from Ethereum mainnet to our L2. These events are collected from logs, and logs are stored in blocks. A block is permanent (irreversible) once it is finalised, which takes ~15 minutes on average.
You may be interested in this post for a primer on Ethereum events. For re-orgs, Alchemy has an explainer.
We may dismiss the previous scenario arguing that re-orgs are infrequent. However, since EIP-4844, re-orgs are more frequent according to this analysis.
Unfortunately there are no clear results on how deep a re-org can be, i.e. how many non-finalised blocks it can impact.
Proposal
The proposal is to protect against the attack described above.
Implementation
It looks like the current implementation of the relayer relies on block confirmations to relay events. A block $b$ is $k$-confirmed if $k$ blocks have been produced and appended after $b$, i.e. there is a chain of $k$ blocks descending from $b$.
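The definition of $k$-confirmed can be sketched as a simple height comparison. This is only an illustration with hypothetical names; it assumes the block is still on the chain the head belongs to, which is exactly what a re-org can invalidate.

```python
def confirmations(block_number: int, head_number: int) -> int:
    """Number of blocks appended after the block at `block_number`."""
    return max(head_number - block_number, 0)


def is_k_confirmed(block_number: int, head_number: int, k: int) -> bool:
    """True if at least k blocks have been built on top of the block."""
    return confirmations(block_number, head_number) >= k
```

For example, a block at height 100 is 3-confirmed once the head reaches height 103.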
Some exchanges rely on 20 or 30 confirmations. Etherscan displays the real status of blocks, "unfinalised/unfinalised(safe)/finalised".
If there is an API to query the status of a block, we may use it to verify that the blocks we relay events from are finalised.
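A minimal sketch of how a relayer could gate events on finality, assuming it already knows the number of the latest finalised block. The event shape (a dict with a `block_number` key) and the function name are hypothetical and not taken from the current relayer.

```python
from typing import Dict, List, Tuple


def split_by_finality(
    events: List[Dict], finalized_number: int
) -> Tuple[List[Dict], List[Dict]]:
    """Split events into (ready to relay, still pending).

    An event is ready only if its block is at or below the latest
    finalised block; everything else waits for a later check.
    """
    ready = [e for e in events if e["block_number"] <= finalized_number]
    pending = [e for e in events if e["block_number"] > finalized_number]
    return ready, pending


# Usage: with finalised head at 120, only the first event is relayed now.
events = [
    {"id": "a", "block_number": 118},
    {"id": "b", "block_number": 125},
]
ready, pending = split_by_finality(events, finalized_number=120)
```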
Validation
The implementation impacts the relayer and some new tests may be needed to validate the changes.