sherlock-audit / 2023-10-aloe-judging

9 stars 6 forks source link

panprog - No handling of L2 sequencer down situation, which can lead to intentional bad debt creation and other malicious actions while sequencer is down or just after it becomes active again #55

Closed sherlock-admin2 closed 1 year ago

sherlock-admin2 commented 1 year ago

panprog

medium

No handling of L2 sequencer down situation, which can lead to intentional bad debt creation and other malicious actions while sequencer is down or just after it becomes active again

Summary

The protocol lists L2 networks (Arbitrum, Optimism and Base) as deployment chains. These L2 chains depend on sequencer for most of its functionality. When the sequencer is down for whatever reason, it's still possible to communicate with L2 networks via L1 contracts, but it's very inconvenient and only a few users really use it, meaning that it's unfair to let the protocol keep working as usual when sequencer is down as few users can cause damage to many users during this time (liquidators won't be active etc.). The protocol behavior upon sequencer going back online after a long downtime can also cause all kinds of problems, so it's advised to add grace period when L2 sequencer goes back online

There is functionality to pause some actions if pool manipulation is detected. But there is no functionality to also pause some actions if L2 sequencer is down or when it just came back online.

Vulnerability Detail

More detailed explanation about the details of L2 networks downtime and how to handle it can be read here: https://docs.chain.link/data-feeds/l2-sequencer-feeds

The following scenario can happen when L2 sequencer goes down:

  1. ETH price = 1000 USDT
  2. L2 sequencer goes down for several hours
  3. At the time L2 sequencer goes online again, ETH price = 900 USDT
  4. Once it comes back up online, uniswap pair average price is still 1000 USDT (and the last uniswap pair price is 1000 USDT a few hours ago, meaning volatility is 0 and average price will slowly go down from 1000 to 900 over 30 minutes)
  5. Malicious user can then borrow 1000 USDT and put 1.005 ETH as collateral, effectively selling ETH for 995 USDT, which is way above the current ETH price. Position will be healthy if 1.005 ETH is added to this account as uniswap position with the range [994..1000], it will then be treated as 1005 USDT for health check (since uniswap position will be taken at average price of 1000 instead of current 900).
  6. Later, when the average price goes down to 900 USDT, user's account will be liquidated for a bad debt of 100 USDT, causing loss of funds for the other protocol users.

There are also a lot of the other issues possible right after sequencer goes online. For example, it will be unprofitable for liquidators to liquidate accounts if they involve assets swap - liquidator will be forced to buy ETH for 1000 USDT instead of current price of 900 USDT, so liquidations mostly won't work during that time, leaving many unhealthy positions unliquidated.

The same scenario can also be slightly modified to borrow before the sequencer goes offline via direct L1 contract communication. Such user's transaction will be added to queue and will be added to L2 blockchain while sequencer is still offline. There might be even more severe scenarios when user acts via L1 while the sequencer is down.

Impact

If L2 sequencer goes offline for some time, malicious users can cause all kinds of problems during offline time and in the first minutes after sequencer goes back online (since the price will jump from the last time it is online, but the average price will not catch up until AVG_WINDOW time passes). These malicious actions will usually lead to creation of bad debt via different means.

There will also be non-malicious user problems, such as liquidators refusing to liquidate due to unprofitable asset swaps required for liquidaiton.

Code Snippet

There is a pause functionality, but it only allows to pause if pool manipulation is detected: https://github.com/sherlock-audit/2023-10-aloe/blob/main/aloe-ii/core/src/Factory.sol#L157-L164

But there is no protection of the protocol while L2 sequencer is down or just came back online.

Tool used

Manual Review

Recommendation

Consider disabling all borrower actions (liquidate, warn, modify) while sequencer is down. Additionally, disable liquidations and withdrawals in the first AVG_WINDOW seconds after sequencer came back online. Sequencer status and time since last status change can be checked in chainlink examples: https://docs.chain.link/data-feeds/l2-sequencer-feeds

sherlock-admin2 commented 1 year ago

1 comment(s) were left on this issue during the judging contest.

MohammedRizwan commented:

low severity

roguereddwarf commented 1 year ago

Escalate

The sequencer going offline is an issue on the infrastructure layer. This is not a smart contract issue.

By that logic, almost ALL smart contracts deployed on Arbitrum / Optimism would need to have a sequencer check with the reasoning that it's unfair for some users to still interact with the contracts and / or to see market conditions changed when the sequencer comes back online.

In the past we have only accepted sequencer issues in conjunction with oracle price feeds (-> Chainlink feeds) since in these cases all old prices become applied when the sequencer comes back up.

This issue is similar in kind to chain re-org issues which are also on the infrastructure level and are invalid. So this issue should not be a Medium (not because it's an invalid concern but because this is not an issue on the smart contract layer).

sherlock-admin2 commented 1 year ago

Escalate

The sequencer going offline is an issue on the infrastructure layer. This is not a smart contract issue.

By that logic, almost ALL smart contracts deployed on Arbitrum / Optimism would need to have a sequencer check with the reasoning that it's unfair for some users to still interact with the contracts and / or to see market conditions changed when the sequencer comes back online.

In the past we have only accepted sequencer issues in conjunction with oracle price feeds (-> Chainlink feeds) since in these cases all old prices become applied when the sequencer comes back up.

This issue is similar in kind to chain re-org issues which are also on the infrastructure level and are invalid. So this issue should not be a Medium (not because it's an invalid concern but because this is not an issue on the smart contract layer).

You've created a valid escalation!

To remove the escalation from consideration: Delete your comment.

You may delete or edit your escalation comment anytime before the 48-hour escalation window closes. After that, the escalation becomes final.

panprog commented 1 year ago

Escalate

I believe this is valid Medium.

I disagree that this is a problem of the infrastructure. It's smart contract that has to implement the measures to handle sequencer offline / coming back online time adequately. I don't think that Chainlink is so unique that issues with it are valid while similar issues with the other protocols are invalid. And yes, many Arbitrum protocols can be vulnerable as well from the sequencer downtime, but it has to be seen case-by-case, as not all protocols can have bad situations caused by sequencer downtime. And many protocols will benefit from grace period to avoid any potential problems from this.

I've provided several examples of the issues caused by going online after sequencer downtime, including things such as liquidations being unprofitable for liquidator until the average price comes back to current price, I can also name volatility becoming low since there will be no transactions during a long period of time and multiple other things. It's not simply "unfair to other users", it's real issue causing loss of funds which can be mitigated by adding a grace period.

sherlock-admin2 commented 1 year ago

Escalate

I believe this is valid Medium.

I disagree that this is a problem of the infrastructure. It's smart contract that has to implement the measures to handle sequencer offline / coming back online time adequately. I don't think that Chainlink is so unique that issues with it are valid while similar issues with the other protocols are invalid. And yes, many Arbitrum protocols can be vulnerable as well from the sequencer downtime, but it has to be seen case-by-case, as not all protocols can have bad situations caused by sequencer downtime. And many protocols will benefit from grace period to avoid any potential problems from this.

I've provided several examples of the issues caused by going online after sequencer downtime, including things such as liquidations being unprofitable for liquidator until the average price comes back to current price, I can also name volatility becoming low since there will be no transactions during a long period of time and multiple other things. It's not simply "unfair to other users", it's real issue causing loss of funds which can be mitigated by adding a grace period.

You've created a valid escalation!

To remove the escalation from consideration: Delete your comment.

You may delete or edit your escalation comment anytime before the 48-hour escalation window closes. After that, the escalation becomes final.

Banditx0x commented 1 year ago

Unsure if already taken into account but the above issue link seems to be talking about the same case in a different contest, where it was excluded ^ Same excluded issue in RealWagmi

roguereddwarf commented 1 year ago

@Banditx0x I did not take the issue from Real Wagmi into consideration when writing the escalation.

Aloe did not explicitly specify whether external admins are trusted or restricted. They simply answered "Uniswap cannot be paused.".

Based on the "Won't fix" label and my previous arguments that this is an issue based on L2 infrastructure, I will stick to the argument laid out in the escalation.

I think this is another close call.

@panprog I would like to respond to my thoughts regarding why a missing sequencer check is valid when there is an integration with a Chainlink Oracle. The Chainlink Oracle specifically requires the sequencer check for interacting with their contracts on L2.

cvetanovv commented 1 year ago

Whether it will be a valid Medium or Not has to be judged depending on how it is judged in the RealWagmi contest -https://github.com/sherlock-audit/2023-10-real-wagmi-judging/issues/77

panprog commented 1 year ago

@cvetanovv Yes, I agree. Given the new sherlock rule regarding the sequencer, this might be invalid, the same as it seems it will be in wagmi.

cvetanovv commented 1 year ago

Since the decision in the RealWagmi sequencer is to be trusted, I think the escalation should be accepted here and the report to be downgraded to Low

Trumpero commented 1 year ago

Planning to accept escalation of @roguereddwarf, reject escalation of @panprog, and mark this issue as invalid.

Czar102 commented 1 year ago

Please do not judge issues based on past judgements. The situation in RealWagmi was very different. The external admin was trusted there. In this contest, it's not the case. We have also updated docs in order to better define the approach with respect to this kind of issues. See the changelog.

@panprog @roguereddwarf could you let me know if this issue lies within the exception of the rule 20 here?

roguereddwarf commented 1 year ago

So we have the following conditions in rule 20 that might make this valid: 1) concerns network admin? - yes, concerns the sequencer 2) can be remedied by smart contract modification? I'd say some of the issues can be remedied by smart contract modifications, but not all of them. E.g. when the sequencer is down, some users may just get liquidated, no way around that. Also right now when the contract is not paused while the sequencer is offline, users can at least interact with the L1 contracts. @panprog says this:

When the sequencer is down for whatever reason, it's still possible to communicate with L2 networks via L1 contracts, but it's very inconvenient and only a few users really use it, meaning that it's unfair to let the protocol keep working as usual when sequencer is down as few users can cause damage to many users during this time (liquidators won't be active etc.).

I would argue that pausing the protocol completely is even worse since then even sophisticated users (which Borrowers should be) can't interact with the protocol anymore. 3) the protocol team considers external admins restricted and the considered network was explicitly mentioned in the contest README - the README simply states that "Uniswap cannot be paused", not giving a clear "TRUSTED" or "RESTRICTED" answer, the network was clearly mentioned though

The rule further states that if all of the above conditions apply, this "may be a valid medium".

On the balance of things I don't see how this moves the needle toward "valid" far enough based on the default rule being "invalid". This protocol is just by design prone to liveness issues. Sure you can try to make some modifications but that doesn't solve the underlying problem which is that prices may change over time and this conflicts with any downtime that there might be.

panprog commented 1 year ago

@Czar102 @roguereddwarf Let me break down more exact scenario of what bad can happen due to sequencer down and how it can be mitigated. The main issue is if the price moves significantly during the time sequencer is down. The new Sherlock rule specifies 7-day sequencer down time:

It should be assumed that any such network issues will be resolved within 7 days

The Aloe protocol assumes that accounts should be liquidated in 1 day if they become unhealthy. Since sequencer can be down for 7 days, I think it's obvious that the price can move during this time by more than nSigma, which can cause bad debt in some accounts. I agree with @roguereddwarf in that it's inevitable that existing accounts can go into bad debt and nothing can be done about it, but the damage is limited in this case. However, the problem is that if sequencer coming back online is not handled correctly by the smart contract, it's possible for malicious user to steal all protocol funds during the first minutes after sequencer is online (compared to just limited loss if grace period is added). The scenario is given in the issue description, but let me repeat it here with some more color:

  1. Current ETH price = $1000. Current IV generates probe prices for health check: a=$769, b=$1300, c=$1000
  2. Squencer goes down for 7 days
  3. During the time sequencer is down, ETH moves to $1500 (a 50% move).
  4. Immediately upon sequencer is active again: Uniswap pool current price = $1000. Last updated price (7 days ago) = $1000. Average price over last 30 minutes = $1000. IV = the same as before (no update() was called, so LastWrites remains the same), meaning probe prices are still $769 / $1300
  5. Malicious user immediately borrows all available ETH (if not enough, he can lend additional ETH which he will immediately borrow), exchanging it to USDC which he puts as collateral. For example, if protocol has 1 ETH available to borrow, he lends 9 ETH, borrows all 10 ETH available, exchanges to 15000 USDC at the current exchange price at the other venue, puts $13000 as collateral for the borrowed 1 ETH, which is just enough to be healthy.
  6. Wait until account is liquidated (possibly liquidating himself). It will probably be liquidated at the average price of $1500, i.e. 8.23 ETH of debt (after liquidation incentive) will be repaid in exchange for account's $13000. Account will remain with 0 assets and 1.77 ETH of bad debt.
  7. Immediately upon liquidation, malicious user withdraws 8.23 ETH (of the 9 ETH he has lent, because protocol doesn't have any more ETH available due to bad debt, so malicious user still has 0.77 lent he can't withdraw).

So in the end: user spent 9 ETH, got 2000 USDC + 8.23 ETH back, or 1155 USDC of pure profit (+0.77 ETH still at the lender in case ETH gets returned to protocol via liquidations or somehow else) at the expense of protocol's users (all protocol ETH balance is stolen due to bad debt). The numbers can be calculated according to how much ETH the protocol has, to steal all of it.

If the protocol adds a grace period when at least borrows are paused during the first 30 minutes after sequencer goes online (to let average price catch up real current price), then no malicious user can abuse the situation to steal protocol funds.

So I believe that this scenario descibes enough to move the needle to medium:

  1. All protocol (one side of Lender) funds can be stolen right after sequencer comes online.
  2. This can be fixed in smart contract (disable lending for 30 minutes after sequencer just came online)
  3. External admins are restricted (although yes, this is not fully clear, but it's still listed as restricted in QA, additionally mentioning uniswap governance)

Are the admins of the protocols your contracts integrate with (if any) TRUSTED or RESTRICTED?

Restricted. The protocol should be able to handle any changes enacted by Uniswap Governance (i.e., new fee tiers or changes to the protocol fee).

  1. L2 Networks are mentioned in readme

On what chains are the smart contracts going to be deployed?

mainnet, Arbitrum, Optimism, Base

So based on the latest rules changes, I believe this should be medium.

roguereddwarf commented 1 year ago

Thanks for the additional insight @panprog

Unfortunately regarding 3) I missed that section in my previous message:

Are the admins of the protocols your contracts integrate with (if any) TRUSTED or RESTRICTED?

Restricted. The protocol should be able to handle any changes enacted by Uniswap Governance (i.e., new fee tiers or changes to the protocol fee).

So indeed it is "Restricted". @Czar102 I think it would be worth making it clearer to protocols in the future that this entails the "sequencer" and not just other on-chain admins.

And @panprog this scenario is valid and as you said there are many such valid scenarios to be found here.

If this is implemented as you suggest:

If the protocol adds a grace period when at least borrows are paused during the first 30 minutes after sequencer goes online (to let average price catch up real current price), then no malicious user can abuse the situation to steal protocol funds.

Can't the attacker then interact with the L1 contracts directly shortly before the sequencer comes back up to exploit the same thing? And if so, we would need to prevent all interaction when the sequencer is down but then this leads to bad debt in roughly the same cases when your issue would have been exploitable.

Having thought more deeply about this though, I agree with @panprog that the exception probably applies here and it should be Medium.

panprog commented 1 year ago

@roguereddwarf Yes, I agree that it's also possible to do the same just before the sequencer is online, so I'd say borrowing should be paused if sequencer is offline and 30 minutes after it last came online. This should prevent the malicious behavior. Possibly there are also some other actions that better be paused during this period, like:

Liquidations should probably be allowed in this period. And maybe some other measures taken to improve the situation.

Czar102 commented 1 year ago

After some thinking, I don't think it is possible to remedy this as network liveness issues are not solvable here. For example, the contracts can accrue bad debt anyway. Or, the sequencer can censor any transactions touching the pool for a long time, and then submitting their own transaction. The sequencer uptime oracle won't detect it so this impact also can't be remedied.

Hence I think this issue should rather be invalid.

roguereddwarf commented 1 year ago
Or, the sequencer can censor any transactions touching the pool for a long time, and then submitting their own transaction. The sequencer uptime oracle won't detect it so this impact also can't be remedied.

This is indeed true.

As discussed with @panprog earlier, if we assume a non-malicious sequencer that just experiences downtime, it's definitely possible to make improvements such as to alleviate the situation for good-faith users and prevent bad actors from exploiting the situation.

If we take the broader "liveness issue" and include a "malicious sequencer" that can censor transactions and profit from it, then indeed there is nothing to do about it.

It really comes down to how the rules are to be interpreted.

panprog commented 1 year ago

@Czar102

After some thinking, I don't think it is possible to remedy this as network liveness issues are not solvable here. For example, the contracts can accrue bad debt anyway. Or, the sequencer can censor any transactions touching the pool for a long time, and then submitting their own transaction. The sequencer uptime oracle won't detect it so this impact also can't be remedied.

The problem is not that bad debt can accrue during sequencer down - that's indeed not solvable. The problem is that just after sequencer comes back online (or while it's still offline) malicious user can steal all protocol funds - this can be fixed easily via smart contract. All the other impacts of sequencer being down - are limited losses which can't be prevented, but this one can be prevented and is very serious (hacker has 100% probability of stealing all protocol funds if the circumstances are right).

I'm not sure why have you added that exception to the sequencer rule if you still consider it invalid? I agree that it's invalid if not for that added exception, but I think this issue falls into the exception perfectly and should be valid medium.

Czar102 commented 1 year ago

The idea of an exception was because sometimes in time-critical systems there is a way to force include transactions. And then e.g. allowing the aliases (on Arbitrum) to call the function is enough.

Checking an oracle for sequencer uptime doesn't work for many types of misbehavior (apart from downtime). And so if it's fixable in only 20% and there still are some trust assumptions concerning that external admin, then it should be assumed that they are trusted as a design choice. That was the main idea of the "fixable" statement in the exception.

panprog commented 1 year ago

I see. This is not at all clear from the exception description. I wonder if any issue can fall in the exception at all with such approach. With so limited exception I guess it doesn't make sense to examine and report any sequencer issues, maybe just remove the exception from the rules?

If that's your idea of that exception, that fixable = 100% fixable in all cases, not 90% fixable, then I agree this is invalid, because it doesn't fix all problems connected with sequencer downtime, just the most severe one. This is still kinda subjective, because all issues which can happen due to sequencer downtime are very limited in nature: they can cause damage, yes, but maybe 5-10% of protocol funds at most, problably smaller, while the one described here causes 100% damage. In my view, it is still worth fixing to avoid catastrophic damage even if the situation is quite unlikely.

Czar102 commented 1 year ago

Interestingly, I was writing the exception when thinking about those ideal scenarios and was thinking this is one of those. I agree that this exception overly complicates considerations. We will most likely change the approach so that we will explicitly ask about those risks in the readme.

Planning to mark this issue invalid and accept @roguereddwarf's escalation.

panprog commented 1 year ago

We will most likely change the approach so that we will explicitly ask about those risks in the readme.

Yes, agree, if the protocol specifies explicitly that it wants sequencer-related issues (as a separate question to external integration admins), then the exception can be removed from the rules and all sequencer-related issues should be considered in such case, which will simplify classification of such issues both for Watsons and for judges.

Planning to mark this issue invalid and accept @roguereddwarf's escalation.

Agree, based on discussion above.

Czar102 commented 1 year ago

Result: Invalid Unique

sherlock-admin2 commented 1 year ago

Escalations have been resolved successfully!

Escalation status: