sherlock-audit / 2024-05-aleo-judging


dtheo - `bond_public` delegator balance check needs reordering #30

Closed sherlock-admin3 closed 1 week ago

sherlock-admin3 commented 2 weeks ago

dtheo

Invalid

bond_public delegator balance check needs reordering

Summary

bond_public's check that the calling account has adequate funds is needlessly far into the function's execution, opening up the Aleo network to more resource consumption than is needed to determine the validity of the call.

Vulnerability Detail

bond_public has a check to verify that the calling account does indeed have adequate funds to call the function. This check should be brought to the top of the function to make this function robust against DOS attacks. It is crucial to keep all verification checks as close to the entrypoint of a function as possible to prevent provers and the Aleo network from wasting CPU cycles and state reads/writes on invalid calls.
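The check-ordering principle can be illustrated with a minimal sketch (Python here for clarity; the function names, operation lists, and costs are hypothetical illustrations, not Aleo semantics):

```python
# A minimal sketch (hypothetical names and costs -- not Aleo semantics)
# of why check ordering matters: every state read/write the network
# performs before rejecting an invalid call is wasted work.

BOND_MINIMUM = 10_000_000_000  # 10k credits in microcredits, per the report

def bond_late_check(balances, staker, amount):
    """Mirrors the current shape: expensive state ops first, balance check last."""
    wasted = ["read committee", "read metadata", "read bonded",
              "write withdraw", "write metadata"]  # speculative state ops
    if balances.get(staker, 0) < amount:           # balance check at the end
        return "rejected", len(wasted)             # all of it thrown away
    return "ok", 0

def bond_early_check(balances, staker, amount):
    """Recommended shape: reject underfunded callers before touching state."""
    if balances.get(staker, 0) < amount:           # balance check at the entrypoint
        return "rejected", 0                       # nothing wasted
    return "ok", 0  # the same state ops now run only for funded callers
```

Under these assumptions, an underfunded caller triggers five wasted state operations in the first shape and zero in the second, even though both calls are rejected.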

Impact

Critical pre-deployed code should be entirely resilient to DOS vectors. Having the Aleo network compute more invalid finalize transition code than is necessary opens it up to DOS. It is possible to get a transaction with an underfunded call like this past the speculate phase of a prover and into a block to be computed by the entire network due to the relatively minimal set of proof conditions required for bond_public.

This entire section of code should not be run until after the account check. It contains 4 state writes, 2 state reads, and multiple assertions, conditionals, and arithmetic operations that will ultimately be reverted and thrown out if the calling account is not sufficiently funded to make the call. The Aleo network should seize the transaction fee and reject the transaction before running any of this code:

    // Check if the delegator is already bonded to the validator.
    contains bonded[r0] into r5;
    branch.eq r5 true to continue_bond_delegator;
    // {
        // Set the withdrawal address.
        // Note: This operation is only necessary on the first time for a staker entry, in order to initialize the value.
        set r2 into withdraw[r0];

        // Ensure that the validator is open to new stakers. By default, `is_open` is set to `true`.
        cast true 0u8 into r6 as committee_state;
        get.or_use committee[r1] r6 into r7;
        assert.eq r7.is_open true;
        // Get the number of delegators.
        get.or_use metadata[aleo1qgqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqanmpl0] 0u32 into r8;
        // Increment the number of bonded delegators by one.
        add r8 1u32 into r9;
        // Determine if the number of delegators is less than or equal to 100_000.
        lte r9 100_000u32 into r10;
        // Enforce that the number of delegators is less than or equal to 100_000.
        assert.eq r10 true;
        // Set the new number of delegators.
        set r9 into metadata[aleo1qgqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqanmpl0];
    // }

    position continue_bond_delegator;

    /* Bonded */

    // Construct the initial bond state.
    cast r1 0u64 into r11 as bond_state;
    // Get the bond state for the caller, or default to the initial bond state.
    get.or_use bonded[r0] r11 into r12;
    // Enforce the validator matches in the bond state.
    assert.eq r12.validator r1;

    // Increment the microcredits in the bond state.
    add r12.microcredits r3 into r13;

    // Determine if the amount is at least 10 thousand credits.
    gte r13 10_000_000_000u64 into r14;
    // Enforce the amount is at least 10 thousand credits.
    assert.eq r14 true;

    // Construct the updated bond state.
    cast r1 r13 into r15 as bond_state;

Code Snippet

See Impact and Recommendation sections.

synthesizer/program/src/resources/credits.aleo#L376-L380

Tool used

Manual Review

Recommendation

A general rule of thumb is that all verification checks in permission/resource guarded functions should be as close to the entry point of the function as possible to prevent abuse.

Move the following snippet from the bottom of the function to the top of the finalize function, right after the inputs are read in:

function bond_public:
    ...
    ...
    ...
finalize bond_public:
    // Input the staker's address.
    input r0 as address.public;
    // Input the validator's address.
    input r1 as address.public;
    // Input the withdrawal address.
    input r2 as address.public;
    // Input the amount of microcredits to bond.
    input r3 as u64.public;

+    /* Account */
+
+    // Get the balance of the caller.
+    // If the account does not exist, this finalize scope will fail.
+    get account[r0] into r16;
+    // Decrement the balance of the caller.
+    sub r16 r3 into r17;

    // Retrieve the withdrawal address for the staker.
    get.or_use withdraw[r0] r2 into r4;
    // Ensure that the withdrawal address is consistent.
    assert.eq r2 r4;

    // Check if the delegator is already bonded to the validator.
    contains bonded[r0] into r5;  R 
    branch.eq r5 true to continue_bond_delegator; 
    // {
        // Set the withdrawal address.
        // Note: This operation is only necessary on the first time for a staker entry, in order to initialize the value.
        set r2 into withdraw[r0]; W 

        // Ensure that the validator is open to new stakers. By default, `is_open` is set to `true`.
        cast true 0u8 into r6 as committee_state; 
        get.or_use committee[r1] r6 into r7; R 
        assert.eq r7.is_open true; 

        // Get the number of delegators.
        get.or_use metadata[aleo1qgqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqanmpl0] 0u32 into r8; R 
        // Increment the number of bonded delegators by one.
        add r8 1u32 into r9;
        // Determine if the number of delegators is less than or equal to 100_000.
        lte r9 100_000u32 into r10;
        // Enforce that the number of delegators is less than or equal to 100_000.
        assert.eq r10 true;
        // Set the new number of delegators.
        set r9 into metadata[aleo1qgqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqanmpl0]; W 
    // }

    position continue_bond_delegator;

    /* Bonded */

    // Construct the initial bond state.
    cast r1 0u64 into r11 as bond_state;
    // Get the bond state for the caller, or default to the initial bond state.
    get.or_use bonded[r0] r11 into r12; R 
    // Enforce the validator matches in the bond state.
    assert.eq r12.validator r1;

    // Increment the microcredits in the bond state.
    add r12.microcredits r3 into r13;

    // Determine if the amount is at least 10 thousand credits.
    gte r13 10_000_000_000u64 into r14;
    // Enforce the amount is at least 10 thousand credits.
    assert.eq r14 true;

    // Construct the updated bond state.
    cast r1 r13 into r15 as bond_state;

-    /* Account */
-
-    // Get the balance of the caller.
-    // If the account does not exist, this finalize scope will fail.
-    get account[r0] into r16;
-    // Decrement the balance of the caller.
-    sub r16 r3 into r17;

    /* Delegated */

    // Get current total delegated amount.
    get.or_use delegated[r1] 0u64 into r18;
    // Add new bond amount to current delegation.
    add r3 r18 into r19;

    ...
    ...
    ...
infosecual commented 1 week ago

@evanmarshall, I am curious what you think of this submission. I don't really think it is worth escalating, as it will not change the payout much. I do believe it is something that Aleo will want to fix, though, so I figured I would bring it to your attention. I don't want it to slip through the cracks amid the rest of the noise of this competition.

The idea here is that the account[] balance check happens deep in the function. Since this type of transaction could pass the initial speculate checks and then fail during execution/finalize due to insufficient balance, the program should exit sooner rather than later. The fee commitment burned will be the same amount no matter what. Moving this check higher up in the function will prevent the Aleo network from wasting CPU/storage cycles on the compute section above. This will limit the code surface to accounts with at least 10K credits (accounts with 1 credit after the fee commitment can run it currently).

evanmarshall commented 1 week ago

Yes, it's not a bad idea to move it earlier. I would categorize it as an improvement because you could always design a transaction to fail at the latest possible point and if it's significantly mispriced, that could be an issue.

WangSecurity commented 6 days ago

Escalate

Escalating on behalf of @infosecual

sherlock-admin3 commented 6 days ago

Escalate

Escalating on behalf of @infosecual

You've created a valid escalation!

To remove the escalation from consideration: Delete your comment.

You may delete or edit your escalation comment anytime before the 48-hour escalation window closes. After that, the escalation becomes final.

cvetanovv commented 4 days ago

As the sponsor has written, the transaction can be designed so that there is no real problem. So this issue is a Low/Design improvement.

infosecual commented 3 days ago

I think Evan was saying that an attacker could purposely make a transaction that will fail at the last possible moment, to make the network compute as much of the finalize section as possible before it is discovered to be invalid. The way this function is currently set up, it would be better to move the account[] checks higher up to prevent just anyone from being able to do this. If the transaction fails, the fee commitment will be burned, but it will not burn any more if the transaction fails at the end versus at the beginning. Because of this, it would be better for the network, from a DOS-resilience perspective, to fail the transaction as early as possible. If the check were moved up, it would mean that only accounts with at least 10K credits can execute the majority of this function, which the sponsor agrees would be an improvement.

WangSecurity commented 2 days ago

I'm not sure I understand the impact correctly. Is wasting CPU/storage cycles actually severe, and what would the worst-case impact be? Network congestion? The network stops working?

infosecual commented 2 days ago

Honestly, it is difficult to tell with this one. There are a few factors here. One is that there is a fixed cost for transactions that fail: when they fail, they switch from an execution-type transaction to a fee-type transaction. The interesting thing is that no matter whether bond_public reverts on line 1 or line 100, the transaction converts to a fee type and the same fee is charged in both cases, while the second case uses significantly more of the network's resources. That is why it is important to design these functions with a "weed out the bad actors first" mentality. It is similar to how you shouldn't expose an admin interface and check at the end of the call whether the user is an admin; you should check whether they are an admin first and exit early, to avoid exposing a large code attack surface to non-admins.
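A toy model of this fee/work asymmetry (all numbers are illustrative, not Aleo's actual fee schedule):

```python
# Toy model: a failed tx converts to a fee-type tx with a fixed charge,
# so the attacker's cost is constant while the network's work scales
# with how late the revert happens. Numbers are illustrative only.

FIXED_FAIL_FEE = 1_000   # hypothetical flat fee charged for any failed tx
COST_PER_OP = 10         # hypothetical per-instruction cost to the network

def attacker_cost(revert_at_op):
    # The fee does not depend on where the failure occurs.
    return FIXED_FAIL_FEE

def network_cost(revert_at_op):
    # The network pays for every op executed before the revert.
    return revert_at_op * COST_PER_OP
```

Under these assumptions, reverting at op 100 instead of op 1 costs the attacker nothing extra but makes the network do 100x the work; moving the check up shrinks that gap.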

I agree with Evan here:

...you could always design a transaction to fail at the latest possible point and if it's significantly mispriced, that could be an issue

Since the attacker is indeed paying some fee here, a key point in being able to 100% prove the impact is answering whether it is "significantly mispriced". I have not created a POC to show that the worst-case transaction here causes CPU usage and state reads/writes that can push a validator past the threshold of a block time with less than the amount of gas in a block. This is because a POC would have been difficult/impossible to create without knowing the specs of the validators running Aleo mainnet (which doesn't yet exist). I will add that this code is not run just on the prover; it is run redundantly on every validator on the Aleo network, so the slowest horse matters. If this limit is reached, then the lower-resourced validators will miss attestations and block proposals, which would definitely degrade the health of the network.

I will say that I am pretty confident this is a "will fix". I am confident that refactoring this is a best practice at minimum. The before and after are such that:

  • Before anyone can cause more computation than they should be able to if the code was robust
  • After only entities with 10,000 credits can run the gated code which disqualifies the majority of the network (possibly all prospective bad actors)

I would be happy to defer to @evanmarshall on whether he thinks this is a "will fix" and whether he thinks it makes a material impact on making Aleo more resilient to DOS. FWIW, I think these judgments are independent of each other, and I will not be upset if this falls under the threshold of what we would consider a medium. If it can trigger the missed attestations and proposals of slower validators I mentioned above, then it would actually be a high. I reported it as a medium because of my confidence level at the time of submission and my inability to present an indisputable POC. I definitely feel that it was worth reporting and shining daylight on, regardless of the reward outcome.

IWildSniperI commented 1 day ago

This is indeed an issue, but as LSW said, the missing minimum validator specs for Aleo, and the lack of a well-structured scenario depending on them, make the validity questionable.

WangSecurity commented 1 day ago

And the cost of such an attack is relatively low: just construct a transaction so it fails and pay the fee, correct?

infosecual commented 1 day ago

Yes. The cost of the attack is just the tx fees.

morbsel commented 1 day ago

It is good practice to do verification checks at the beginning, but linking it to DOS is probably a bit over the top. It can waste some CPU cycles, but nothing crazy, while a tx fee has to be paid.

Probability is low, since money needs to be paid just to waste some CPU cycles. Impact is low, since we are talking about some CPU cycles that a validator should be able to handle without any problems if it is not running on a literal potato. Therefore I believe this is nothing more than informational.

Haxatron commented 1 day ago

I am really confused why this is something to be fixed in credits.aleo

The interesting thing is that no matter if bond_public reverts on line 1 or line 100 the transaction converts to a fee type and the same fee is charged for both cases, while the second case uses significantly more of the network's resources.

Can you elaborate? Is the fee underpriced relative to the resources consumed? And more importantly, what's stopping someone from deploying the unoptimized version of credits.aleo or any other smart contract that does something similar really and using it to attack the chain after the fixes are made on the report here.

evanmarshall commented 1 day ago

It's more of a hypothesis for an optimization. The transaction is priced in the worst case of every op being executed, not what's actually executed.

infosecual commented 1 day ago

Thanks Morbsel. I think you may be underestimating how serious something like this can be. Issue 5.1 in this report is just one example of a scenario where "money needs to be paid just to waste some CPU cycles." A POC was made for that scenario that was able to push the sequencer outside of the block time, and it is not "running on a literal potato" but on a fully maxed-out dedicated cloud instance.

The difference in this situation is that we do not know the specifications for all Aleo mainnet validators, so it is impossible to make a POC. One advantage the network in the linked report has over Aleo is that there is one centralized sequencer that can be max spec. Aleo will have a distributed validator set, and the lowest-resourced machine's performance is the upper bound of performance for the network, which makes the DOS bar even lower.

I believe @Haxatron asks some really good questions here, namely:

what's stopping someone from deploying the unoptimized version of credits.aleo or any other smart contract that does something similar really and using it to attack the chain after the fixes are made on the report here

I think this is probably the strongest criticism of this issue yet. My counter is that this is the canonical precompile. This contract will be referenced and called by other contracts. There is no way to know all of the deployed code that will have credits.aleo as a dependency. One example that would cause issues is an LST that offers to cover gas costs for anyone who will delegate to their validator. Abuse vectors can be multiplied by factors that are not known at launch time. This is why precompiles on other chains get rigorous security audits. They are a bit more important than your average contract.

What Evan mentions is at the root of what this issue calls out

The transaction is priced in the worst case of every op being executed, not what's actually executed.

Running unoptimized precompiles does not earn Aleo any more fee burn. Aleo will get the same amount whether the TX is reverted at line 1 or line 100. Therefore it should design its precompiles to revert as soon as possible on invalid transactions that could be abuse-related.

In the end I do not have a provable DOS POC as it is impossible for me to make one or know that it hits the threshold of pushing a validator outside the required block-time/gas ratio. If it is possible then this would definitely be a high. I filed as a medium since I can't prove the high beyond a shadow of a doubt.

Aleo is a new chain and has not yet been tested for the adequacy of its fee pricing. Ethereum had to endure various mainnet attacks (e.g., the Shanghai DOS) to fix its gas-mispricing issues. I recommend that Aleo come back to Sherlock for another contest with a broader scope after mainnet launch, once we can witness how mainnet validators perform. It would be good to have Sherlock's Watsons bench all of the resource- and consensus-related functionality of snarkOS and the snarkVM.

I am confident this is worth fixing, so I made sure it was surfaced. I will not be upset if the judge rules against it. I decided not to file it as a high for this very reason: it is impossible to create a POC for it at this time. I will, however, continue to explain the various things that need to be taken into consideration in judging it.

Haxatron commented 1 day ago

But this is what the sponsor said:

The transaction is priced in the worst case of every op being executed, not what's actually executed.

I believe he means that the fee is actually overpriced rather than underpriced, which actually makes sense, because I believe Aleo does not have a precise mechanism to track what opcodes were actually executed in the call frame, so they take the full fee that would have been charged if the transaction was fully executed, regardless of whether it failed or not. Feel free to correct me on this.

Therefore, I think the attack is invalid, unless you can explain how this scenario can lead to the fee being underpriced.

infosecual commented 1 day ago

Aleo does not have a precise mechanism to track what opcodes were actually executed in the call frame, so they take the full fee that would have been charged if the transaction was fully executed, regardless of whether it failed or not. Feel free to correct me on this.

This is incorrect. If the tx does not fail, it is priced as an execute transaction. When it fails, it is priced as a fee transaction, which is why the fee is the same no matter where the failure happens.

Therefore, I think the attack is invalid, unless you can explain how this scenario can lead to the fee being underpriced.

I think I have made it pretty clear that I cannot possibly defend the DOS. This issue is pointing out that there is a more secure way to implement a critical precompile. I will just re-post what I previously said, since so much has happened since then:

I will say that I am pretty confident this is a "will fix". I am confident that refactoring this is a best practice at minimum. The before and after are such that:

  • Before anyone can cause more computation than they should be able to if the code was robust
  • After only entities with 10,000 credits can run the gated code which disqualifies the majority of the network (possibly all prospective bad actors)

I would be happy to defer to @evanmarshall on whether he thinks this is a "will fix" and whether he thinks it makes a material impact on making Aleo more resilient to DOS. FWIW, I think these judgments are independent of each other, and I will not be upset if this falls under the threshold of what we would consider a medium. If it can trigger the missed attestations and proposals of slower validators I mentioned above, then it would actually be a high. I reported it as a medium because of my confidence level at the time of submission and my inability to present an indisputable POC. I definitely feel that it was worth reporting and shining daylight on, regardless of the reward outcome.

Haxatron commented 1 day ago

Yes and if you look at the sponsor comment:

The transaction is priced in the worst case of every op being executed, not what's actually executed.

This leads me to believe that the worst-case cost is charged for the transaction if it fails => so overpriced, not underpriced. I think @evanmarshall can clarify whether it is overpriced or underpriced.

I think I have made it pretty clear that I cannot possibly defend the DOS.

I am not asking for a PoC, I am asking for why it leads to being underpriced. Otherwise, this is just the same as saying "I can call any resource-intensive smart contract on Ethereum to DoS the chain even while paying the correct amount of fees" is a vulnerability, which makes no sense.

infosecual commented 1 day ago

I am asking for why it leads to being underpriced. Otherwise, this is just the same as saying "I can call any resource-intensive smart contract on Ethereum to DoS the chain even while paying the correct amount of fees" is a vulnerability, which makes no sense.

I am definitely not saying this. I have said multiple times I am not defending the DOS aspect of this. My report does not claim I have a DOS. The summary is:

"bond_public's check that the calling account has adequate funds is needlessly far into the function's execution, opening up the Aleo network to more resource consumption than is needed to determine the validity of the call."

Nowhere in my report do I use the word underpriced. Evan was the first to use anything of the sort in his comment:

Yes, it's not a bad idea to move it earlier. I would categorize it as an improvement because you could always design a transaction to fail at the latest possible point and if it's significantly mispriced, that could be an issue.

I just re-linked what I actually said for the second time in my last comment. I feel like comments like these are a DOS :)

But in all seriousness, I have summarized my points and reposted the most concise version of them in https://github.com/sherlock-audit/2024-05-aleo-judging/issues/30#issuecomment-2212574888. I would have wanted this issue reported to me if it were my precompile, so I have done just that. Unfortunately, I have reached the maximum amount of time I can give this issue today, so I will have to leave it at that.

WangSecurity commented 20 hours ago

The problem here is that it's very hard to verify the real-world impact. I see how this comment explains the impact on validators really well (missed attestations), but the report only shows that it would cause larger resource consumption for Aleo. I don't think it qualifies for Medium severity, because it's hard to measure how large and bad this consumption would be. Good catch, though.

Planning to reject the escalation and leave the issue as it is.

infosecual commented 19 hours ago

Thank you @WangSecurity. Glad to have this one wrapped up.