sablier-labs / flow

🍃 Smart contracts of the Sablier Flow protocol.

fix: bug in snapshot time in the presence of low rps #220

Closed · smol-ninja closed this 2 months ago

smol-ninja commented 3 months ago

During withdraw, the normalization process can introduce a small discrepancy in the original time value due to the loss of significant digits. So we reverse the normalization calculation and reconstruct the time value to be used in _updateSnapshotTime.

Failed invariant test: https://github.com/sablier-labs/flow/actions/runs/10618225623/job/29432980263?pr=219

@andreivladbrg note the failing fuzz test (WithdrawAt_Integration_Fuzz_Test) in this PR. Ideally, the fuzz test should not fail. The failing fuzz test proves the discrepancy between the original and the recovered value.

Once we agree on this, I will modify the test to account for the recovered time value.

LGTM
andreivladbrg commented 2 months ago

Thanks for creating this PR. It indeed seems to be a problem, though I'm not sure whether it lies in src or in tests.

Regarding your proposed solution, it doesn't seem to be a correct implementation: when you normalize the amount returned by _ongoingDebtOf, you add zeros at the end, and not the actual correct normalized amount calculated inside (i.e. the decimals after the factor; in the case of USDC, the last 12 decimals).

For a better understanding, please see this code example:

    function testUsdc() public pure returns (uint256, uint256, uint256, uint256) {
        // rps ideally for 10 USDC per day
        uint256 rps = 0.00011574074074074e18;
        uint256 elapsedTime = 1 days;
        uint8 decimals = 6;

        // 9.999999999999936000
        uint256 ongoingDebtNorm = elapsedTime * rps;
        // 9.999999
        uint256 ongoingDebtDen = denormalizeAmount(ongoingDebtNorm, decimals);
        // 9.999999000000000000
        uint256 ongoingDebtNormCalculatedInWithdraw = normalizeAmount(ongoingDebtDen, decimals);

        // 86399 - it is normal to be less, since we divide: ongoingDebtNormCalculatedInWithdraw / rps
        uint256 timeReCalculated = ongoingDebtNormCalculatedInWithdraw / rps;

        return (ongoingDebtNorm, ongoingDebtDen, ongoingDebtNormCalculatedInWithdraw, timeReCalculated);
    }
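
For reference, the snippet above relies on normalizeAmount and denormalizeAmount helpers that are not shown. A minimal, self-contained sketch of what they are assumed to do, i.e. scaling between the token's decimals and 18 decimals (the actual implementations live in the Helpers library and may differ in detail):

```solidity
// Sketch of the helpers assumed by the example above. They scale an amount
// between the token's decimals and the 18-decimal representation using a
// factor of 10^(18 - decimals). Illustrative only, not the repo's actual code.
function normalizeAmount(uint256 amount, uint8 decimals) pure returns (uint256) {
    // Pad the token-decimal amount with zeros up to 18 decimals.
    return amount * 10 ** (18 - decimals);
}

function denormalizeAmount(uint256 amount, uint8 decimals) pure returns (uint256) {
    // Truncate the 18-decimal amount down to the token's decimals (floor division).
    return amount / 10 ** (18 - decimals);
}
```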

It is normal that the tests fail.

If you change the withdraw function as follows, the fuzz tests won't fail anymore:

        uint40 reconstructedTime = time;
        uint40 snapshotTime = _streams[streamId].snapshotTime;

        if (!_streams[streamId].isPaused && time > snapshotTime) {
            // The normalization process can introduce a small discrepancy in the original `time` value due to loss of
            // significant digits during the process. So we reverse the normalization calculation and reconstruct the
            // `time` value to be used in `_updateSnapshotTime`.

            uint128 normalizedAmount = (time - snapshotTime) * _streams[streamId].ratePerSecond.unwrap();

            reconstructedTime =
                uint40(_streams[streamId].snapshotTime + normalizedAmount / _streams[streamId].ratePerSecond.unwrap());
        }
smol-ninja commented 2 months ago

when you normalize the amount returned by _ongoingDebtOf you add zeros at the end, and not the actual correct normalized amount calculated inside

Precisely why the bug exists. Just to give you an idea, let's assume the following sequence:

  1. We provide time as t and calculate the ongoing debt as, let's say, 9.999999999999936000.
  2. Then we denormalize the debt amount in _ongoingDebtOf to 9.999999.
  3. Now, given the return value of _ongoingDebtOf to be 9.999999, the exact time corresponding to this new denormalized value would differ by some δ and will not be the one we provided originally, i.e. t.
  4. So, in theory, we should store $(t - δ)$ as the snapshot time and not t.

Here is a question for you: given a stream, should the function _ongoingDebtOf be one-to-one for all values of time? IMO yes, since it's a straightforward calculation. However, normalization transforms _ongoingDebtOf into a many-to-one function, and thus the bug.

You can find a range $[t, t+δ]$ in which you can expect the "same" _ongoingDebtOf for any value of t.

smol-ninja commented 2 months ago

Consider a wrapped bitcoin with 6 decimals. A sender decides to stream 0.001e6 sBTC (~ $60) a day, which makes rps = 0.000000011574e18.

When the elapsed time is between 90 and 170 seconds, the withdrawal amount would always be 1 token unit. So, an attacker waits until the maximum of this range, i.e. 170 seconds, and triggers a withdrawal with time = snapshotTime + 170. This would withdraw 1 token unit. Theoretically, at 170 seconds, the amount streamed is 1.96. So, the recipient loses 0.96 token units (48% of the streamed amount). This can be repeated any number of times, creating a non-trivial loss for the recipient.

The solution is to reverse-calculate the time and update that as the snapshot time. In this case, since the net withdrawal would be 1 token unit, we should update the snapshot time to 90 when the time provided is in the range 90-170.

This is a realistic example.
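
For illustration, here is a minimal sketch of the arithmetic behind this example (values taken from the scenario above; the function name and the scaling by 10^(18 - decimals) are assumptions, not the protocol's actual code):

```solidity
// Sketch of the floor loss described above: a 6-decimal token streamed at
// ~0.001e6 tokens per day. Illustrative only.
function sketchFloorLoss() pure returns (uint256 withdrawnUnits, uint256 theoreticalUnitsTimes100) {
    uint256 rps = 0.000000011574e18;  // rate per second, 18 decimals
    uint256 factor = 10 ** 12;        // scaling factor for a 6-decimal token

    uint256 elapsed = 170 seconds;
    uint256 ongoingDebtNormalized = rps * elapsed;                    // 1.96758e12 (18 decimals)
    withdrawnUnits = ongoingDebtNormalized / factor;                  // floors to 1 token unit
    theoreticalUnitsTimes100 = ongoingDebtNormalized * 100 / factor;  // 196, i.e. ~1.96 token units

    // Any elapsed time in roughly [87, 172] seconds floors to the same 1 unit,
    // so storing snapshotTime + elapsed discards the fractional ~0.96 units each round.
}
```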

andreivladbrg commented 2 months ago

I am sorry, but I don't think your example is correct.

the amount streamed is 1.96.

as you said: time = snapshotTime + 170 which means that elapsedTime is 170, right?

thus, ongoingDebt = denormalized(rps * elapsedTime = 0.000000011574e18 * 170) which is actually 1

am i missing smth?

andreivladbrg commented 2 months ago

as discussed on Slack, the problem occurs when elapsedTime is non-zero, but the denormalized ongoing debt is 0.

also, the problem arises when the rps is lower than the minimum transferable value; in the case of USDC (6 decimals), the minimum is 0.000001.

    uint rps = 0.000000115740740740e18;
    uint elapsedTime = 2 seconds; 
    uint8 decimals = 6;

    // this will return 0, but the snapshotTime will be updated with +2
    function testWithdraw() public view returns (uint) {
        uint ongoingDebtNorm = elapsedTime * rps;
        return denormalizeAmount(ongoingDebtNorm, decimals);
    }

my proposed solution is to add this to the withdraw function:

    // Update the snapshot time only if the ongoing debt is greater than 0.
    if (ongoingDebt > 0) {
        _updateSnapshotTime(streamId, time);
    }
smol-ninja commented 2 months ago

thus, ongoingDebt = denormalized(rps * elapsedTime = 0.000000011574e18 * 170) which is actually 1

At 170s, the ongoing debt returns 1, but at 180s it would return 2. So the theoretical value of the ongoing debt is 1.96 (rps * elapsedTime = 0.000000011574e18 * 170), but due to denormalization, it returns 1. You can supply any value of time in the range [90, 170] and it would always return 1. The loss is zero if the withdraw is called with t = 90, and positive for all 170 > t > 90.

my proposed solution is to add this to the withdraw function

The example I provided has a non-zero ongoing debt. So the issue can exist with any value, not just zero.

andreivladbrg commented 2 months ago

oh sheesh, got it now, thanks for re-explaining

smol-ninja commented 2 months ago

Perfect. Since the precision loss can be significant in the case of a low-rps stream with a fewer-decimal token (such as wrapped Bitcoin), I'd propose adding new integration tests that solely focus on low rps with a 6-decimal token, so that the CI can easily catch such bugs in the future.

smol-ninja commented 2 months ago

I found an easier way to replicate this issue locally: in Utils, change boundRatePerSecond to use boundUint128(ratePerSecond, 0.0000001e18, 10e18), i.e. a minimum of 0.008 tokens a day ($480 in BTC).

I suggest we reduce the minimum rps bound to 0.0000001e18. Then there is no need to add a new integration test focusing on low rps with a 6-decimal token, as it is now covered by the fuzz tests themselves.

Added in my latest commit. Now a few more tests fail, and the goal is to resolve them all.
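
For context, a rough sketch of what the lowered bound could look like in the test utilities (boundUint128 is stood in by a simple clamp here; the real helper and its exact signature live in the repo's test Utils):

```solidity
// Illustrative stand-in for the test utility that clamps a fuzzed value into a
// range. The real boundUint128 may be implemented differently (e.g. via
// forge-std's bound).
function boundUint128(uint128 x, uint128 min, uint128 max) pure returns (uint128) {
    if (x < min) return min;
    if (x > max) return max;
    return x;
}

// Sketch of the lowered fuzz bound discussed above: a minimum of 0.0000001e18
// per second (~0.008 tokens per day), so the fuzz tests exercise the low-rps
// precision-loss path.
function boundRatePerSecond(uint128 ratePerSecond) pure returns (uint128) {
    return boundUint128(ratePerSecond, 0.0000001e18, 10e18);
}
```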

andreivladbrg commented 2 months ago

found an easier way to replicate this issue locally. Bound boundRatePerSecond to boundUint128(ratePerSecond, 0.0000001e18, 10e18);

yeah, you are right, it is easier this way

smol-ninja commented 2 months ago

@andreivladbrg feel free to give it a review tomorrow and ask as many questions as come to your mind.

smol-ninja commented 2 months ago

I have increased the number of runs in CI, as I think it's more appropriate to have a higher number of runs in Flow.

smol-ninja commented 2 months ago

yes, should we optimize the _ongoingDebt so that corrected time is calculated only in this case?

Yes, good point. In that case, should we move the logic from the Helpers contract to _ongoingDebtOf, since that is the only place where we call normalizeAmount and denormalizeAmount? It would be much more efficient to have everything inside _ongoingDebt. Wdyt?

andreivladbrg commented 2 months ago

Helpers contract to _ongoingDebtOf

interesting idea, we still use normalizeAmount in depletionTimeOf

much more efficient

i wouldn't say "much", it is a small difference. the problem i see with it is that SablierFlow already has a lot of logic (> 800 lines of code); i would keep Helpers regardless of the small gas difference

smol-ninja commented 2 months ago

i wouldn't say "much", it is a small difference. the problem i see with it is that SablierFlow already has a lot of logic (> 800 lines of code); i would keep Helpers regardless of the small gas difference

OK. Though I have moved them from Helpers to SablierFlow in this PR. Have a look and let me know what you think. I liked it tbh because it gives us more control over the code. The code is much cleaner and more optimized. So if you think the same, we keep it else we debate it :)))

smol-ninja commented 2 months ago

To @PaulRBerg

WDYT about renaming correctedTime to discreteSnapshotTime or smth that includes the 'discrete' terminology?

I am not sure what the appropriate name would be. Naming variables is hard. The way I see it, it represents the earliest timestamp at which the current amount values (calculated using block.timestamp) would also hold true. So, at the current timestamp, if we calculate all the amounts, they will yield the same values all the way into the past until we hit correctedTime, regardless of whether we take the snapshot or not. How about minTimeCheckpoint or something like that?

PS: snapshotTime is always discrete technically.

PaulRBerg commented 2 months ago

Good points about the name of the variable .. in this case, let's stick with snapshotTime to not break the symmetry with snapshotDebt. If we find a better name before or during the audit, we will change it then.

andreivladbrg commented 2 months ago

I've just found a big problem with the introduction of correctedTime. This will prevent the PR from being merged, since it could trigger a cascading effect that drains the entire stream balance.

The issue arises when the correctedTime calculated inside _ongoingDebtOf equals the snapshotTime while the ongoing debt is greater than zero, which would basically allow anyone to call withdraw until there is no balance left on that stream. This can happen when only a small amount of time has passed, causing the renormalized ongoing amount to result in a value smaller than the rate per second.

To better understand, please see the code below.

Proof of concept

```solidity
function test_DrainBalance() public {
    uint8 decimals = 3;
    IERC20 token = createToken(decimals);
    uint128 factor = uint128(10 ** (18 - decimals));

    deal({ token: address(token), to: users.sender, give: 1_000_000e3 });
    resetPrank(users.sender);
    token.approve(address(flow), type(uint256).max);

    vm.warp(MAY_1_2024);

    uint128 rps = 3205.285530987472545e15;
    uint128 depAmount = (rps * 1 days) / factor; // deposit for 1 day
    uint256 streamId = flow.createAndDeposit(users.sender, users.recipient, ud21x18(rps), token, true, depAmount);

    uint40 previousActualSnapshotTime = flow.getSnapshotTime(streamId);
    assertEq(previousActualSnapshotTime, MAY_1_2024);

    uint128 previousActualBalance = flow.getBalance(streamId);
    assertEq(previousActualBalance, depAmount);

    // warp just one second
    uint40 newTimestamp = MAY_1_2024 + 1 seconds;
    vm.warp(newTimestamp);

    uint128 previousActualOngoingDebt = flow.ongoingDebtOf(streamId);
    uint128 previousExpectedOngoingDebt = (rps * 1 seconds) / factor;
    assertEq(previousActualOngoingDebt, previousExpectedOngoingDebt);

    // after 1 second the ongoing debt should be basically rps normalized to token decimals
    assertEq(previousActualOngoingDebt, 3205);

    // https://www.wolframalpha.com/input?i=3205285530987472545%2F%2810%5E15%29
    // 3205.000000000000000e15 / 3205.285530987472545e15 = 0
    // correctedTime = MAY_1_2024 --> snapshotTime remains the same
    uint40 expectedCorrectedTime = uint40(previousActualOngoingDebt * factor / rps + MAY_1_2024);

    flow.withdrawAt(streamId, users.recipient, newTimestamp);

    uint40 actualSnapshotTime = flow.getSnapshotTime(streamId);
    assertEq(actualSnapshotTime, expectedCorrectedTime);
    assertEq(actualSnapshotTime, previousActualSnapshotTime);
    assertEq(actualSnapshotTime, MAY_1_2024);

    uint128 actualBalance = flow.getBalance(streamId);
    uint128 expectedBalance = previousActualBalance - previousActualOngoingDebt;
    assertEq(actualBalance, expectedBalance);

    // theoretically, the ongoing debt should be 0, but it remains the same
    uint128 actualOngoingDebt = flow.ongoingDebtOf(streamId);
    assertEq(previousActualOngoingDebt, actualOngoingDebt);

    // since we have deposited for 1 day, it means we can withdraw for each second in one day - 1 second (prev withdraw)
    for (uint256 i = 0; i < 1 days - 1 seconds; ++i) {
        uint128 amountWithdrawn = flow.withdrawAt(streamId, users.recipient, newTimestamp);
        assertEq(amountWithdrawn, 3205); // each time 3205
    }

    // the snapshot time remains the same
    actualSnapshotTime = flow.getSnapshotTime(streamId);
    assertEq(actualSnapshotTime, MAY_1_2024);
}
```

The reason this was never caught by the invariants is that we warp by at least 2 minutes:

https://github.com/sablier-labs/flow/blob/7f6ae6c315a29dccfe4287de1dfc4f549450a81c/test/invariant/handlers/BaseHandler.sol#L49

However, I am surprised that it was not caught by fuzz tests, as we do:

https://github.com/sablier-labs/flow/blob/7f6ae6c315a29dccfe4287de1dfc4f549450a81c/test/integration/fuzz/withdrawAt.t.sol#L222-L230

A potential solution might be to return ongoingAmount = 0 in case correctedTime == snapshotTime. @sablier-labs/solidity wdyt?

andreivladbrg commented 2 months ago

I think we should also add an invariant: if the functions that update snapshotTime (adjustRatePerSecond, pause, withdrawAt) are called, the time cannot remain constant.

PaulRBerg commented 2 months ago

the time cannot remain constant

What time? correctedTime?

And why can't it remain constant? If multiple withdraw txs are included in the same block, the correctedTime should be the same. Right?

andreivladbrg commented 2 months ago

Please review my finding above. correctedTime cannot be the same as snapshotTime if ongoingDebt > 0.

If the ongoing amount is zero, it's fine to update snapshotTime with the same value
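
As an illustration, the refined invariant could be expressed roughly like this (a standalone sketch with hypothetical names, not the actual handler code):

```solidity
// Sketch of the invariant discussed above: after a call that updates the
// snapshot (adjustRatePerSecond, pause, withdrawAt), snapshotTime must move
// forward whenever the ongoing debt was non-zero; with zero ongoing debt it
// may stay the same. Names are illustrative.
function assertSnapshotTimeInvariant(
    uint40 snapshotBefore,
    uint128 ongoingDebtBefore,
    uint40 snapshotAfter
)
    pure
{
    if (ongoingDebtBefore > 0) {
        require(snapshotAfter > snapshotBefore, "snapshotTime did not advance");
    } else {
        require(snapshotAfter >= snapshotBefore, "snapshotTime went backwards");
    }
}
```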

PaulRBerg commented 2 months ago

Good catch, @andreivladbrg!

I have followed up here.

smol-ninja commented 2 months ago

Good finding @andreivladbrg. Glad you found it before we could merge the PR.

I am thinking, as a solution: if we compare the value of correctedTime with snapshotTime, it may not be very obvious why we return 0. So how about we put a comparison between renormalizedOngoingDebt and ratePerSecond and return 0 if renormalizedOngoingDebt < ratePerSecond? This seems easier to understand, i.e. return 0 if the renormalized value is less than ratePerSecond.
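
For illustration, a minimal sketch of that check (variable names and the exact placement inside _ongoingDebtOf are assumptions, not the final implementation):

```solidity
// Sketch of the proposed guard: if the ongoing debt, after being denormalized
// to the token's decimals and renormalized back to 18 decimals, is smaller
// than the rate per second, report zero ongoing debt so the snapshot time is
// not advanced past debt that cannot actually be withdrawn. Illustrative only.
function sketchOngoingDebt(uint256 rps, uint256 elapsedTime, uint8 decimals) pure returns (uint256) {
    uint256 factor = 10 ** (18 - decimals);

    uint256 normalized = rps * elapsedTime;        // ongoing debt, 18 decimals
    uint256 denormalized = normalized / factor;    // ongoing debt, token decimals (floored)
    uint256 renormalized = denormalized * factor;  // back to 18 decimals, sub-unit part lost

    // Less than one second's worth survives the rounding, i.e. the corrected
    // time would equal snapshotTime: treat the ongoing debt as zero.
    if (renormalized < rps) {
        return 0;
    }
    return denormalized;
}
```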

smol-ninja commented 2 months ago

I think we should also add an invariant: if the functions that update snapshotTime (adjustRatePerSecond, pause, withdrawAt) are called, the time cannot remain constant.

Agree, and also the following: allow 1 second to pass through passTime and in fuzz too. Wdyt?

andreivladbrg commented 2 months ago

if we put a comparison between renormalizedOngoingDebt and ratePerSecond and return 0 if renormalizedOngoingDebt < ratePerSecond? This seems easier to understand, i.e. return 0 if the renormalized value is less than ratePerSecond.

agree with you, it is easier to understand if we compare the rps with the renormalized amount

allow 1 second to pass through passTime and in fuzz too. Wdyt?

i would even bound the time between [0 seconds, x number of days]

smol-ninja commented 2 months ago

I have only included the check in the SablierFlow.sol file, and will review/edit the tests in https://github.com/sablier-labs/flow/pull/232, as it requires more changes in the tests than expected.

@andreivladbrg lmk if it's good to merge then. And let's continue this in your PR.

andreivladbrg commented 2 months ago

I have only included the check in the SablierFlow.sol file, and will review/edit the tests in https://github.com/sablier-labs/flow/pull/232, as it requires more changes in the tests than expected.

agree, let's merge it, the PR is already spammed, we will address issues in the withdraw one 👍

PaulRBerg commented 2 months ago

team work