Feebump algorithm - Githubissues

revault / research

4 stars 1 forks source link

Feebump algorithm #3

Open darosior opened 3 years ago

darosior commented 3 years ago

Before even considering if we should based on current fee estimates, and if so by how much we should, we need to decide whether to feebump in the first place.

A quick answer is "lol, every time the fee estimation is updated (each block connection), what else" but this conservative answer is really expensive in most cases. What if:

we have a CSV of 144 blocks and the Unvault was just confirmed in the previous block
there is a fee spike
there has not been any block for the last 20 minutes
we now connect 3 blocks in a row

We would in the above example feebump at an insane feerate for no good reason (feerate would turn back to normal after the 3 blocks connections and we'd have a day anyways).

However, we must for sure update our feerate if needed each time the fee estimates change if we have very few blocks left! In other terms, start lazily then enter panic mode.

So here is my simple suggestion:

def update_feerate(current_height, last_update_height, remaining_locktime):
    return current_height >= last_update_height + remaining_locktime // X

with X being IMO well set to 6.

We could further tweak the formula to exponentially decrease the threshold for high CSVs, therefore retaining the "additional breathing room" gained by the conservative choice of a longer CSV:

def update_feerate(current_height, last_update_height, remaining_locktime, CSV):
    return current_height >= last_update_height + remaining_locktime // X**(1 + round((CSV / X) / remaining_locktime))

This enters "panic mode" depending on a fraction of the CSV, but is exponentially decreasing (you won't try to re-feebump every block if you have 2016 blocks in front of you!)

JSwambo commented 3 years ago

I'm not sure how your proposal helps to mitigate paying too much during a short fee-spike. Selecting an arbitrary subset of blocks to enable fee-bumping doesn't necessarily reduce the likelihood that you are fee-bumping during a fee-spike. Instead I think we should rely on estimatesmartfee with a longer target (if remaining lock-time is large) to not overpay during short fee-spikes. Please correct me if estimatesmartfee doesn't work this way.

In the following I define a function that returns the amount to fee-bump by, rather than a boolean, since I wanted to distinguish different cases. I have probably made mistakes with units in fee calculations, its only pseudocode, and concrete values are only given as an example

from bitcoinlib import getblock, getblockheight, estimatesmartfee

PANIC_THRESHOLD = 20
ATTACK_THRESHOLD = 5
SMALLEST_FEEBUMP_COIN_VALUE = 20 # 20 sats/vbyte

# search through persistant data store for the allocated fee-reserve for the vault associated with this tx
def get_fee_reserve_for_vault(tx):
    return db.get_fee_reserve_for_vault(tx)

# get number of times this tx was censored from a block
def get_attack_score(tx):
    return db.get_attack_score(tx)

# set number of times this tx was censored from a block
def set_attack_score(tx, val):
    db.set_attack_score(tx, val)

# Run at each new block connection
def update_feerate(tx, remaining_locktime, height_at_broadcast) -> int: # Return number of sats to bump by
    # If not in panic mode
    if remaining_locktime > PANIC_THRESHOLD:  
        # If our feerate was sufficient for inclusion
        if tx.feerate > sort([ transaction.feerate for transaction in getblock(getblockheight()).transactions ])[-1]: 
            # Allow one block to assume tx has time to propagate to all miners (in case we broadcast just before a new
            # block is mined). Otherwise consider this as a censorship or network-layer attack signal
            if getblockheight() - height_at_broadcast =! 1: 
                set_attack_score(tx, get_attack_score() +1)

            # Active censorship or network-layer attack
            if get_attack_score() >= ATTACK_THRESHOLD: 
                # Use full fee-reserve in attempt to outbid attacker
                return get_fee_reserve_for_vault(tx) - tx.feerate

        # else, our feerate was insufficient for inclusion
        else: 
            # Can't determine whether fee-market increase will remain or not, 
            # so just estimate again and bump if it's a significant difference.

            # To reduce over-sensitivity to short-term spikes, if remaining_locktime is large,
            # use a longer target
            if remaining_locktime > 144 # ~1 day
                df = estimatesmartfee(target=5) - tx.feerate
            elif remaining_locktime > 72 # ~1/2 day
                df = estimatesmartfee(target=3) - tx.feerate
            else:
                df = estimatesmartfee() - tx.feerate

            # If the difference is significant (assuming our coin creation is reasonable)
            if df >= SMALLEST_FEEBUMP_COIN_VALUE:
                return df
            # Otherwise don't fee-bump
            else:
                return 0

    # If in panic mode
    if remaining_locktime <= PANIC_THRESHOLD:
        # Use full fee-reserve to try maximise likelihood of inclusion
        return get_fee_reserve_for_vault(tx) - tx.feerate

darosior commented 3 years ago

I'm not sure how your proposal helps to mitigate paying too much during a short fee-spike. Selecting an arbitrary subset of blocks to enable fee-bumping doesn't necessarily reduce the likelihood that you are fee-bumping during a fee-spike. Instead I think we should rely on estimatesmartfee with a longer target (if remaining lock-time is large) to not overpay during short fee-spikes. Please correct me if estimatesmartfee doesn't work this way.

Right, what you describe is basically the same but with less chance of getting confirmed. However it's probably cheaper. However your following proposal isn't quite as generalist as my above. Given a CSV, current_height and a start_height (height at which the Unvault output was confirmed) we can define:

target = max(CSV / 4 - (current_height - start_height), 1) the target to give to estimatesmartfee for the first opportunistic broadcast.
rush_height = start_height + CSV / 4 the height at which we'll start fee-bumping at each block with a target of 1 block. This is linear and less fancy than my original formula but seems a good tradeoff between 1. having a high CSV allows you to not spend too much in cancelation fee 2. having a high CSV leaves you a much larger window to fee-bump 3. not trying to optimise if you have an insanely low CSV (like 6).

In the following I define a function that returns the amount to fee-bump by, rather than a boolean, since I wanted to distinguish different cases. I have probably made mistakes with units in fee calculations, its only pseudocode, and concrete values are only given as an example

Sure. Just a few opportunistic comments on the logic as it's a good showcase of what you have in mind.

            # Active censorship or network-layer attack
            if get_attack_score() >= ATTACK_THRESHOLD: 
                # Use full fee-reserve in attempt to outbid attacker
                return get_fee_reserve_for_vault(tx) - tx.feerate

Interesting. I don't think this is of any value for most attacks and having this trigger could rather incentivize some other attacks. For the two attacks mentioned in your comment specifically:

This is obviously not going to do anything against a P2P layer attack on the propagation of your transaction (you are not going to get your RBFed tx propagated anyways)
This is not going to make the honest part of the miners more honest if you are being censored by <50% of the hashrate. The only thing that helps in this case is increasing the CSV.
This is probably not going to outbid someone who bribed >50% of the hashrate with our modest per-vault fee reserve.

I would therefore argue that it is effectively burning money.

            # To reduce over-sensitivity to short-term spikes, if remaining_locktime is large,
            # use a longer target

Oh, you are bumping still at each block. I think it's strictly worse than my initial proposal then as it increases the load on you and on the network (and makes the heuristics to find you are the originator of the tx much much worse).


            # If the difference is significant (assuming our coin creation is reasonable)
            if df >= SMALLEST_FEEBUMP_COIN_VALUE:
                return df
            # Otherwise don't fee-bump
            else:
                return 0

I would rather fee-bump at any cost if we need to. ie if 1. we need to feebump for the next block 2. our feerate is less than the next block feerate 3. we still have a coin left. Then just feebump no matter if we overpay.

    # If in panic mode
    if remaining_locktime <= PANIC_THRESHOLD:
        # Use full fee-reserve to try maximise likelihood of inclusion
        return get_fee_reserve_for_vault(tx) - tx.feerate

Same, i think that it's just burning money (when you only have a fee-bumping wallet, every problem looks like a fee-paying one :p). I think we should always feebump to >= next block feerate (apart from the first hope-for-the-best broadcast) but not burning the entire reserve for no reason.

JSwambo commented 3 years ago

Right, what you describe is basically the same but with less chance of getting confirmed.

That's the point, to smooth out sensitivity to short fee-spikes. Not reacting to spike == less chance of getting confirmed.

* `target = min(CSV / 4 - (current_height - start_height), 1)` the target to give to `estimatesmartfee` for the first opportunistic broadcast.

By this definition, isn't the target always 1? We should also consider that not all deployments would want to optimise for minimum fees on Cancels, but may want to prioritise business continuity (catching the invalid spend, and revaulting quickly to allow the correct spend to go through asap).

* `rush_height = start_height + CSV / 4` the height at which we'll start fee-bumping at each block with a target of `1` block. This is linear and less fancy than my original formula but seems a good tradeoff between 1. having a high CSV allows you to not spend too much in cancelation fee 2. having a high CSV leaves you a much larger window to fee-bump 3. not trying to optimise if you have an insanely low CSV (like `6`).

I think making the rush_height (or PANIC_THRESHOLD above) a constant function of the remaining lock-time is more sensible. To me it makes sense to say "when the remaining lock-time is X blocks, we forget about optimising the fees and prioritise thwarting an invalid spend". Why make it a function of the CSV, it's irrelevant? "when the remaing lock-time is CSV/4, we forget about optimising fees and prioritise thwarting an invalid spend" isn't logical to me.

* This is obviously not going to do anything against a P2P layer attack on the propagation of your transaction (you are not going to get your RBFed tx propagated anyways)

Right. I think the detection of attack is fairly sound. But distinguishing which type of attack and how best to respond is an open question.

* This is probably not going to outbid someone who bribed `>50%` of the hashrate with our modest per-vault fee reserve.

No I don't think so either. I think it primarily serves as a stronger proof that censorship occurred.

Oh, you are bumping still at each block. I think it's strictly worse than my initial proposal then as it increases the load on you and on the network (and makes the heuristics to find you are the originator of the tx much much worse).

No. Only bumping if the difference in the tx.feerate and the current estimatesmartfee is as large as our smallest feebump coin. Now that I think about it, it should be independent on the coinpool values, and instead be a threshold that reflects that the current tx.feerate is insufficient. For example, if the difference is large enough to push tx.feerate below the 50%-ile (or even 75%-ile) of the last 30 days.

I would rather fee-bump at any cost if we need to. ie if 1. we need to feebump for the next block 2. our feerate is less than the next block feerate 3. we still have a coin left. Then just feebump no matter if we overpay.

"if we need to"... I think the above method specifies this need.

Same, i think that it's just burning money (when you only have a fee-bumping wallet, every problem looks like a fee-paying one :p). I think we should always feebump to >= next block feerate (apart from the first hope-for-the-best broadcast) but not burning the entire reserve for no reason.

Think about this: 20 or 100 blocks have passed and all our previous "rationalised feebumping methods" failed. We now have 12 blocks left and the cancel has still not been mined. Why not use the full per-vault fee-reserve that was allocated to it? The purpose of the buffer is in part for unexpected failures, and we should optimise for pushing the cancel at this stage, not for saving moderate amounts on fees.

darosior commented 3 years ago

By this definition, isn't the target always 1? We should also consider that not all deployments would want to optimise for minimum fees on Cancels, but may want to prioritise business continuity (catching the invalid spend, and revaulting quickly to allow the correct spend to go through asap).

Woops. Typo, i meant max here obviously :). Yes, arguably trying to not burn too much fees is optimizing for security (:tm:) here and it should prime over the rest. Also, if there is a Cancel something is wrong in the first place so business continuity may not be the most important thing here?

Why make it a function of the CSV, it's irrelevant? "when the remaing lock-time is CSV/4, we forget about optimising fees and prioritise thwarting an invalid spend" isn't logical to me.

Yeah, i should have explicit it. In practice estimatesmartfee is going to give essentially the same values for small targets therefore if you can you'd like to start with a larger target (something >10 makes sense, something <=3 imho doesn't). A large CSV gives you this luxury.

I also think you misread the suggestion, it's a function of the remaining locktime not the CSV. In Python pseudocode:

if current_height >= start_height + CSV/4:
    if tx_feerate < next_block_feerate:
         feebump()

No. Only bumping if the difference in the tx.feerate and the current estimatesmartfee is as large as our smallest feebump coin. Now that I think about it, it should be independent on the coinpool values, and instead be a threshold that reflects that the current tx.feerate is insufficient. For example, if the difference is large enough to push tx.feerate below the 50%-ile (or even 75%-ile) of the last 30 days.

Yes, i know.. I meant "you are checking if you need to feebump at each block". I think doing this is more sensible:

if current_height >= start_height + CSV/4:
    if tx_feerate < next_block_feerate:
         feebump(estimatesmartfee(1))
elif never_feebumped:
    target = start_height + CSV/4 - current_height
    feebump(estimatesmartfee(target))

The purpose of the buffer is in part for unexpected failures, and we should optimise for pushing the cancel at this stage, not for saving moderate amounts on fees.

I don't care much to be honest, we can put it i just don't want us to say it's helping whatsoever. It's basically like saying "Bitcoin mining is centralized and pools get regulated, i'm going to solo mine with my laptop to help censorship resistance because it's the only thing i'm able to do".

JSwambo commented 3 years ago

Yes, arguably trying to not burn too much fees is optimizing for security (tm) here and it should prime over the rest. Also, if there is a Cancel something is wrong in the first place so business continuity may not be the most important thing here?

Hard to say without knowing the deployment context. If the feebump wallet amounts are negligible compared to the amounts being spent per vault, and the business depends on spending for their operations, I can easily see them prioritizing business continuity (which would require figuring out what went wrong, and who can still be trusted, before allowing managers to continue).

In practice estimatesmartfee is going to give essentially the same values for small targets therefore if you can you'd like to start with a larger target (something >10 makes sense, something <=3 imho doesn't). A large CSV gives you this luxury.

Ok, nice!

I also think you misread the suggestion, it's a function of the remaining locktime not the CSV.

I don't understand how so. remaining_locktime = (start_height + CSV) - current_height. Our choice of when to feebump intuitively depends on how many blocks left before the lock-time expires. The CSV is variable across deployments, but if we want to be consistent with the security properties of the enforcement of spend policies, our fee-bumping decisions should be consistent across deployments. Looking at this,

if current_height >= start_height + CSV/4:
    if tx_feerate < next_block_feerate:
         feebump(estimatesmartfee(1))

If one deployment has CSV = 12, and another has CSV = 144, then one starts to rush (target=1) at start_height + 3, with remaining_locktime=9 while the other deployment starts to rush at start_height + 36, with remaining_locktime=108.

If instead we say

if tx_feerate < next_block_feerate:
    if remaining_locktime >= 72:
        feebump(estimatesmartfee(10))
    elif remaining_locktime >= 24:
        feebump(estimatesmartfee(1))
    else:
        feebump(full_vault_reserve)

Then we have the same rush_height (and same panic time) across deployments. These values are exemplary only. But you can see with this approach it's easier to reason about longer targets as a function of remaining_locktime than of a variable parameter like CSV. It's a bit over simplified still as we could have rush_height be different across deployments, but still easier to reason about imo.

Going back to the point about different optimizations for different deployment use-cases, we may add another criterion: "parametrise our algorithm to enable optimisation for either finality or of operational costs". This helps us address the inherent trade-off by saying, 'you choose what matters to you'. So they set CSV, rush_height and PANIC_THRESHOLD, within ranges that we deem as rationally secure through our simulation analysis.

darosior commented 3 years ago

if tx_feerate < next_block_feerate:
    if remaining_locktime >= 72:
        feebump(estimatesmartfee(10))
    elif remaining_locktime >= 24:
        feebump(estimatesmartfee(1))
    else:
        feebump(full_vault_reserve)

Then you try generalising it and end up with my first message :p .

Then we have the same rush_height (and same panic time) across deployments.

You don't want to have the same actually, that was my point:

This is linear and less fancy than my original formula but seems a good tradeoff between 1. having a high CSV allows you to not spend too much in cancelation fee 2. having a high CSV leaves you a much larger window to fee-bump 3. not trying to optimise if you have an insanely low CSV (like 6).

With a larger CSV, you allocate 1/4 of its capacity to cost optimization and 3/4 to increasing your probability of getting confirmed. So yeah it's arbitrary and not very generalistic but we can probably tweak it..

Going back to the point about different optimizations for different deployment use-cases, we may add another criterion: "parametrise our algorithm to enable optimisation for either finality or of operational costs". This helps us address the inherent trade-off by saying, 'you choose what matters to you'. So they set CSV, rush_height and PANIC_THRESHOLD, within ranges that we deem as rationally secure through our simulation analysis.

ACK. So getting with:

if bypass_cost_optimization or current_height >= start_height + CSV/4:
    if tx_feerate < next_block_feerate:
         feebump()

Or even more generalistic, with a variable like COST_OPTI_FACTOR between 0 and 1 and something like

if current_height >= start_height + (CSV  / (1 + (COST_OPTI_FACTOR * CSV))) // 1:
    if tx_feerate < next_block_feerate:
         feebump()

darosior commented 3 years ago

The purpose of the buffer is in part for unexpected failures, and we should optimise for pushing the cancel at this stage, not for saving moderate amounts on fees.

I don't care much to be honest, we can put it i just don't want us to say it's helping whatsoever. It's basically like saying "Bitcoin mining is centralized and pools get regulated, i'm going to solo mine with my laptop to help censorship resistance because it's the only thing i'm able to do".

Also, after thinking more about this (and discussing it with @kloaec and @danielabrozzoni ) i don't think it's safe to implement. That's because the factor X between the prev_feerate and bumped_feerate may be too high in this case, and this may incentive a miner to censor the first transaction for triggering the "huge fee-bumping" logic. Given a probability P of finding the next block at any point in time, a miner has an incentive to always try to censor if X > 1/P. Given some of them have pretty large shares of hashrate control, 1/P == 5 seems realistic and for low fee periods (but large reserve) X == 5 seems realistic too.

darosior commented 3 years ago

For the "feebumping algorithm" we could go for something simpler.

The goal is to not overpay the fee for inclusion before the deadline. Overpaying fees increases the burden on the WT operator and indirectly increases risk by reducing the reserves too much while it's not necessary. The tradeoffs are a potential increased risk of not having the transaction be confirmed before the timelock maturation and a fund availability cost to the operations.

At each block that the Cancel transaction was not included in a block:

Let C be the current Cancel transaction feerate.
Let X be the number of remaining blocks before the timelock maturation.
Let Y be the minimum feerate increase according to the minimum fee addition according of BIP125 rule 4 (plus the non decrease of feerate).
Let the target feerate be the maximum of:
- estimatesmartfee with a block target of X / 36 (always in CONSERVATIVE mode)
- C + Y

This addresses the tradeoffs as for any short deadline (< 144) the estimate is approximately the same as if we always bumped with estimatesmartfee 1 at each block. This however doesn't really fulfill the goal of reducing the cost for larger deadlines, because we are always going to bump at any block anyways. To address this we need a more complex solution as described above.

JSwambo commented 2 years ago

Your latest proposal is ok. Interesting that it doesn't have any "panic mode" functionality. If the vault is high-value, it would be rational to use significantly more than the estimatesmartfee value (e.g. 3x more) to dominate the fee market. The potential problem we face with that is it being game-able by miners. With a short enough "panic mode" threshold (e.g. X < 12), and long enough time spent in the mempool (e.g. 72 blocks) we can hope that sufficient proportion of miners aren't trying to game the replacement fee-bump algorithm.

Counter-intuitively, it may be that having a "panic mode" decreases the likelihood of early confirmations with cancel transactions. So I'm starting to think something like you've proposed is ideal.

darosior commented 2 years ago

Kevin's input: waiting more might actually make us overpay more if we miss a low fee period at the time of broadcast because we underpaid.

This however doesn't really fulfill the goal of reducing the cost for larger deadlines, because we are always going to bump at any block anyways

Jacob points out that it's very unlikely to happen.

TL;DR: might not be worth the complexity.

JSwambo commented 2 years ago

Goals: Adapt to increasing feerate to ensure cancel confirmation.

Constraint: don't over pay on fees, don't create too much risk

Trade-off: overpayment <---> finality (risk & continuity)

Miner risk: avoiding processing cancels because replacement transactions will have significantly higher fees

Time-lock safety (CSV length) not necessarily how long managers want to wait for confirmation

Replace at each-block: sensitive to feerate spike & always incurrs replacement fee

EstimateSmartFee doesn't consider feerate trends, so waiting to replace exposes WT to feerate increases as well as decreases. It's not necessarily true that feerate will decrease, so waiting is not necessarily good. Waiting through high-volatility events could be beneficial if they can be accurately identified and there is still sufficient time before time-lock expiry.