threefoldtecharchive / rivine

Blockchain technology for creating custom chains.
Apache License 2.0
22 stars 12 forks source link

better algorithm for output selection required #619

Closed GlenDC closed 4 years ago

GlenDC commented 4 years ago

The current algorithm used in all rivine (wallet) implementations does the following:

  1. order all outputs from biggest to smallest
  2. start selecting until the amount is reached
  3. if after all outputs are used it still doesn't work, continue for all unconfirmed
  4. if after all confirmed still not possible, funding fails

This works, but has several flaws:

Eventually combining all these flaws, a wallet could get in a position where the algorithm will always fail for big enough amounts to be sent.

Possible Solution

This reminds me a lot about the Knapsack algorithm, could help us perhaps. In general we could perhaps try to do something like the following:

Probably there are some other chains that had to solve similar problems, but we really need a solution ASAP, because some tfchain users run into the problem caused by this issue on a very regular basis.

By providing a better algorithm we should be able to avoid a growing list of UTXO.

For wallets already affected with a giant UTXO we might need to provide a manual-triggered merge operation perhaps.

GlenDC commented 4 years ago

Any input @robvanmieghem, @LeeSmet?

robvanmieghem commented 4 years ago

Sia implemented automated utxo merging for this since indeed, too many inputs in a tx makes it fail. This was especially true for exchanges. I don't like the Sia solution though.

GlenDC commented 4 years ago

Given only one coin output, how many coin inputs can we fit in a regular transaction, using one miner fee?

If you consult the code or documentation you'll be able to conclude that a v1 transaction with no coin inputs, one coin output, no arbitrary data, no block stake inputs or outputs and one miner fee (minimum), has a size of 358 bytes when Sia-binary encoded (the encoding used for v1 transactions).

Each added coin input will add 169 bytes to the transaction (it's as linear as this, given the length of coin inputs is a static 8 bytes, already included in the 358 bytes mentioned earlier).

The block size limit is in tfchain (and all other rivine chains) 2e6 bytes, so there won't be an issue there.

In the transaction pool module however there is another constant adhered by that module, which limits the max size a single transaction is allowed to have. This is 16e3 in tfchain (and all other rivine chains).

We can therefore conclude that for a merge transaction we can put ⌊(16e3 - 358) / 169⌋ = 92 coin inputs per transaction.

If we have two coin outputs (because we also have a refund coin output) we can instead only have up to:

⌊(16e3 - 409) / 169⌋ = 92 coin inputs per transaction.

This consistency is good, as it means we can as a measurement already add a check in our high level wallets that checks that there are no more than x coin inputs per transaction, irrelevant if there are one or two coin outputs, (more coin outputs are no used in regular high level wallet transactions, and less neither.

They will however also have to take into account the byte size of the arbitrary data, if for example 83 bytes are used for arbitrary data (the maximum) than we can have:

⌊(16e3 - 492) / 169⌋ = 91 coin inputs per transaction. It is however simple enough to check this extra size on the fly.

(EDIT: 17 bytes have to be added in case a LockTime is used for the first coin output.)

This means that in worst case the high level wallet application can provide a better error to the user, and in the best case scenario the wallet application can (semi-)automatically (help) resolve the issue.

On top of that it means that we should be able to quickly provide a merge functionality in the desktop wallet, to already help resolve people who are currently unable to spend high amounts.

The algorithm for the merging (or cleaning up of smaller outputs) could be as simple as:

Easiest might be to provide this as a scripted transaction, given this should really be more of an exceptional thing, and no longer be required once a decent coin output selection algorithm is used.

GlenDC commented 4 years ago

https://github.com/threefoldtech/threefold-wallet-electron/releases/tag/v0.6.1 released which:

// script constants const maxOutputsPerWalletCount = 32

// callback used to merge let txs = [] const merge_callback = (result) => { if (result) { if (!result.submitted) { // special case in case no transactions were submitted if (txs.length == 0) { return { output: 'No merge transactions have been submitted!' } }

        // get the total merged value
        let mergeTotal = new Currency()
        txs.forEach(tx => {
            tx.coin_outputs.forEach(co => {
                mergeTotal = mergeTotal.plus(co.value)
            })
        })

        // return the merge info
        let output = `Successfully merged ${mergeTotal.str({unit: true})} in ${txs.length} merge transaction(s). All Transactions: `
        output += txs.map(tx => tx.id).join(', ')
        return { output }
    }

    // add submitted transaction
    txs.push(result.transaction)
}

// send next merge transaction
const builder = wallet.transaction_new()
return builder.send({ merge: true, mergeMinOutputCount: maxOutputsPerWalletCount+1 }).then(merge_callback)

}

// merge & report return { output: merge_callback() }

}()



Start of next week I'll look into actually proposing and implementing a better algorithm to select coin outputs, but this should already help those who currently have wallets with too many coin outputs to still be useful for big amounts.
robvanmieghem commented 4 years ago

interesting and simple read: https://blog.lopp.net/the-challenges-of-optimizing-unspent-output-selection/

From the article :"It’s clear that there is no “one size fits all” solution to this problem" The question is here, which strategy avoids our problem and is simple enough withouth giving in too much on other downsides. Wallets can still decide to implement a different strategy then the reference one

robvanmieghem commented 4 years ago

Since most people do not use multiple addresses, let's drop the privacy concern of minimizing the number of addresses for now and focus on not bloating the blockchain or blocking the wallet. A simple suggestion:

This would limit the amount of utxo's and since transaction size has no impact on transaction fee yet, this algorithm has no downside there.

GlenDC commented 4 years ago

Very funny @robvanmieghem, I had the exact same approach in mind. As indeed we do not have the concerns that bitcoin wallets do have of having too optimise in terms of transaction fee. Nice to see how alike we think on this one.

GlenDC commented 4 years ago

I did have a twist in mind on top of this algorithm. I was thinking that in case we have more than the required amount, that it would be better to instead of going just over the limit, it would be better to actually go a lot over the limit. This would also prevent of having to create a very small coin output refund.

robvanmieghem commented 4 years ago

true, even though it would get cleaned up in the next spend

GlenDC commented 4 years ago

That is true as well. I guess the twist can be done in a second phase if desired.

GlenDC commented 4 years ago

Algorithm has been implemented in the unofficial Desktop Wallet. Release can be found @ https://github.com/threefoldtech/threefold-wallet-electron/releases/tag/v0.6.2.

Code for it can be found @ https://github.com/threefoldtech/threefold-wallet-electron/blob/440662a793d98781eb3bbf415ba8a482abed0288/src/tfchain/balance.py#L423-L497

Header and comment for the fund_individual function:

def _fund_individual(self, amount, addresses, max_input_count=None):
        """
        Funding logic works as follows...

        First collect all available outputs.

        Secondly try to fund using the lowest (up to the highest) confirmed outputs available.
        If we reach the limit and still didn't fund enough, try to replace from smallest to highest with
        the remaining available unconfirmed outputs.

        Lastly if we still couldn't fund, or perhaps we never reached the limit, we'll try to
        fund using unconfirmed, starting from highest to lowest, and overwriting as long as we have higher
        ones available to overwrite lower ones with.

        If this last step failed than funding is simply not possible.
        """

Implementation might still be simplified, but it seems to work from what I've reasoned and tested. Note that for a Rivine-go implementation the logic needs to be a bit more advanced. Specifically the estimation part of how many coin inputs can be used fo funding will require smarter logic. Depending on the exact contents of the transaction at the point, and how each property is encoded it will be able to estimate if done properly how many coin inputs can be used. In the desktop wallet this logic could be kept simple as it does not have to be as generic and the possibilities of what is supported is limited.

robvanmieghem commented 4 years ago

Closing this issue as there is a workaround, and created a follow up issue for the reference implementation: https://github.com/threefoldtech/rivine/issues/629