semuxproject / semux-core

Semux Core
https://www.semux.org
MIT License

Why does Semux need the function PendingManager.processTransaction() ? #310

Open VictorECDSA opened 3 years ago

VictorECDSA commented 3 years ago

Why does Semux need the function PendingManager.processTransaction() ?

src/main/java/org/semux/core/PendingManager.java

    protected ProcessingResult processTransaction(Transaction tx, boolean isIncludedBefore, boolean isFromThisNode) {
         ......
    }

Because of the BlockGasLimit, a single block may not be able to package all of the transactions in the tx pool.

Suppose that each time a block is added, the block contains 300 transactions while the pool has accumulated 1000 transactions due to receiving pressure.

The onBlockAdded() function then triggers all 1000 pending transactions to be re-executed, which blocks block importing.

If the nodes are still receiving transactions at this point, transactions will be packaged and reach consensus more slowly, the tx pool will pile up even further, and a vicious cycle is created.

    @Override
    public synchronized void onBlockAdded(Block block) {
        // clear the pool and re-process every previously pending transaction
        // against the post-block state
        List<PendingTransaction> txs = reset();

        long accepted = 0;
        for (PendingTransaction tx : txs) {
            accepted += processTransaction(tx.transaction, true, false).accepted;
        }
    }

So, what is the necessity of doing this?

Thx!

semuxgo commented 3 years ago

Yes @fenghaoming, you're right!

The purpose of executing pending transactions in the pool is to exclude invalid transactions after the state change caused by importing a new block.

There is definitely a performance cost associated with this process, and there are two possible workarounds:

  • To make the notification process asynchronous so that block importing is not blocked;
  • To make the pending transaction validation lighter, for instance, only checking the sender's balance.

Will definitely consider these options.

VictorECDSA commented 3 years ago

Yes @fenghaoming, you're right!

The purpose of executing pending transactions in the pool is to exclude invalid transactions after the state change caused by importing a new block.

There is definitely a performance cost associated with this process, and there are two possible workarounds:

  • To make the notification process asynchronous so that block importing is not blocked;
  • To make the pending transaction validation lighter, for instance, only checking the sender's balance.

Will definitely consider these options.

First solution

I think the first (asynchronous) solution is still unreliable, because all of the functions in the PendingManager need to be synchronized. It no longer blocks block adding, but it still blocks transaction receiving (e.g. addTransaction), so it still reduces the TPS. This solution just passes the problem on without really solving it.
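
To illustrate the point, here is a minimal, hypothetical sketch (not the actual Semux code) of why moving the re-validation onto a background thread does not help as long as every method synchronizes on the same PendingManager instance:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Hypothetical sketch: even if onBlockAdded() dispatches the heavy
    // re-validation to a worker thread, that thread still has to acquire
    // the object monitor, so addTransaction() calls from the network
    // thread are blocked for just as long.
    public class AsyncPendingManagerSketch {
        private final ExecutorService worker = Executors.newSingleThreadExecutor();

        public void onBlockAdded(Object block) {
            // block importing returns immediately...
            worker.submit(this::revalidateAll);
        }

        private synchronized void revalidateAll() {
            // ...but the long-running re-validation holds the lock here,
            // so the synchronized addTransaction() below still has to wait.
        }

        public synchronized void addTransaction(Object tx) {
            // blocked while revalidateAll() is running
        }
    }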

Second solution

I think the second lighter scheme is reasonable, and this is how Ethereum deals with it. An example is as follows:

Example

Assume that an account currently has a nonce of 1 and a balance of 1000 ETH, and sends four transactions as follows, quickly and continuously (the nonce is null when the user sends them):

According to Semux's current treatment:

The first three transactions are successfully added to the tx pool (1000 ETH is just enough for the GasUsed accumulated by executing the pending transactions), while the fourth transaction immediately returns an "insufficient balance" error.

As Ethereum does:

All transactions are successfully added to the transaction pool, and then, on subsequent block generation:

After that, if the user initiates the following transaction:

Therefore

We can see that removing the pending transaction execution from the PendingManager only results in the following two disadvantages:

These two disadvantages only affect the case of sending multiple transactions in rapid succession when the user has only a small balance. I think they are worthwhile because this change removes the huge performance overhead of executing pending transactions.

Are there any other shortcomings I haven't considered? Thank you very much!

semuxgo commented 3 years ago

This is a thorough analysis!

I agree with the first conclusion but have some doubt about the second scenario.

The balance check should deduct the theoretical max cost from the sender's balance and increase its nonce in the pending state after each added transaction. With a large number of transactions, they should be reordered by the designated nonce and validated the same way. The transaction validation in the pending pool may include false positives but should be sound.

Out of curiosity though, are you able to make a PR to improve the pending manager? :)

semuxgo commented 3 years ago

Also note that most of the account states should already be in cache and can be retrieved fairly fast.

VictorECDSA commented 3 years ago

This is a thorough analysis!

I agree with the first conclusion but have some doubt about the second scenario.

The balance check should deduct the theoretical max cost from the sender's balance and increase its nonce in the pending state after each added transaction. With a large number of transactions, they should be reordered by the designated nonce and validated the same way. The transaction validation in the pending pool may include false positives but should be sound.

Out of curiosity though, are you able to make a PR to improve the pending manager? :)

May I ask what your doubt about the second scenario is? Could you give me an example?

Do you want me to optimize the PendingManager as I described above (remove the mock execution, do not maintain a set of world states, and just do simple "per-transaction independent" balance checks)?

semuxgo commented 3 years ago

Let me clarify it a bit - my point was that we should not remove transaction execution, but simplify it instead.

When a new block is imported,

  • Create a new pendingState based on the current global state
  • Re-validate all transactions (assuming format and signature have been validated when received for the first time):
    • Check the transaction nonce:
      • If it's greater than the account nonce, move it to a secondary queue
      • If it's less than the account nonce, discard it immediately
    • Check that value + fee + gasPrice * gasLimit does not exceed the account balance:
      • If it does, discard the transaction immediately
    • If all checks pass:
      • Increment the sender's account nonce in pendingState
      • Deduct the sender's balance by value + fee + gasPrice * gasLimit
  • Done

The above process does not trigger any VM execution and only requires account state (cached with high probability), so it should be fast.
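
For concreteness, here is a minimal, self-contained sketch of that re-validation loop; the types and helpers below are simplified stand-ins, not the real Semux classes:

    import java.util.*;

    // Sketch of the simplified re-validation: nonce check + theoretical max cost
    // check only, no VM execution. Tx and Account are simplified stand-ins.
    public class RevalidationSketch {

        static class Tx { String from; long nonce, value, fee, gasPrice, gasLimit; }
        static class Account { long nonce, balance; }

        // pendingState: per-sender nonce/balance snapshot taken from the new global state
        static List<Tx> revalidate(List<Tx> pool, Map<String, Account> pendingState,
                                   List<Tx> secondaryQueue) {
            List<Tx> accepted = new ArrayList<>();
            for (Tx tx : pool) {
                Account sender = pendingState.get(tx.from);
                if (sender == null) {
                    continue; // unknown sender, discard
                }
                if (tx.nonce > sender.nonce) {
                    secondaryQueue.add(tx); // not ready yet, park it
                    continue;
                }
                if (tx.nonce < sender.nonce) {
                    continue; // stale nonce, discard
                }
                long maxCost = tx.value + tx.fee + tx.gasPrice * tx.gasLimit;
                if (maxCost > sender.balance) {
                    continue; // cannot possibly afford it, discard
                }
                // accept: bump the pending nonce and reserve the theoretical max cost
                sender.nonce++;
                sender.balance -= maxCost;
                accepted.add(tx);
            }
            return accepted;
        }
    }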

subsequent transactions cannot be checked for an "insufficient balance" error in real time

Essentially, transactions from the same sender are validated sequentially, so "insufficient balance" error should be immediately available.

I might have missed details, but feel free to add as you see fit. :)

VictorECDSA commented 3 years ago

Let me clarify it a bit - my point was that we should not remove transaction execution, but simplify it instead.

When a new block is imported,

  • Create a new pendingState based on the current global state
  • Re-validate all transactions (assuming format and signature have been validated when received for the first time):
    • Check the transaction nonce:
      • If it's greater than the account nonce, move it to a secondary queue
      • If it's less than the account nonce, discard it immediately
    • Check that value + fee + gasPrice * gasLimit does not exceed the account balance:
      • If it does, discard the transaction immediately
    • If all checks pass:
      • Increment the sender's account nonce in pendingState
      • Deduct the sender's balance by value + fee + gasPrice * gasLimit
  • Done

The above process does not trigger any VM execution and only requires account state (cached with high probability), so it should be fast.

subsequent transactions cannot be checked for an "insufficient balance" error in real time

Essentially, transactions from the same sender are validated sequentially, so "insufficient balance" error should be immediately available.

I might have missed details, but feel free to add as you see fit. :)

I agree with you on the whole, but I have a different opinion on one point: I don't think it is necessary to maintain the account balance in pendingState, so I would not perform the operation "Deduct the sender's balance by value + fee + gasPrice * gasLimit".

Example:

For example, suppose there are 4 transactions from the same sender, whose balance is 1000 ETH; each transaction has a GasPrice of 1 ETH and a GasLimit of 500, but the actual execution consumes a GasUsed of only 300.

According to your approach:

The first two transactions are successfully added to the PendingPool (2 × 500 = 1000 ETH of reserved max cost), and the last two transactions are rejected with an "insufficient balance" error in real time.

My view:

All four transactions are successfully added to the PendingPool because "500 is less than 1000" (each transaction is "independently" compared with the account balance using its GasLimit). Finally, the first three transactions are successfully packaged into the block (the "cumulative" GasUsed is used for the comparison at block-generation time) and the last transaction is discarded.
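
To make the two policies explicit, here is a tiny sketch of the arithmetic with the numbers assumed above (value and fee are ignored for simplicity):

    // Worked numbers: balance = 1000 ETH, gasPrice = 1 ETH, gasLimit = 500, gasUsed = 300.
    public class GasLimitExampleSketch {
        public static void main(String[] args) {
            long balance = 1000, gasPrice = 1, gasLimit = 500, gasUsed = 300;

            // Reserve-the-max-cost policy: cumulative reserved cost after n accepted txs
            // is n * gasPrice * gasLimit -> 500, 1000, 1500 (> 1000), ...
            long acceptedByMaxCost = balance / (gasPrice * gasLimit); // 2 txs enter the pool

            // Independent-check policy: all four enter the pool (500 < 1000); at block
            // generation the cumulative actual cost is n * gasPrice * gasUsed
            // -> 300, 600, 900, 1200 (> 1000), ...
            long packedByActualCost = balance / (gasPrice * gasUsed); // 3 txs fit in the block

            System.out.println("max-cost policy accepts " + acceptedByMaxCost + " txs into the pool");
            System.out.println("independent policy packages " + packedByActualCost + " txs into the block");
        }
    }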

So:

I think your approach requires the user not to set the GasLimit too high (much higher than GasUsed) when sending a transaction; otherwise, subsequent transactions from that user can easily be rejected with an unreasonable "insufficient balance" error.

semuxgo commented 3 years ago

Yes, your example is absolutely correct. The proposal is an optimistic approach and may falsely reject a transaction even if the sender has enough balance to cover the actual transaction cost. I think it only affects users who deliberately want to spend all of their available funds. In this case, they can potentially lower the gasLimit to work around it. Right?

VictorECDSA commented 3 years ago

Yes, your example is absolutely correct. The proposal is an optimistic approach and may falsely reject a transaction even if the sender has enough balance to cover the actual transaction cost. I think it only affects users who deliberately want to spend all of their available funds. In this case, they can potentially lower the gasLimit to work around it. Right?

Thinking about this from the user's perspective: when I send a transaction, I don't know how much gas it will actually consume, so I usually set the gas limit as high as possible to avoid an outOfGas error and wasting my ETH. This is just my personal opinion; I don't know whether it's right.

Another reason I don't want to keep maintaining user balances in onBlockAdded() is that it still needs to read every transaction in the transaction pool: even if it no longer goes down into the VM layer, the time complexity is still O(n).

I think the transaction pool should be maintained in both the account and nonce dimensions.

    public class PendingManager {
        private AccountState accountState;

        // map< address, accTxs >
        private final Map<ByteArray, AccountTransactions> txs = new HashMap<>();

        public synchronized void onBlockAdded(Block block) {
            // reset accountState based on the new chain head
            // ('facade' refers to the node/kernel facade; its declaration is omitted here)
            accountState = facade.getBlockchain().getAccountState().track();

            // retrieve the max nonce of each account in the given block
            Map<ByteArray, Long> accountMaxNonce = new HashMap<>();
            for (Transaction tx : block.getTransactions()) {
                ByteArray address = ByteArray.of(tx.getFrom());
                long nonce = tx.getNonce();
                accountMaxNonce.compute(address, (k, v) -> v == null || v < nonce ? nonce : v);
            }

            for (Map.Entry<ByteArray, Long> e : accountMaxNonce.entrySet()) {
                ByteArray address = e.getKey();
                AccountTransactions accTxs = txs.get(address);
                if (accTxs == null) {
                    continue; // no pending transactions from this sender
                }

                // remove txs whose nonce is outdated
                accTxs.forward(e.getValue());

                // remove the first pending tx whose balance is insufficient and demote subsequent txs
                Amount available = accountState.getAccount(address.getData()).getAvailable();
                accTxs.filter(available);

                // release memory
                if (accTxs.size() == 0) {
                    txs.remove(address);
                }
            }
        }
    }
    public class AccountTransactions {
        // map< nonce, tx >: transactions with continuous nonces that are ready;
        // pendingNonceMin equals the account nonce in the DB
        private final Map<Long, Transaction> pending = new HashMap<>();
        private Long pendingNonceMin;
        private Long pendingNonceMax;

        // map< nonce, tx >: transactions whose nonce is greater than (pendingNonceMax + 1)
        private final Map<Long, Transaction> queue = new HashMap<>();

        public int size() {
            return pending.size() + queue.size();
        }

        // Remove transactions whose nonce is less than or equal to the given nonce
        public void forward(long nonce) {
            while (pendingNonceMin != null && pendingNonceMin <= nonce) {
                pending.remove(pendingNonceMin);
                if (pending.get(pendingNonceMin + 1) != null) {
                    pendingNonceMin++;
                } else {
                    pendingNonceMin = null;
                    pendingNonceMax = null;
                }
            }
        }

        // Remove the first pending transaction whose max cost exceeds the available balance,
        // and demote all subsequent pending transactions to the queue
        public void filter(Amount available) {
            if (pendingNonceMin == null) {
                return;
            }
            Transaction pendingMin = pending.get(pendingNonceMin);
            Amount maxCost = pendingMin.getValue().add(pendingMin.getFee())
                    .add(pendingMin.getGasPrice().multiply(pendingMin.getGas()));
            if (available.greaterThanOrEqual(maxCost)) {
                return; // available balance is enough
            }

            // remove the first pending transaction
            pending.remove(pendingNonceMin);

            // demote the subsequent ones
            for (Map.Entry<Long, Transaction> e : pending.entrySet()) {
                queue.put(e.getKey(), e.getValue());
            }

            // clear pending
            pending.clear();
            pendingNonceMin = null;
            pendingNonceMax = null;
        }
    }

If we do it this way, the number of transactions that onBlockAdded() needs to process is related only to the number of transactions in the block, not to the number of pending transactions.
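
As a complement to the onBlockAdded() side above, here is a rough, hypothetical sketch of the addTransaction() half of this structure (methods that would live inside the AccountTransactions class); it is my assumption about how new transactions would be slotted into pending or queue, not code from this proposal:

    // Hypothetical additions to AccountTransactions; names and semantics are assumptions.
    public void add(Transaction tx, long accountNonce) {
        long nonce = tx.getNonce();

        if (pendingNonceMin == null) {
            // the pending window starts at the account nonce in the DB
            if (nonce == accountNonce) {
                pending.put(nonce, tx);
                pendingNonceMin = nonce;
                pendingNonceMax = nonce;
                promoteFromQueue();
            } else if (nonce > accountNonce) {
                queue.put(nonce, tx); // not contiguous yet, park it
            }
            // nonce < accountNonce: stale, drop silently
        } else if (nonce == pendingNonceMax + 1) {
            // extends the contiguous pending window
            pending.put(nonce, tx);
            pendingNonceMax = nonce;
            promoteFromQueue();
        } else if (nonce > pendingNonceMax + 1) {
            queue.put(nonce, tx); // leaves a gap, park it
        }
        // nonces already inside the pending window are ignored in this sketch (no replacement)
    }

    // Move queued transactions into pending while they keep the nonces contiguous
    private void promoteFromQueue() {
        Transaction next;
        while ((next = queue.remove(pendingNonceMax + 1)) != null) {
            pendingNonceMax++;
            pending.put(pendingNonceMax, next);
        }
    }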

semuxgo commented 3 years ago

Another reason I don't want to keep maintaining user balances in onBlockAdded() is that it still needs to read every transaction in the transaction pool: even if it no longer goes down into the VM layer, the time complexity is still O(n).

We can simply sort the transactions based on the gasPrice and limit the number of transactions that are actively evaluated for the pending block. So the complexity can be limited to O(1). The execution time is not wasted, as it will speed up the block creation process, given that most of the account states are hot.
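
As a sketch of that idea, assuming a hypothetical cap (MAX_EVALUATED) rather than anything from the actual Semux code:

    import java.util.*;

    // Hypothetical sketch: only the top-K transactions by gasPrice are actively
    // re-evaluated for the pending block, so the per-block work is bounded by the cap.
    public class TopKEvaluationSketch {
        static final int MAX_EVALUATED = 1000; // assumed cap, not from the thread

        static class Tx { long gasPrice; }

        static List<Tx> selectForEvaluation(Collection<Tx> pool) {
            // highest gasPrice first
            PriorityQueue<Tx> byPrice = new PriorityQueue<>(
                    Comparator.comparingLong((Tx t) -> t.gasPrice).reversed());
            byPrice.addAll(pool);

            List<Tx> selected = new ArrayList<>();
            while (!byPrice.isEmpty() && selected.size() < MAX_EVALUATED) {
                selected.add(byPrice.poll());
            }
            return selected; // only these are re-validated for the pending block
        }
    }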

That said, I do like the idea about grouping transactions by the sender's address. I will explore this a bit more.

semuxgo commented 3 years ago

Also, regarding:

Thinking about this from the user's perspective: when I send a transaction, I don't know how much gas it will actually consume, so I usually set the gas limit as high as possible to avoid an outOfGas error and wasting my ETH. This is just my personal opinion; I don't know whether it's right.

I don't have data about different user behaviors, so I'm going to guess that some people might prefer to set the gasLimit as high as possible. Let's say they use 5,000,000 as the gas limit and the default gas price of 10 NanoSEM (= 10 Gwei). The max cost is then 5,000,000 × 10 nanoSEM = 0.05 SEM. Any account holder with a reasonably large balance should be able to send transactions using their preferred gas limit.

VictorECDSA commented 3 years ago

We can simply sort the transactions based on the gasPrice and limit the number of transactions that are actively evaluated for the pending block. So the complexity can be limited to O(1).

Suppose there are 10,000 txs in the txPool, but only 1,000 of them are actively evaluated for the pending block, and you maintain user balances only against those 1,000 txs. Now a new tx arrives. In my opinion, we should judge whether this transaction's balance is sufficient by deducting the cost of all 10,000 transactions; maintaining only those 1,000 transactions makes no sense on its own.

The execution time is not wasted, as it will speed up the block creation process, given that most of the account states are hot.

I'm sorry, I don't understand how it will speed up the block creation process. Could you give an example? Thank you!