[suggestion] P2P module Enhancement

dusmart commented 1 year ago

Summary or problem description If bad guys sent lots of valid transactions on chain in priority-ascending order with more than ~~1000~~10000 transactions per second, the consensus can then hardly be reached for a very long time. If the attack details are required, please contact @superboyiii. He helped us for identifying this problem. We've tried our best to solve this while we did't get a good solution. Therefore, we issued a new enhancement request here to ask for core developer's solutions.

consensus module enhancement needed: TxNotFound will be a frequent reason for view changes, we'd better solve this issue first
p2p module enhancement needed: after the flood attack start, the p2p messages whether its high-priority or not, such as ChangeView, can not be sent to other nodes in time, therefore the consensus can not be reached

Do you have any solution you want to propose?

pack the transaction bodies in the PREPARE consensus message
limit p2p peers' in-flow throughput by some throttle methods

We have discussed the proposal with Roman Khimov privately and he didn't like these two solutions. Roman had other methods for mitigation.

Neo Version

Neo 3

Where in the software does this update applies to?

Consensus
P2P (TCP)

Jim8y commented 1 year ago

I have a general question here, how long exactly does it takes to sync 1000 txs across the entire neo network? How about 10,000 txs? what if attackers send each consensus node 1000 txs different valid txs, with different accounts from different IP addresses?

Before we get another sound solution, i vote for packing transaction bodies in the consensus message. And in the meanwhile, freeze the mempool if it is full.

dusmart commented 1 year ago

I have a general question here, how long exactly does it takes to sync 1000 txs across the entire neo network? How about 10,000 txs? what if attackers send each consensus node 1000 txs different valid txs, with different accounts from different IP addresses?

Before we get another sound solution, i vote for packing transaction bodies in the consensus message.

Sorry that I wrote a wrong number here. The attack speed should be 10k tx/s. If it's 5k tx/s, the network will work just fine. If the speed is much more larger, then the network will be worse. I'll send the result to you privately. Anyway, this problem should attract our attention.

roman-khimov commented 1 year ago

A bit more open discussion really can be helpful in this case, I'll reiterate on some points discussed previously.

First, I don't think you can avoid having an inconsistent (unsynchronized wrt mempool contents) network. Any throttling scheme can be subverted with enough senders (as in tx.signers[0]) and nodes (as in machines sending data to the network). So we have to deal with this inconsistency in one way or another.

Sending transaction contents with PrepareRequest is a solution, what I don't like about it is the associated cost it has. Correct me if I'm wrong, but I think dBFT tried to avoid doing exactly that since its inception. Testnet has a MaxBlockSize of 2M, most of this volume will be transactions, so if we're to try sending this large PrepareRequest it will take like 2s for every hop assuming 10Mbit/s. Of course you can have somewhat better connection, but you can also consider additional network stress from the transaction flow, either way I'd say this volume will be noticeable. Even in ideal case (synchronized mempools!) it'll cause some measurable block delay.

So if we're to reject the idea of including transactions into PrepareRequest (but hey, convince me it's fine!), we have two basic routes: either relying on CVs to handle the situation (they're a part of the normal protocol anyway) and trying to make the next proposal more suitable for everyone, or trying to avoid them.

In that sense #2058 (implemented in NeoGo) which tries to optimize the decision by using local data only can be extended to try taking into account data from other nodes in a way described in https://github.com/neo-project/neo/issues/2058#issuecomment-725634596 (hi, @vncoelho!). All nodes requesting CV send a reason for that. And if it's TxNotFound then they at least have got a proposal and they really can provide a bitmask of tx presence for that proposal.

The problem is that if the network is being stressed in a way that causes complete mempool replacements in time comparable with the regular block time this may lead to the transaction data being outdated by the time CV is to happen. That is the new primary may have none of the old transactions in its mempool when it's to make a proposal. This, btw, may defeat the original assumption of #2058 as well, rendering it ineffective.

We can try solving this problem before CV happens though, this requires getting transactions in the same round in some more reliable manner (but avoiding a full copy in PrepareRequest). CNs missing transactions from proposal do perform requests for them from the nearest neighbors which in many cases lead to successful block acceptance. But in case it doesn't we may try sending a new broadcasted RecoverTx message to the primary before CV timer expires. It can either be parameter-less or contain the same bitmask as mentioned above (so it'll be small).

This should be a high-priority message (so that it'd be more likely to propagate in time) that is processed by the primary only. Primary can wait for an f+1 set of these messages and then either broadcast a full set of transactions in a new message or just those that are missed on other CNs.

There can also be some heuristic like not sending RecoverTx if we're missing less than 20% of transactions or if the proposal contains less than 20% of MaxTransactionsPerBlock (numbers are pretty arbitrary here, this needs to be tested of course). RecoverTx may be sent after initial attempt to get transactions from neighbors like we send requests and wait 1/3 of the CV timer and if transactions are not there then we send RecoverTx and wait for the next 2/3 of the same timer.

This may keep the performance we have in the optimistic case (highly synchronized network) while allowing quicker recovery in case there is a problem. Even if we're to have some problems collecting transactions at view 0, the next one will have extended timers and thus higher chance to succeed.

vncoelho commented 1 year ago

Thanks for pinging me here. This is also a important discussion for the current PR #2818.

vncoelho commented 1 year ago

This should be a high-priority message (so that it'd be more likely to propagate in time) that is processed by the primary only. Primary can wait for an f+1 set of these messages and then either broadcast a full set of transactions in a new message or just those that are missed on other CNs.

This may be a good strategy for these edge cases, @roman-khimov.

We can also track these messages on the backups and extend it in a similar way we do with ExtendByFactor.

vncoelho commented 1 year ago

Regarding my worry on the Conflicts Attribute PR,in fact that can similarly happen nowadays with the ascending priority logic, as you highlighted: "causing complete mempool replacements".

However, differently, the attacker will eventually need to pay for those network fees and consequent system fees.

Jim8y commented 1 year ago

As @roman-khimov has stated, the packing transaction body would cause delayed consensus message processing. Thus, I propose a new solution below that does not require any change to the consensus:

The main reason for the attack is that attackers refresh CNs' mempools with ascending transactions after the primary issues a prepare message, causing the CNs to be busy requesting transactions (in the meanwhile, attackers keep sending new transactions to update CNs' mempools to slowdown CNs and cause network congestion).

My new solution would target the mempool refresh feature of this attack:

If full, mempool freezes itself and stops accepting new transactions, even if with higher transaction fees.
Only update mempool with transactions in the prepare message, here we can adopt the method proposed by @roman-khimov above.
for emergency transactions that need to be processed early, it needs to pay at least 10 times more fees than the highest fee in the last block, 10 GSA if the last block is empty.

focusing on the mempool would pose a minimal impact on existing neo protocols.

EdgeDLT commented 1 year ago

We can try solving this problem before CV happens though, this requires getting transactions in the same round in some more reliable manner (but avoiding a full copy in PrepareRequest).

Intuitively I agree, but it strikes me as a problem that will remain in some form as long as transaction selection remains within the purview of CNs. As @roman-khimov points out, unsynchronized mempools can't really be avoided.

I'd like to propose that we give some consideration to what is known in the Ethereum ecosystem as proposer-builder separation. Atomic transaction batches (or even full block bodies) can be constructed by trusted (e.g. non-CN committee members) or even untrusted nodes (à la dBFT 3.0). These can be validated and relayed to (or possibly by) CNs, who simply select the most profitable batches to use in proposed blocks.

I'm skirting over a lot of details, and a lot of the research on this topic is designed with Ethereum in mind, so would need a number of adjustments to work in our case. But there are various potential implications of this architectural adjustment that I think at least merits further investigation. To name a few:

Partially mitigate the issue of mempool flooding and improve scalability by optimizing resource use on the network (avoid more CVs).
Enable outside node incentivization (fees can be shared with bundlers/builders, i.e. non-CNs). This in turn can further mitigate the flood risk -- if you have good reason to relay thousands of valid transactions per second, why not become a bundler and profit from them instead?
MEV and censorship resistance (transactions can be privately shared with bundlers to avoid the traditional public mempool route). Not a silver bullet solution, but can reduce the attack surface considerably. See Flashbots and mev-boost for more details.
A step towards the possibility of having stateless validators, which would be especially beneficial in the long-term.
Solve the committee free-rider problem (we could require non-CN committee nodes to be builders/relayers and use it to confirm they are doing work). Currently, a committee member can be a light wallet, as I'm doing on TestNet. This threatens consensus stability and network throughput via CVs.

There's more, and I don't claim to be an expert, but I think there are some interesting potentials for us here.

vncoelho commented 1 year ago

@Liaojinghui

I believe that stop accepting transactions is critical because we have a logic on the P2P for requesting and receiving them, that would block the tx to be received later or we would need to do not add them on the cache. Perhaps, create another hash list to request them later.
This maybe bootleneck the process and what will the nodes do while not on the prepare message, reject all txs because of space being full?
That possibility of emergency transactions is great in several ways! I believe we can have a special space and list just for such kind of transactions. They are important even for updating committee and blockchain on-chain rules.

I see your steps as a possibility, yes. But I believe that exists quite an amount of efforts and tests for us to make that work safely as well.

vncoelho commented 1 year ago

I'd like to propose that we give some consideration to what is known in the Ethereum ecosystem as proposer-builder separation. Atomic transaction batches (or even full block bodies) can be constructed by trusted (e.g. non-CN committee members) or even untrusted nodes (à la dBFT 3.0). These can be validated and relayed to (or possibly by) CNs, who simply select the most profitable batches to use in proposed blocks.

That is great possibility for the future, @EdgeDLT. We can have the additional speaker as an untrusted node doing such kind of things, which, in my opinion, makes the decentralization of the consensus process more fair.

vncoelho commented 1 year ago

Economically speaking, this mempool issue has been here since the free txs. In the past, we created slots, which "solved" part of the problem of flooding the free txs. Along with it, we created the priorities and a parallel list that economically ordered txs (along with mining the hash of the tx).

In my opinion, the emergency txs pointed by @Liaojinghui solves the problem because the committee (from my perspective) should be a real-time online system. They should not discuss and take time to solve such situations. The committee should have standard procedures for that, which would trigger higher minimal networks fees.

In principle, it is GREAT to have the mempool full of txs! There are lots of fees to be collected. One of the issues is that fees are low at the moment of the attack.

Jim8y commented 1 year ago

I believe that stop accepting transactions is critical because we have a logic on the P2P for requesting and receiving them, that would block the tx to be received later, or we would need to not add them on the cache. Perhaps, create another hash list to request them later.

under the case when the mempool is already full, I think it is ok to halt accepting new transactions. The liveness and stability would definitely be more important than processing new transactions when we already have thousands of nonprocessed in the cache. Well, not elegant, though. I kind of like putting more strict limitations on the system.

dusmart commented 1 year ago

In principle, it is GREAT to have the mempool full of txs! There are lots of fees to be collected. One of the issues is that fees are low at the moment of the attack.

I have to reminder that not all of the txs in the mempool can be valid on chain even if they can fit in the mempool now. The case is that the bad guy can send tons of thousands of transactions to the network whose ValidUntilBlock is next block. Then every node's mempool will contain only 50k tx now(the guy should remain at least 50k GAS_per_tx in his wallet). While only 500 among them can be packed into the next block on mainnet now(5k on testnet). Therefore, only 500 GAS_per_tx is the real cost for such an experiment.

vncoelho commented 1 year ago

I did not mean system fees @dusmart Network fees are the ones responsible at this stage.

Jim8y commented 1 year ago

I have to reminder that not all of the txs in the mempool can be valid on chain even if they can fit in the mempool now. The case is that the bad guy can send tons of thousands of transactions to the network whose ValidUntilBlock is next block. Then every node's mempool will contain only 50k tx now(the guy should remain at least 50k GAS_per_tx in his wallet). While only 500 among them can be packed into the next block on mainnet now(5k on testnet). Therefore, only 500 GAS_per_tx is the real cost for such an experiment.

A much more reasonable and logical mempool, of course more complex, strategy then.

For instance:

500 at most for valid until next, clear the rest. 1000 at most for valid until 2 blocks later, clear the rest. no new txs if mempool already full unless 10 times more fee is paid.

roman-khimov commented 1 year ago

@Liaojinghui,

If full, mempool freezes itself and stops accepting new transactions, even if with higher transaction fees.

While this may seem like a fair behavior, I doubt it's viable:

What is a "mempool freeze" in a distributed network? Each node has a mempool of its own and can "freeze" with any state weakly related to the state of its neighbor. We can have radically different pools on this network, they will still have this problem.
People expect to be able to pay more for priority, they always had this possibility, so taking it out won't make them happy. Imagine a big ICO/mint rush happening, what will people say when they see OutOfMemory error from node without any possibility to handle it? They will just say "Neo is broken".
Economically CNs don't care about this, they want as much network fees as they can get.

no new txs if mempool already full unless 10 times more fee is paid.

However, some economic adjustments can help in avoiding transaction flood. It'll still be possible to cause some mempool difference with carefully planted nodes in different parts of the network, but at least there won't be a lot of traffic on the network, so it'll be easier to deal with it. IIRC now you only need to have bigger FeePerByte and given average transaction size of 250 this means paying 0.0000025 GAS more. 10× is just too much IMO, but if mempool eviction is to cost more than it costs now (like you need to pay 10% more) then the cost of the attack will quickly rise to unreasonable levels.

It can be done before reaching mempool capacity as well, have 50% full -> require 10% higher FeePerByte than the mempool average, have 70% full -> require 30% higher FeePerByte, 90% -> 50% higher. If it's tied to mempool average then even if someone to try pushing more and more, the cost will get higher and higher with each transaction.

focusing on the mempool would pose a minimal impact on existing neo protocols.

That's true, btw. This economic disincentivization should be easier to start with, maybe it'll even be enough for public networks (but some knobs will be needed for tests/private installations).

@EdgeDLT,

proposer-builder separation

That's an interesting thing in general, but it's more about the long-standing issue of missing economic motivation for running a Neo node (and some other things like potentially improving MEV situation). The problems here are:

potential performance impact both in terms of TPS, but even more in potential additional delays
no matter how batches are created all CNs still need to be synchronized about them
we better not send these batches with full transaction bodies (especially if anyone can propose), so there is the same problem of where to get a missing transaction from

@vncoelho,

possibility of emergency transactions

We've got HighPriorityAttribute that can be used by committee.

@dusmart,

While only 500 among them can be packed into the next block on mainnet now(5k on testnet).

Unrelated, but as I've said previously, mainnet should use the same 5K setting as well (it also somewhat mitigates low VUB problem and helps with transaction spikes in general, 500tx per 15s is 33TPS, 5000tx is 333TPS, easy as that).

Jim8y commented 1 year ago

@roman-khimov I basically agree with your solution and consider yours not much different than mine. when I say freeze the mempool, well just to lazy to consider a proper name, I mean stop processing low fee txs but require higher tx fees, even 10 times more fees. and mempool difference is not a big deal when we already have 50k txs in the cache, users will not see neo as broken, users will see neo as hot. But anyway, I agree with our proposal to increase the tx fees step by step.

vncoelho commented 1 year ago

it can be done before reaching mempool capacity as well, have 50% full -> require 10% higher FeePerByte than the mempool average, have 70% full -> require 30% higher FeePerByte, 90% -> 50% higher. If it's tied to mempool average then even if someone to try pushing more and more, the cost will get higher and higher with each transaction.

mempool varies across nodes while acceptance criteria should not. This link to mempool capacity can/may causes other problems.

roman-khimov commented 1 year ago

mempool varies across nodes while acceptance criteria should not. This link to mempool capacity can/may causes other problems.

I've thought about it. Yes, we still can't have perfect synchronization and different parts of the network will have some different sets of transactions potentially with different average FeePerByte. But is it a problem? I can run a node already that has a mempool of 10K. You can say my node is broken, but I can run with 100K as well, in any event, my node can have a different acceptance criteria even today. Also, my mempool may already have 50K transactions and I'll reject any new ones unless they pay more while some node may still have some 100 spare and will accept anything. Then all of the nodes are interconnected and constantly communicating with each other, so eventually I'd expect them to have some similar average FeePerByte, because if one node has 30% higher average it'll be able to push these pricey transactions to other nodes until their average is about the same.

If won't be perfect, it can't be, but it can work good enough for the purpose of preventing the flood. At least, we can try and this attempt won't cost a lot. Then RecoverTx will solve the other part of the problem.

neo-project / neo

[suggestion] P2P module Enhancement #2862