movementlabsxyz / movement

The Movement Network is a Move-based L2 on Ethereum.

SEQ-1 Lack of Garbage Collection Mechanism in DA light node mempool #413

Open SA124 opened 3 weeks ago

SA124 commented 3 weeks ago

SEQ-1 Lack of Garbage Collection Mechanism in Mempool

Auditor: Movebit
Code: Aptos Core
Severity: Major
Discovery Methods: Manual Review
Status: Pending
Code Location: protocol-units/da/m1/light-node/src/v1/sequencer.rs#19

Description: A complete mempool should have a garbage collection mechanism, also known as a transaction aging mechanism, which periodically removes expired and unused transactions so the mempool cannot grow indefinitely. Because Movement's mempool is based on RocksDB and stored on disk, the issue is more severe: a node crash or server restart will not delete the old, expired transactions.

Suggestion: It is recommended to establish a mempool garbage collection mechanism, referencing Aptos.
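
For illustration only, a minimal sketch of the transaction aging the finding describes, using an in-memory map in place of the RocksDB-backed store; the `Mempool` type, key layout, and `gc` method are assumptions for the example, not the light node's actual API.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical mempool keyed by (ingress timestamp in seconds, transaction id),
/// so the oldest transactions sort first; stands in for the RocksDB-backed store.
pub struct Mempool {
    txs: BTreeMap<(u64, [u8; 32]), Vec<u8>>,
}

impl Mempool {
    /// Remove every transaction older than `max_age`; returns how many were dropped.
    pub fn gc(&mut self, max_age: Duration) -> usize {
        let now = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("system clock before Unix epoch")
            .as_secs();
        let cutoff = now.saturating_sub(max_age.as_secs());
        // split_off keeps everything at or after the cutoff; what remains in
        // `self.txs` is expired and gets replaced.
        let kept = self.txs.split_off(&(cutoff, [0u8; 32]));
        let expired = self.txs.len();
        self.txs = kept;
        expired
    }
}
```

A disk-backed implementation would do the equivalent scan over RocksDB keys rather than an in-memory split.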

l-monninger commented 3 weeks ago

We've added GC since this section was reviewed.

mzabaluev commented 3 weeks ago

@l-monninger Can you point at where GC is performed on light node's mempool? I suspect there is some confusion, but it may be on my part.

l-monninger commented 2 weeks ago

https://github.com/movementlabsxyz/movement/blob/31386f74a2700920fec093775c221acdc9c6c7a2/protocol-units/execution/opt-executor/src/transaction_pipe.rs#L112

l-monninger commented 2 weeks ago

@mzabaluev

mzabaluev commented 2 weeks ago

https://github.com/movementlabsxyz/movement/blob/31386f74a2700920fec093775c221acdc9c6c7a2/protocol-units/execution/opt-executor/src/transaction_pipe.rs#L112

That's the executor mempool. That was reported in #411 and I have closed it as implemented.

From my brief analysis, the mempool transactions should be drained by repeatedly calling pop_mempool_transactions: https://github.com/movementlabsxyz/movement/blob/31386f74a2700920fec093775c221acdc9c6c7a2/protocol-units/sequencing/memseq/sequencer/src/lib.rs#L84
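
As a loose illustration of that draining pattern (not the memseq code itself; `MempoolHandle`, `pop_up_to`, and the parameters are hypothetical), a block builder keeps popping until the block is full or the build window closes:

```rust
use std::time::{Duration, Instant};

pub struct Transaction(pub Vec<u8>);

/// Hypothetical handle over the DA light node's mempool.
pub trait MempoolHandle {
    /// Pop at most `n` of the oldest transactions currently in the pool.
    fn pop_up_to(&mut self, n: usize) -> Vec<Transaction>;
}

pub fn build_block(
    pool: &mut impl MempoolHandle,
    block_size: usize,
    build_time: Duration,
) -> Vec<Transaction> {
    let deadline = Instant::now() + build_time;
    let mut block = Vec::with_capacity(block_size);
    // Drain by repeated popping until the block is full or the build window ends.
    while block.len() < block_size && Instant::now() < deadline {
        let batch = pool.pop_up_to(block_size - block.len());
        if batch.is_empty() {
            break; // nothing left to pop right now
        }
        block.extend(batch);
    }
    block
}
```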

@l-monninger Could any unused transactions accumulate if the light node operates correctly? Should we look out for a throughput bottleneck and spill transactions if they have not been "popped" for too long, essentially implementing another gate for load shedding?

l-monninger commented 2 weeks ago

Sorry, many threads...

Could any unused transactions accumulate if the light node operates correctly? Should we look out for a throughput bottleneck and spill transactions if they have not been "popped" for too long, essentially implementing another gate for load shedding?

It can be a good idea to have something asynchronous that removes these unpopped transactions. I've actually never not had that in other projects. It is a bit of a question of transaction fairness, though, and it does open up some unique ~short-range mempool attacks.

mzabaluev commented 2 weeks ago

Should we use timestamps for GC then? Or rely on the default iteration order which the "pop" functions use anyway to build blocks?

poetyellow commented 2 weeks ago

1. Our audit was based on commit 030fe011da8bd7eca75fa0a37197205767bef7ec, at which point there was no GC.
2. When we conduct the follow-up review, we will check whether the newly added GC mechanism has successfully resolved the issue.

l-monninger commented 3 days ago

@mzabaluev I think our solution here needs to be specced out a little.

mzabaluev commented 2 days ago

Fairness is an issue here, indeed: if transactions can accumulate in the DA light node's mempool at a per-second rate greater than block_size / build_time, simply popping and spilling some transactions at intervals opens up the possibility of flushing attacks.

My current idea is to use the transaction's timestamp, which is assigned at the point of ingress in the light node, and spill unsubmitted transactions after a timeout in GC rounds. Essentially, a "transaction aging mechanism" as it is referred to in the description.
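
A hedged sketch of what such GC rounds could look like, reusing the hypothetical `Mempool::gc` from the earlier sketch; the period, timeout, and the tokio/tracing wiring are placeholders rather than the light node's actual setup:

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Mutex;

/// Periodically spill transactions whose ingress timestamp is past `timeout`.
/// `Mempool` and its `gc` method are the hypothetical types sketched earlier.
pub async fn run_gc_rounds(pool: Arc<Mutex<Mempool>>, period: Duration, timeout: Duration) {
    let mut ticker = tokio::time::interval(period);
    loop {
        ticker.tick().await;
        let spilled = pool.lock().await.gc(timeout);
        if spilled > 0 {
            tracing::info!(spilled, "spilled unsubmitted transactions past the aging timeout");
        }
    }
}
```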

mzabaluev commented 2 days ago

We could also limit the number of "blobs in flight" in the DA light node's sequencer module and drop overflowing transactions immediately on ingress, the same way we do it in the transaction pipe in the full node.
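
For concreteness, a minimal sketch of such an ingress gate; the capacity constant, names, and error type are invented for the example and do not reflect the actual sequencer module:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Illustrative in-flight limit; the real capacity would come from configuration.
const MAX_IN_FLIGHT: usize = 4096;

#[derive(Debug)]
pub enum IngressError {
    /// The mempool is at capacity; the transaction is dropped immediately.
    Overloaded,
}

#[derive(Default)]
pub struct IngressGate {
    in_flight: AtomicUsize,
}

impl IngressGate {
    /// Try to admit one more transaction, rejecting instead of queueing when full.
    pub fn try_admit(&self) -> Result<(), IngressError> {
        let prev = self.in_flight.fetch_add(1, Ordering::SeqCst);
        if prev >= MAX_IN_FLIGHT {
            // Undo the increment and shed the load at the door.
            self.in_flight.fetch_sub(1, Ordering::SeqCst);
            return Err(IngressError::Overloaded);
        }
        Ok(())
    }

    /// Called once transactions have been popped into a blob and are no longer "in flight".
    pub fn on_popped(&self, n: usize) {
        self.in_flight.fetch_sub(n, Ordering::SeqCst);
    }
}
```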

l-monninger commented 2 days ago

Yeah, so I was going to raise the flushing attack.

  1. I don't think you need to add an additional timestamp to manage this; you should already be able to use the mempool transaction timestamp, as it is set by the node.
  2. I have concerns stemming from the DSA that we use here. If we have something that is linear or superlinear in the number of expired transactions, there is a decent chance for someone to perform a low-cost DOS attack. These transactions will not cost the user any gas fees, but they will be costly for the node to prune. Dropping immediately on ingress should prevent most of this, I believe. But perhaps I am overlooking something.

mzabaluev commented 2 days ago

  1. I don't think you need to add an additional timestamp to manage this; you should already be able to use the mempool transaction timestamp, as it is set by the node.

Yes, I've just found this out and updated the comment.

l-monninger commented 2 days ago

Assuming we limit the transaction ingress at memseq, this is also a very obvious DOS vector about which we need to be fairly sure we're comfortable making assumptions.

The current load-shedding at the beginning of the transaction ingress accounts for the entirety of ingress through to execution, and assumes validity via checking the sequence number. This itself needs to be adjusted, particularly given #490.

If we do change the way we shed load at the initial ingress in a way with which we are comfortable, we likely only push the issue of DOS via sending a bunch of spurious transactions down to this layer. Here again, a malicious user can perform a low-cost attack whereby their transactions are included in the mempool but are unlikely to be charged in the executor.

mzabaluev commented 2 days ago

These transactions will not cost the user any gas fees, but they will be costly for the node to prune.

The transactions are ordered with the timestamp first in the key order, so pruning on time threshold can be implemented with the same amortized cost as regular popping.
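
To make the ordering concrete, a sketch of such a key layout (the encoding is illustrative, not the actual schema): a big-endian timestamp prefix makes lexicographic key order equal to age order, so a pruning pass walks keys from the front and stops at the cutoff, just like popping.

```rust
/// Illustrative mempool key: big-endian timestamp prefix followed by the
/// transaction hash, so RocksDB's lexicographic order is oldest-first.
pub fn mempool_key(timestamp_secs: u64, tx_hash: &[u8; 32]) -> Vec<u8> {
    let mut key = Vec::with_capacity(8 + 32);
    key.extend_from_slice(&timestamp_secs.to_be_bytes());
    key.extend_from_slice(tx_hash);
    key
}

/// Every key strictly below this prefix belongs to a transaction older than `cutoff_secs`.
pub fn cutoff_key(cutoff_secs: u64) -> Vec<u8> {
    cutoff_secs.to_be_bytes().to_vec()
}
```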

mzabaluev commented 2 days ago

The current load-shedding at the beginning of the transaction ingress accounts for the entirety of ingress through to execution

Agreed. So the GC here should be considered as a backstop for the whole system not operating as expected, correct?

l-monninger commented 2 days ago

The transactions are ordered with the timestamp first in the key order, so pruning on time threshold can be implemented with the same amortized cost as regular popping.

Yeah, but that remove is $O(n)$ (each pop is $O(1)$) in RocksDB. I don't think by default you can truncate in constant time like you would a linked list.

l-monninger commented 2 days ago

Agreed. So the GC here should be considered as a backstop for the whole system not operating as expected, correct?

I think I actually mean the opposite. This is more granular; it covers the cases where specifically the mempool is too far behind, which would eventually bleed out to the whole system.

mzabaluev commented 2 days ago

Yeah, but that remove is $O(n)$ (each pop is $O(1)$) in RocksDB. I don't think by default you can truncate in constant time like you would a linked list.

If we assume that the transaction rate is limited at Aptos API ingress, this should not overburden the infrastructure more than processing valid transactions at the maximum allowed rate.

mzabaluev commented 2 days ago

@l-monninger So you think load-shedding on ingress, limiting the total size of the mempool, is also the way to go here? Is sizing a RocksDB column family an O(1) operation, or should we maintain an in-memory counter? Should we still try to prune excessively old transactions on startup, or maybe just clean up the entire mempool?

l-monninger commented 1 day ago

@mzabaluev

If we assume that the transaction rate is limited at Aptos API ingress, this should not overburden the infrastructure more than processing valid transactions at the maximum allowed rate.

I guess it can mostly be thought of as a question of how well we know the throughputs of the various components. But we are also assuming here that there is nothing qualitatively different about these throughputs, i.e., that they should be strictly colinear.