paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/

Compress the state response to reduce the state sync data transfer #5312

Open liuchengxu opened 2 months ago

liuchengxu commented 2 months ago

Motivation

State sync can download several GiB of data when the chain state is large, which is not uncommon nowadays. This poses a significant challenge for nodes with slow network connections. Additionally, since state sync currently lacks a persistence feature (#4), any network disruption forces the node to re-download the entire state, which is annoying.

Request

Reduce the state syncing download size.

Solution

Compress the state response before sending it to the node and decompress it on the receiver side.

diff --git a/substrate/client/network/sync/Cargo.toml b/substrate/client/network/sync/Cargo.toml
index 17e3e2119d..047fffa31f 100644
--- a/substrate/client/network/sync/Cargo.toml
+++ b/substrate/client/network/sync/Cargo.toml
@@ -48,6 +48,7 @@ sp-consensus = { workspace = true, default-features = true }
 sp-core = { workspace = true, default-features = true }
 sp-consensus-grandpa = { workspace = true, default-features = true }
 sp-runtime = { workspace = true, default-features = true }
+zstd = { workspace = true }

 [dev-dependencies]
 mockall = { workspace = true }
diff --git a/substrate/client/network/sync/src/engine.rs b/substrate/client/network/sync/src/engine.rs
index bb6e7a98a8..3915d3845e 100644
--- a/substrate/client/network/sync/src/engine.rs
+++ b/substrate/client/network/sync/src/engine.rs
@@ -1204,7 +1204,10 @@ where
        }

        fn decode_state_response(response: &[u8]) -> Result<OpaqueStateResponse, String> {
-               let response = StateResponse::decode(response)
+               let response = zstd::stream::decode_all(response).expect("Failed to uncompress state response");
+               let response = StateResponse::decode(response.as_slice())
                        .map_err(|error| format!("Failed to decode state response: {error}"))?;

                Ok(OpaqueStateResponse(Box::new(response)))
diff --git a/substrate/client/network/sync/src/state_request_handler.rs b/substrate/client/network/sync/src/state_request_handler.rs
index 0e713626ec..bb07bdd9bc 100644
--- a/substrate/client/network/sync/src/state_request_handler.rs
+++ b/substrate/client/network/sync/src/state_request_handler.rs
@@ -264,7 +272,15 @@ where

                        let mut data = Vec::with_capacity(response.encoded_len());
                        response.encode(&mut data)?;
-                       Ok(data)
+                       let compressed_data = zstd::stream::encode_all(data.as_slice(), 0).expect("Failed to compress state response");
+                       Ok(compressed_data)
                } else {
                        Err(())
                };
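One caveat about the patch above: `decode_state_response` handles bytes received from a remote peer, and `.expect(...)` would panic on a malformed payload. Since the function already returns `Result<_, String>`, the failure could be propagated instead. Below is a minimal sketch of that shape using only the standard library. The `decompress` and `decode` parameters are hypothetical stand-ins for `zstd::stream::decode_all` and `StateResponse::decode`, and the function name is made up for illustration.

```rust
use std::io;

/// Decompresses and then decodes a state response without panicking.
///
/// `decompress` and `decode` are hypothetical stand-ins (assumptions, not
/// the real API) for the zstd and protobuf calls used in the patch.
fn decode_compressed_response<T>(
    response: &[u8],
    decompress: impl Fn(&[u8]) -> io::Result<Vec<u8>>,
    decode: impl Fn(&[u8]) -> Result<T, String>,
) -> Result<T, String> {
    // A corrupt or malicious payload becomes an error value, not a panic.
    let raw = decompress(response)
        .map_err(|error| format!("Failed to decompress state response: {error}"))?;

    decode(&raw).map_err(|error| format!("Failed to decode state response: {error}"))
}
```

With this shape, the caller can treat a decompression failure the same way it already treats a decoding failure.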

This is low-hanging fruit that can reduce the state sync data significantly, as demonstrated by my local experiments. I conducted state sync tests for subcoin at various block heights (all below 300000) using both the fast and fast-unsafe modes. The Uncompressed Total State Sync Data is calculated as sum(data.len()) and the Compressed Total State Sync Data as sum(compressed_data.len()). The results are promising: the compressed responses were 34% to 42% of the uncompressed size, which suggests that several GiB of state sync data can be saved when dealing with large chain states. The final state size of subcoin may be 12+ GiB, so this optimization would greatly help the state sync of subcoin.

| --sync | Uncompressed Total State Sync Data (bytes) | Compressed Total State Sync Data (bytes) | Compressed/Uncompressed |
|---|---|---|---|
| fast-unsafe | 149,517,161 | 50,284,623 | 0.34 |
| fast-unsafe | 205,400,559 | 70,742,393 | 0.34 |
| fast-unsafe | 597,683,313 | 202,993,329 | 0.34 |
| fast-unsafe | 1,239,830,694 | 480,632,754 | 0.39 |
| fast-unsafe | 2,182,810,408 | 841,870,855 | 0.39 |
| fast | 820,180,264 | 338,889,711 | 0.41 |
| fast | 1,486,307,891 | 631,430,018 | 0.42 |

We can make this configurable if necessary.

Are you willing to help with this request?

Yes!

burdges commented 2 months ago

It's odd that this data is compressible. Maybe our underlying formats need some redesign there.

We've a few places where it'd be nice if we used libtorrent or some Rust rewrite.

liuchengxu commented 2 months ago

I'm not surprised that the data is compressible as the state response is produced by reading the state sequentially, which means many entries share the same storage prefix.
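To make the shared-prefix argument concrete, here is a minimal sketch of front coding, a much simpler technique than zstd that is used here only for illustration. Each key, taken in sorted order, is stored as the number of leading bytes it shares with the previous key plus the remaining suffix. The function names are made up and the sketch assumes the keys are already sorted, as they would be when the state is read sequentially.

```rust
/// Length of the common prefix of two byte strings.
fn shared_prefix_len(a: &[u8], b: &[u8]) -> usize {
    a.iter().zip(b).take_while(|(x, y)| x == y).count()
}

/// Front-codes keys that are already in sorted order. Each entry is
/// (number of bytes shared with the previous key, remaining suffix).
fn front_code(keys: &[Vec<u8>]) -> Vec<(usize, Vec<u8>)> {
    let mut previous: &[u8] = &[];
    let mut encoded = Vec::with_capacity(keys.len());
    for key in keys {
        let shared = shared_prefix_len(previous, key);
        // Only the bytes that differ from the previous key are stored.
        encoded.push((shared, key[shared..].to_vec()));
        previous = key.as_slice();
    }
    encoded
}

/// Reverses `front_code`. Panics if an entry claims to share more bytes
/// than the previous key has, which is acceptable for a sketch.
fn front_decode(encoded: &[(usize, Vec<u8>)]) -> Vec<Vec<u8>> {
    let mut keys: Vec<Vec<u8>> = Vec::with_capacity(encoded.len());
    for (shared, suffix) in encoded {
        let mut key = match keys.last() {
            Some(previous) => previous[..*shared].to_vec(),
            None => Vec::new(),
        };
        key.extend_from_slice(suffix);
        keys.push(key);
    }
    keys
}
```

For made-up keys such as `balances:alice` followed by `balances:bob`, the second entry stores only the shared length and the suffix `bob`. A general-purpose compressor like zstd can exploit the same kind of redundancy without knowing the key layout.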

bkchr commented 2 months ago

I think the idea is good, if it works like you say. Would be really nice if you can provide some numbers for Polkadot as well.

This would require an RFC, because we need to change the networking messages.

liuchengxu commented 2 months ago

I only ran the node for a while and didn't sync to the recent blocks. Here are the numbers for Polkadot:

| --sync | Block Height | Compressed Data Size | Uncompressed Data Size | Compressed/Uncompressed |
|---|---|---|---|---|
| warp | 442,867 | 2,029,088 | 6,601,871 | 0.31 |
| warp | 850,945 | 3,448,748 | 12,578,916 | 0.27 |
| warp | 1,191,454 | 5,997,874 | 21,447,825 | 0.28 |
| warp | 1,508,438 | 9,133,749 | 28,606,326 | 0.32 |
| warp | 1,857,024 | 17,016,693 | 51,222,672 | 0.33 |

UPD: The result of warp sync to the recent blocks of Polkadot: Compressed/Uncompressed = 351179285 / 528800495 = 0.66. Compressing the state response messages would save 169 MiB out of 504 MiB.

2024-08-13 22:25:09 ⚙️  State sync, Downloading state, 84%, 676.58 Mib (7 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 745.1kiB/s ⬆ 2.4kiB/s
2024-08-13 22:25:09 ================ total_bytes_compressed: 349292448, total_bytes_uncompressed: 526850326
2024-08-13 22:25:12 ================ total_bytes_compressed: 351179285, total_bytes_uncompressed: 528800495
2024-08-13 22:25:14 ⚙️  State sync, Importing state, 84%, 680.50 Mib (8 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 461.8kiB/s ⬆ 1.3kiB/s
2024-08-13 22:25:19 ⚙️  State sync, Importing state, 84%, 680.50 Mib (8 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 69.9kiB/s ⬆ 5.1kiB/s
2024-08-13 22:25:24 ⚙️  State sync, Importing state, 84%, 680.50 Mib (8 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 63.6kiB/s ⬆ 4.6kiB/s
2024-08-13 22:25:29 ⚙️  State sync, Importing state, 84%, 680.50 Mib (8 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 57.3kiB/s ⬆ 1.9kiB/s
2024-08-13 22:25:34 ⚙️  State sync, Importing state, 84%, 680.50 Mib (9 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 55.5kiB/s ⬆ 1.1kiB/s
2024-08-13 22:25:39 ⚙️  State sync, Importing state, 84%, 680.50 Mib (9 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 59.8kiB/s ⬆ 3.3kiB/s
2024-08-13 22:25:44 ⚙️  State sync, Importing state, 84%, 680.50 Mib (9 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 89.0kiB/s ⬆ 5.7kiB/s
2024-08-13 22:25:49 ⚙️  State sync, Importing state, 84%, 680.50 Mib (9 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 59.6kiB/s ⬆ 0.7kiB/s
2024-08-13 22:25:54 ⚙️  State sync, Importing state, 84%, 680.50 Mib (9 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 26.9kiB/s ⬆ 1.2kiB/s
2024-08-13 22:25:59 ⚙️  State sync, Importing state, 84%, 680.50 Mib (9 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 46.1kiB/s ⬆ 2.3kiB/s
2024-08-13 22:26:01 State sync is complete, continuing with block sync.

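total_bytes_compressed is calculated as sum(compressed_size) and total_bytes_uncompressed as sum(uncompressed_size). Both are measured on the receiving side: each received response is compressed locally only to record its size.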

diff --git a/substrate/client/network/sync/src/engine.rs b/substrate/client/network/sync/src/engine.rs
index ee7576c22f1..a7ab4a92385 100644
--- a/substrate/client/network/sync/src/engine.rs
+++ b/substrate/client/network/sync/src/engine.rs
@@ -1207,11 +1212,14 @@ where
                Ok(request.encode_to_vec())
        }

-       fn decode_state_response(response: &[u8]) -> Result<OpaqueStateResponse, String> {
+       fn decode_state_response(response: &[u8]) -> Result<(usize, usize, OpaqueStateResponse), String> {
+               let compressed_data = zstd::stream::encode_all(response, 0).expect("Failed to compress state response");
+               let compressed_size = compressed_data.len();
+               let uncompressed_size = response.len();
                let response = StateResponse::decode(response)
                        .map_err(|error| format!("Failed to decode state response: {error}"))?;

-               Ok(OpaqueStateResponse(Box::new(response)))
+               Ok((compressed_size, uncompressed_size, OpaqueStateResponse(Box::new(response))))
        }
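The totals printed in the log above are running sums of the two sizes returned by this instrumented `decode_state_response`. A minimal sketch of that bookkeeping, and of the arithmetic behind the 0.66 ratio and the 169 MiB saving, could look like the following. The `CompressionStats` type and the `to_mib` helper are hypothetical names introduced for illustration and are not part of the patch.

```rust
/// Running totals of compressed and uncompressed response sizes, in bytes.
/// Hypothetical helper type, not part of the patch.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct CompressionStats {
    total_bytes_compressed: u64,
    total_bytes_uncompressed: u64,
}

impl CompressionStats {
    /// Adds the sizes observed for one state response.
    fn record(&mut self, compressed_size: usize, uncompressed_size: usize) {
        self.total_bytes_compressed += compressed_size as u64;
        self.total_bytes_uncompressed += uncompressed_size as u64;
    }

    /// Compressed/Uncompressed ratio, or `None` before any data is recorded.
    fn ratio(&self) -> Option<f64> {
        if self.total_bytes_uncompressed == 0 {
            return None;
        }
        Some(self.total_bytes_compressed as f64 / self.total_bytes_uncompressed as f64)
    }

    /// Bytes that would not need to be transferred if responses were compressed.
    fn saved_bytes(&self) -> u64 {
        self.total_bytes_uncompressed
            .saturating_sub(self.total_bytes_compressed)
    }
}

/// Converts a byte count to MiB (1 MiB = 1024 * 1024 bytes).
fn to_mib(bytes: u64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0)
}
```

Feeding in the final totals from the log (351179285 compressed, 528800495 uncompressed) gives a ratio of about 0.66 and a saving of 177621210 bytes, which is roughly 169 MiB out of roughly 504 MiB.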

liuchengxu commented 2 months ago

Opened RFC https://github.com/polkadot-fellows/RFCs/pull/112