paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.com/
1.89k stars 695 forks source link

[remote-externalities] Batch calls scraping child-tree data #2245

Open liamaharon opened 1 year ago

liamaharon commented 1 year ago

Child-tree data is currently fetched sequentially.

Most chains to date don't have that many child-trees so this hasn't been an issue, but the Rococo Contracts chain has almost 3000 child tries making this serial approach unnecessarily add a really long time to scraping keys.

The keys to scrape appear to all be known ahead of time, so we should be able to batch the requests.

This should be a pretty easy one. Good beginner issue with mentor available :)

brunopgalvao commented 1 year ago
  1. Would you say this is where I should be looking at?

https://github.com/paritytech/polkadot-sdk/blob/ea4085ab7448bb557a1558a25af164cf364e88d6/substrate/utils/frame/remote-externalities/src/lib.rs#L767-L775

  1. Can you recommend an easy way to test/run child trie remote externalities?

https://github.com/paritytech/polkadot-sdk/blob/ea4085ab7448bb557a1558a25af164cf364e88d6/substrate/utils/frame/remote-externalities/src/lib.rs#L1239-L1247

liamaharon commented 1 year ago

Would you say this is where I should be looking at?

Yes exactly

Can you recommend an easy way to test/run child trie remote externalities?

Add a test that scrapes live state similar to this: https://github.com/paritytech/polkadot-sdk/pull/1985/files#diff-f73dfe42c5919a5ebddb4df3294fea8e06d1b4dc0f0bba175a2c555223202cf5R1543-R1565

A remote with lots of child tries is wss://rococo-contracts-rpc.polkadot.io:443.

brunopgalvao commented 11 months ago

So I am following the rabbit and it appears to be grabbing the child keys here: https://github.com/paritytech/polkadot-sdk/blob/cfa19c37e60f094a2954ffa73b1da261c050f61f/substrate/utils/frame/remote-externalities/src/lib.rs#L800-L802

https://github.com/paritytech/polkadot-sdk/blob/cfa19c37e60f094a2954ffa73b1da261c050f61f/substrate/utils/frame/remote-externalities/src/lib.rs#L730-L743

which calls the self.backend.storage_keys() defined here: https://github.com/paritytech/polkadot-sdk/blob/cfa19c37e60f094a2954ffa73b1da261c050f61f/substrate/client/rpc/src/state/mod.rs#L423-L435

Two questions I have:

The keys to scrape appear to all be known ahead of time

  1. Where/how can I find these well known keys?
  2. How would I go about batching a request?

Put all the known keys into a vector and create a fn storage_keys_batched in the child state backend API: https://github.com/paritytech/polkadot-sdk/blob/cfa19c37e60f094a2954ffa73b1da261c050f61f/substrate/client/rpc/src/state/mod.rs#L350-L351

liamaharon commented 11 months ago
  1. Where/how can I find these well known keys?
 for prefixed_top_key in child_roots { 
    let child_keys = 
        Self::rpc_child_get_keys(&client, &prefixed_top_key, StorageKey(vec![]), at) 

They are in child_roots. Instead of calling the RPC sequentially here we want to do it all at once.

  1. How would I go about batching a request?

ChatGPT is surprisingly good at answering these types of question https://chat.openai.com/share/b607012e-3f78-4bcd-9bdc-c6cac06646fd

brunopgalvao commented 11 months ago

I have some code (https://github.com/paritytech/polkadot-sdk/pull/2584) and it compliles! Now I want to run the tests but when I run the tests locally or using wss://rococo-contracts-rpc.polkadot.io:443 or wss://rpc.polkadot.io:443 I get the following error:

Dec 04 01:34:52.398 ERROR remote-ext: Error = Transport(Server returned an error status code: 429) 

Do I need to be behind a VPN for the Polkadot RPC nodes to work?

Locally, I am using the contracts node:

./target/debug/substrate-contracts-node --rpc-max-request-size 10000000 --rpc-max-response-size 10000000
TEST_WS=ws://127.0.0.1:9944 cargo test --features=remote-test -p frame-remote-externalities -- --nocapture

Same happens on master branch.

ggwpez commented 9 months ago

What is the status of this @liamaharon @brunopgalvao ?

brunopgalvao commented 9 months ago

What is the status of this @liamaharon @brunopgalvao ?

I would very much like to continue working on this issue. I got an error here: https://github.com/paritytech/polkadot-sdk/issues/2245#issuecomment-1837936900

I could use some help unblocking what could be the issue there.

ggwpez commented 9 months ago

Does it with any of these nodes? https://gist.github.com/ggwpez/81db110fe4390ed9a7622f5857dfc4ff
They need to be configured in a special way, and the normal nodes may not be.

liamaharon commented 9 months ago

@brunopgalvao 429 means the server is rate limiting you. How many requests are you making in parallel? Limiting parallel requests and adding retry logic should resolve your issue.

brunopgalvao commented 9 months ago

Does it with any of these nodes? https://gist.github.com/ggwpez/81db110fe4390ed9a7622f5857dfc4ff They need to be configured in a special way, and the normal nodes may not be.

Yes. I have tried wss://rococo-try-runtime-node.parity-chains.parity.io:443

This is also happening on master for me:

polkadot-sdk % TEST_WS=wss://rococo-try-runtime-node.parity-chains.parity.io:443 cargo test --features=remote-test -p frame-remote-externalities -- --nocapture
   Compiling frame-remote-externalities v0.35.0 (/Users/bruno/src/polkadot-sdk/substrate/utils/frame/remote-externalities)
    Finished test [unoptimized + debuginfo] target(s) in 4.40s
     Running unittests src/lib.rs (target/debug/deps/frame_remote_externalities-c1e0e92aad840e1f)

running 13 tests
✅ Found 41 keys (1.49s)
✅ Found 1 keys (1.50s)
✅ Found 0 keys (1.51s)
test remote_tests::can_build_big_pallet ... ok
[00:00:00] ✅ Downloaded key values 4.8243/s [==================================================================================================================================] 1/1 (0s)
[00:00:00] ✅ Downloaded key values 93.605/s [================================================================================================================================] 41/41 (0s)
✅ Inserted keys into DB (0.00s)
✅ Inserted keys into DB (0.00s)
✅ Found 1383 keys (2.66s)
✅ Found 1383 keys (2.98s)
⠸     2.750 s   Scraping keys...░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
✅ Found 1383 keys (3.15s)
✅ Found 1383 keys (3.15s)
⠋     3.326 s   Scraping keys...░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
✅ Found 1383 keys (3.47s)
✅ Found 1383 keys (3.14s)
✅ Found 31 keys (1.23s)
[00:00:00] ✅ Downloaded key values 93.6132/s [===============================================================================================================================] 31/31 (0s)
✅ Inserted keys into DB (0.00s)
⠸     2.760 swnlScraping keys...████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
✅ Found 1383 keys (3.39s)
⠋    10.036 swnlScraping keys...Feb 09 13:58:41.698 ERROR remote-ext: batch processing failed: "Networking or low-level protocol error: Server returned an error status code: 429"    (6s)
thread 'remote_tests::can_create_child_snapshot' panicked at substrate/utils/frame/remote-externalities/src/lib.rs:1490:14:
called `Result::unwrap()` on an `Err` value: "batch processing failed"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
⠙    10.117 s   Scraping keys...test remote_tests::can_create_child_snapshot ... FAILED
✅ Loaded snapshot (0.01s)
✅ Loaded snapshot (0.01s)
test tests::can_exclude_from_snapshot ... ok
✅ Loaded snapshot (0.01s)
⠼    10.370 s   Scraping keys...test tests::can_load_state_snapshot ... ok
⠋    11.710 swnlScraping keys...Feb 09 13:58:43.335 ERROR remote-ext: Error while getting storage data: Networking or low-level protocol error: Server returned an error status code: 429    
thread 'remote_tests::can_build_one_small_pallet' panicked at substrate/utils/frame/remote-externalities/src/lib.rs:1428:14:
called `Result::unwrap()` on an `Err` value: "Error while getting storage data"
test remote_tests::can_build_one_small_pallet ... FAILED
⠹    11.873 swnlScraping keys...Feb 09 13:58:43.496 ERROR remote-ext: Error while getting storage data: Networking or low-level protocol error: Server returned an error status code: 429    
thread 'remote_tests::offline_else_online_works' panicked at substrate/utils/frame/remote-externalities/src/lib.rs:1389:14:
called `Result::unwrap()` on an `Err` value: "Error while getting storage data"
test remote_tests::offline_else_online_works ... FAILED
⠸    11.956 s   Scraping keys...Feb 09 13:58:43.603 ERROR remote-ext: Error while getting storage data: Networking or low-level protocol error: Server returned an error status code: 429 

@brunopgalvao 429 means the server is rate limiting you. How many requests are you making in parallel? Limiting parallel requests and adding retry logic should resolve your issue.

@liamaharon I am getting this issue on master as well. See above. Also using TEST_WS=ws://127.0.0.1:9944 cargo test --features=remote-test -p frame-remote-externalities -- --nocapturewith polkadot-sdk on master.

liamaharon commented 9 months ago

I wasn't able to replicate @brunopgalvao, maybe due to my further distance to the servers compared to you. Can you try with this branch? https://github.com/paritytech/polkadot-sdk/tree/liam-remote-ext-exponential-backoff

brunopgalvao commented 9 months ago

Got it to work!

The TEST_WS environment variable is only used in certain tests. I also needed to replace DEFAULT_HTTP_ENDPOINT with one of the nodes listed here https://github.com/paritytech/polkadot-sdk/issues/2245#issuecomment-1927322034: https://github.com/paritytech/polkadot-sdk/blob/b2c81b5800ffd5528dac26c2fc4059d43ed7404b/substrate/utils/frame/remote-externalities/src/lib.rs#L61

liamaharon commented 9 months ago

Ah nice catch. do you mind opening a pr to fix those tests? I would also be interested to know if my exponential backoff branch fixes the 429 errors :)

brunopgalvao commented 9 months ago

Ah nice catch. do you mind opening a pr to fix those tests? I would also be interested to know if my exponential backoff branch fixes the 429 errors :)

429 errors no longer happen (including your branch) as long as the nodes have a --rpc-max-request-size 10000000 --rpc-max-response-size set.

However, as noted some tests use the TEST_WS environment variable: https://github.com/paritytech/polkadot-sdk/blob/b2c81b5800ffd5528dac26c2fc4059d43ed7404b/substrate/utils/frame/remote-externalities/src/lib.rs#L1507-L1514

While other tests use the DEFAULT_HTTP_ENDPOINT hardcoded variable: https://github.com/paritytech/polkadot-sdk/blob/b2c81b5800ffd5528dac26c2fc4059d43ed7404b/substrate/utils/frame/remote-externalities/src/lib.rs#L1478-L1486

The default for transport is DEFAULT_HTTP_ENDPOINT: https://github.com/paritytech/polkadot-sdk/blob/edd95b3749754d2ed0c5738588e872c87be91624/substrate/utils/frame/remote-externalities/src/lib.rs#L256-L259

Does it make sense to refactor all tests to use TEST_WS instead of allowing some tests to fallback on the hardcoded DEFAULT_HTTP_ENDPOINT?

liamaharon commented 9 months ago

Does it make sense to refactor all tests to use TEST_WS instead of allowing some tests to fallback on the hardcoded DEFAULT_HTTP_ENDPOINT?

Yes I think so, PR would be good :)

brunopgalvao commented 9 months ago

Yes I think so, PR would be good :)

PR: https://github.com/paritytech/polkadot-sdk/pull/3284