stacks-network / stacks-core

The Stacks blockchain implementation
https://docs.stacks.co
GNU General Public License v3.0

Requesting map/data-var/estimate-fee timeout #4931

Closed: bestmike007 closed this issue 2 months ago

bestmike007 commented 3 months ago

Describe the bug

We're seeing timeouts when requesting map/data-var/estimate-fee after upgrading from 2.5.0.0.3 to 2.5.0.0.4.

Steps To Reproduce

  1. Run a stacks-node
  2. Run this script: https://gist.github.com/bestmike007/d77b582bea2fda1d4b2af272b8e42fba

On a 2.5.0.0.4 node, the script fails with a timeout after a few minutes, while on 2.5.0.0.3 it runs smoothly and CPU usage is more stable.
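For readers who cannot open the gist, here is a minimal sketch of this kind of reproduction loop. It is not the linked script. It assumes the node's RPC interface listens on port 20443 and serves data-var reads at `/v2/data_var/{address}/{contract}/{var}`; the helper names (`data_var_url`, `run_until_timeout`) and the 10-second timeout are hypothetical choices made for illustration.

```python
import time
import urllib.request
from typing import Callable, Optional, Tuple


def data_var_url(base: str, address: str, contract: str, var_name: str) -> str:
    """Build a data-var read URL (path layout is an assumption, see above)."""
    return f"{base}/v2/data_var/{address}/{contract}/{var_name}?proof=0"


def http_get(url: str, timeout_s: float) -> bytes:
    """Fetch a URL, raising an OSError subclass on timeout or HTTP failure."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.read()


def run_until_timeout(
    url: str,
    fetch: Callable[[str, float], bytes] = http_get,
    timeout_s: float = 10.0,
    max_requests: Optional[int] = None,
) -> Tuple[int, float, Optional[BaseException]]:
    """Request `url` in a loop until a request fails or max_requests is hit.

    Returns (completed_requests, elapsed_seconds, error_or_None).
    """
    start = time.monotonic()
    done = 0
    while max_requests is None or done < max_requests:
        try:
            fetch(url, timeout_s)
        except OSError as exc:
            # TimeoutError and urllib.error.URLError are both OSError subclasses.
            return done, time.monotonic() - start, exc
        done += 1
    return done, time.monotonic() - start, None
```

To compare two versions, one could call `run_until_timeout(data_var_url("http://localhost:20443", address, contract, var_name))` against each node, with placeholder values replaced by a real contract and variable, and compare how long each runs before the first failure.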


Expected behavior

It should be as stable as it was.

Environment (please complete the following information):

Additional context

Not seeing any error logs.

jcnelson commented 3 months ago

Hey! We tried reproducing this today and were unable to get 2.5.0.0.4 to run more slowly. That's probably because our deployments differ from yours. Can you tell us more about how this node was set up when you tested it? For example, did it have a public IP? Was it running in a particular cloud provider? Stuff like that.

bestmike007 commented 3 months ago

I was running the test on a DigitalOcean droplet with 8 vCPUs (Premium Intel), 16 GB of memory, and a 480 GB local NVMe SSD. It does have a public IP with ports 20443 and 20444 open (but there were no other request logs while the test was running).

It has a stacks node api version 7.11.1 running, configured as the event observer.

Both stacks node and api are running with docker.

I'm observing timeouts after upgrading both the stacks-node and the api (previously 7.11.0-beta.1) on the same droplet. This is the first node I tried to upgrade and other nodes work fine.

Let me try this on a fresh node restored from one of your recent snapshots, without the API sidecar, to see if it is still reproducible.

bestmike007 commented 3 months ago

Here's what I did:

  1. Spin up a new cloud vm: Debian 12 x64, 4vCPU Intel, 16GB memory, and 320GB nvme SSD
  2. Restore from the snapshot: curl https://archive.hiro.so/mainnet/stacks-blockchain/mainnet-stacks-blockchain-2.5.0.0.4-latest.tar.gz | tar -zxv
  3. Start stacks node with docker-compose and wait for it to catch up
  4. Run the script: https://gist.github.com/bestmike007/d77b582bea2fda1d4b2af272b8e42fba

docker-compose file:

services:
  stacks-core:
    restart: always
    image: blockstack/stacks-core:2.5.0.0.4
    container_name: stacks_node
    command: stacks-node start --config /srv/Stacks.toml
    network_mode: host
    environment:
      NOP_BLOCKSTACK_DEBUG: 0
      XBLOCKSTACK_DEBUG: 0
      RUST_BACKTRACE: 0
    volumes:
      - ./Stacks.toml:/srv/Stacks.toml:ro
      - ./mainnet:/srv/stacks-node/mainnet

Stacks.toml:

[node]
working_dir = "/srv/stacks-node"
rpc_bind = "0.0.0.0:20443"
p2p_bind = "0.0.0.0:20444"
bootstrap_node = "02196f005965cebe6ddc3901b7b1cc1aa7a88f305bb8c5893456b8f9a605923893@seed.mainnet.hiro.so:20444,02539449ad94e6e6392d8c1deb2b4e61f80ae2a18964349bc14336d8b903c46a8c@cet.stacksnodes.org:20444,02ececc8ce79b8adf813f13a0255f8ae58d4357309ba0cedd523d9f1a306fcfb79@sgt.stacksnodes.org:20444,0303144ba518fe7a0fb56a8a7d488f950307a4330f146e1e1458fc63fb33defe96@est.stacksnodes.org:20444"
wait_time_for_microblocks = 10000

[burnchain]
chain = "bitcoin"
mode = "mainnet"
peer_host = "bitcoin.mainnet.stacks.org"
username = "stacks"
password = "foundation"
rpc_port = 8332
peer_port = 8333

First run failed with a timeout after 80s, the second run failed after 471s, and the third run failed after 493s. (An image was attached for each run.)

In the meantime, I was running the exact same script on an old node with version 2.5.0.0.3, and it is still running without any issues.

bestmike007 commented 3 months ago

Update: the script on 2.5.0.0.3 is still running without timeouts.

@jcnelson This is a new cloud VM in an isolated VPC, so let me know if you need access to it; otherwise I'll tear it down.

bestmike007 commented 2 months ago

It looks like this is fixed in 2.5.0.0.5. The default antientropy_retry config seems to be related somehow: https://github.com/stacks-network/stacks-core/compare/2.5.0.0.4...2.5.0.0.5

bestmike007 commented 2 months ago

The script has been running for hours with no more failures. I'm closing this issue.