stacks-network / sbtc-developer-release

sBTC primitives, signer components, helper tools
https://sbtc.tech
MIT License
1.98k stars 23 forks

warm up btc and stacks chains #129

Open EmbeddedAndroid opened 1 year ago

EmbeddedAndroid commented 1 year ago

After the Stacks node starts, it takes about three minutes to finish syncing all the BTC blocks.

From devenv stacks node: INFO [1694696163.588890] [stackslib/src/burnchains/burnchain.rs:1495] [main] Syncing Bitcoin blocks: 0.5% (0 to 1 out of 200)

devenv stacks config: https://github.com/stacks-network/sbtc/blob/main/devenv/stacks/docker/Config.toml

From clarinet stacks node: Sep 14 14:16:39.609043 INFO Syncing Bitcoin blocks: 100.0% (100 to 101 out of 101)

This is working as designed; however, Clarinet warms up both chains before deployment, which makes the sync seem instant. The question now is whether we need to replicate that feature here.
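To compare warm-up times between the two setups, the progress fragment shared by both log lines above can be parsed mechanically. This is a minimal sketch: the regular expression is inferred only from the two sample lines quoted in this issue, and the helper names `parse_sync_line` and `is_synced` are hypothetical, not part of the Stacks node or Clarinet.

```python
import re

# Pattern inferred from the two log lines quoted in this issue (an assumption,
# not a documented log format).
SYNC_RE = re.compile(
    r"Syncing Bitcoin blocks: (?P<pct>[\d.]+)% "
    r"\((?P<start>\d+) to (?P<end>\d+) out of (?P<total>\d+)\)"
)


def parse_sync_line(line):
    """Return (start, end, total) from a sync log line, or None if absent."""
    match = SYNC_RE.search(line)
    if match is None:
        return None
    return int(match["start"]), int(match["end"]), int(match["total"])


def is_synced(line):
    """True once the reported end block has reached the reported total."""
    parsed = parse_sync_line(line)
    return parsed is not None and parsed[1] >= parsed[2]


devenv_line = (
    "INFO [1694696163.588890] [stackslib/src/burnchains/burnchain.rs:1495] "
    "[main] Syncing Bitcoin blocks: 0.5% (0 to 1 out of 200)"
)
clarinet_line = (
    "Sep 14 14:16:39.609043 INFO Syncing Bitcoin blocks: "
    "100.0% (100 to 101 out of 101)"
)

print(parse_sync_line(devenv_line))    # (0, 1, 200)
print(parse_sync_line(clarinet_line))  # (100, 101, 101)
print(is_synced(devenv_line), is_synced(clarinet_line))  # False True
```

Feeding each node's log through `is_synced` and noting the timestamp of the first line that returns `True` gives a rough sync duration for each setup.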

jcnelson commented 1 year ago

You may need to modify the Stacks node code to use a longer reward cycle in regtest mode. Try tweaking this: https://github.com/stacks-network/stacks-blockchain/blob/master/src/burnchains/mod.rs#L434. Make reward phases something like 200 blocks, and prepare phases something like 10 blocks.

EmbeddedAndroid commented 1 year ago

@jcnelson thank you, I'll give it a spin and let you know.

friedger commented 1 year ago

@jcnelson how do longer reward cycles improve the spin-up of the Stacks network? Is there a reason?

stjepangolemac commented 1 year ago

It might also be important to note that clarinet integrate regularly starts up much faster.

jcnelson commented 1 year ago

Does Clarinet run a full Stacks node and full Bitcoin node?

wileyj commented 1 year ago

what is the underlying hardware/OS being used, and how is the data being stored (i.e. ephemeral docker storage/bindmounted filesystem)?

I ask because if the underlying machine is an aarch64 Mac, I'm not aware of a fix for the slow Docker IO caused by how the virtual filesystem operates on that hardware. For clarinet integrate, I would have to see exactly how they're storing the data, but ephemeral Docker storage will be faster than a bind-mounted filesystem (substantially so on an aarch64 Mac).

It's been a while since I looked into the Mac aarch64 Docker IO issue, but there were many open issues/discussions around that platform. I'm not aware of any fixes.

EmbeddedAndroid commented 1 year ago

> Does Clarinet run a full Stacks node and full Bitcoin node?

Yes, it runs both.

EmbeddedAndroid commented 1 year ago

> what is the underlying hardware/OS being used, and how is the data being stored (i.e. ephemeral docker storage/bindmounted filesystem)?

Ephemeral. These are developer machines at the moment, so MacBook M1/M2 and an Intel desktop system.

jcnelson commented 1 year ago

> @jcnelson how do longer reward cycles improve the spin-up of the Stacks network? Is there a reason?

The Stacks node employs a few heuristics when it finishes processing a reward cycle's Bitcoin blocks to deduce that it has the PoX anchor block. These heuristics take a few seconds, because some of them require waiting for the p2p block-discovery and block-downloader state machines to do a full pass without discovering or downloading anything new.

The Stacks devnet starts with 200 Bitcoin blocks, and the reward cycle length of 6 blocks means the Stacks node will do this 33 times. If each heuristic test takes 5 seconds, then bootup will take at least 2:45 minutes from this alone.

EDIT: Skimming the Clarinet code for comparison, it appears that this does not happen in Clarinet? In Clarinet, the Stacks node starts processing blocks at whatever the Bitcoin chain tip is, so it does not need to go through this bootstrapping step.
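The estimate above can be reproduced as a quick back-of-the-envelope calculation. This is only a sketch of the arithmetic in this comment, not code from the Stacks node: the function name `bootstrap_delay_seconds` is hypothetical, and the 5 seconds per check is the illustrative figure used above. The second call assumes the earlier suggestion (200-block reward phase plus 10-block prepare phase) yields a 210-block cycle; if the prepare phase is instead counted inside the 200 blocks, the result would be one check rather than zero.

```python
def bootstrap_delay_seconds(burn_blocks, cycle_length, seconds_per_check):
    """Lower bound on startup delay from per-reward-cycle anchor block checks.

    Returns (number of full reward cycles, total seconds spent on checks).
    """
    full_cycles = burn_blocks // cycle_length
    return full_cycles, full_cycles * seconds_per_check


# Figures from the comment above: 200 Bitcoin blocks, 6-block cycles, 5 s each.
cycles, seconds = bootstrap_delay_seconds(200, 6, 5)
print(cycles, divmod(seconds, 60))  # 33 (2, 45), i.e. 2 min 45 s

# Assumed 210-block cycle (200 reward + 10 prepare): no full cycle fits
# inside the initial 200 blocks, so this source of delay disappears.
cycles_long, seconds_long = bootstrap_delay_seconds(200, 210, 5)
print(cycles_long, seconds_long)  # 0 0
```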

wileyj commented 1 year ago

> > what is the underlying hardware/OS being used, and how is the data being stored (i.e. ephemeral docker storage/bindmounted filesystem)?
>
> ephemeral, developer machines atm, so Macbook M1/M2 and Intel desktop system.

Interesting. When I was troubleshooting this issue in the past, ephemeral storage was less affected on aarch64, but it was still a bit slower (I can't recall how much slower, but I do remember it being "a bit" slower).

I think I'll want to try this for myself on different hardware and see if anything stands out, but I have a suspicion you're hitting the same Docker for Mac M1/M2 issue I had a while ago. There was no fix at the time, and I have not heard that it has been resolved.

EmbeddedAndroid commented 1 year ago

> > > what is the underlying hardware/OS being used, and how is the data being stored (i.e. ephemeral docker storage/bindmounted filesystem)?
> >
> > ephemeral, developer machines atm, so Macbook M1/M2 and Intel desktop system.
>
> interesting - when i was troubleshooting this issue in the past, ephemeral was less affected on aarch64, but it was still a bit slower (can't recall how much slower it was, but i do remember it being "a bit" slower).
>
> I think i'll want to try this for myself on different hardware and see if anything stands out - but i have a suspicion you're hitting the same docker for mac M1/M2 issue i had a while ago. there was no fix at the time, and i have not heard that it has been resolved.

I'm also seeing this on my i9 Intel desktop, but it does appear to be slower on the ARM MacBooks. I'm going to instrument the code today and step through it to see if I can find a resolution.

wileyj commented 1 year ago

After some testing, this may be related to the number of containers being started as well as the resource requirements for the API. As a test to rule things out, I commented out the API/Postgres/Explorer containers in the docker-compose file, then started all the containers normally with ./up.sh.

The stacks instance came up rather quickly and started responding to RPC requests (after a few minutes, bitcoin reported height of 254, and the stacks instance mirrored that).

this is not the same issue i've seen with docker IO issues on mac - it may just be resource contention with how the API is storing the data in postgres (combined with the other additional containers needing cpu).

@EmbeddedAndroid said they'd take a look at clarinet a little further, to see if there's a missed env var etc. i may also amend my test a bit to bring up stacks, bitcoin, api, postgres, explorer to see if that makes a difference.

tl;dr - it looks like resource exhaustion during startup, but needs more time to confirm if that is the case here.
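To put numbers on comparisons like the one above (with and without the API/Postgres/Explorer containers), the time until the Stacks node reports a given burn height can be measured by polling its RPC. This is a rough sketch under assumptions: it relies on the node's `/v2/info` endpoint returning a `burn_block_height` field, the URL and port are a guess at a local devenv setup and may differ, and the names `fetch_burn_height` and `seconds_until_height` are hypothetical helpers. The fetcher, clock, and sleep are injectable so the loop can be exercised without a running node.

```python
import json
import time
import urllib.request

# Assumption: Stacks RPC is exposed locally on this port; adjust for your setup.
STACKS_INFO_URL = "http://localhost:20443/v2/info"


def fetch_burn_height(url=STACKS_INFO_URL):
    """Read burn_block_height from the node's /v2/info JSON response."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)["burn_block_height"]


def seconds_until_height(target, fetch=fetch_burn_height, poll=1.0,
                         timeout=600.0, clock=time.monotonic,
                         sleep=time.sleep):
    """Poll until fetch() reports at least `target`.

    Returns the elapsed seconds, or None if `timeout` seconds pass first.
    """
    start = clock()
    while clock() - start < timeout:
        try:
            if fetch() >= target:
                return clock() - start
        except OSError:
            pass  # node is not accepting connections yet; keep polling
        sleep(poll)
    return None


# Offline demonstration with a fake fetcher standing in for the node.
fake_heights = iter([0, 1, 120, 200])
elapsed = seconds_until_height(
    200, fetch=lambda: next(fake_heights), poll=0, sleep=lambda s: None
)
print(elapsed is not None)  # True
```

Against a running devenv, calling `seconds_until_height(200)` right after `./up.sh` would give one measurement per container configuration, which makes the resource-contention hypothesis easier to confirm or rule out.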

EmbeddedAndroid commented 1 year ago

https://github.com/hirosystems/clarinet/issues/225

This describes exactly what we are currently seeing and proposes a way to speed it up. I'm looking into whether these changes were implemented.

EmbeddedAndroid commented 1 year ago

> After some testing, this may be related to the number of containers being started as well as the resource requirements for the API. as a test to rule things out - i commented out the API/Postgres/Explorer containers in the docker-compose file, then started all the containers normally with ./up.sh.
>
> The stacks instance came up rather quickly and started responding to RPC requests (after a few minutes, bitcoin reported height of 254, and the stacks instance mirrored that).
>
> this is not the same issue i've seen with docker IO issues on mac - it may just be resource contention with how the API is storing the data in postgres (combined with the other additional containers needing cpu).
>
> @EmbeddedAndroid said they'd take a look at clarinet a little further, to see if there's a missed env var etc. i may also amend my test a bit to bring up stacks, bitcoin, api, postgres, explorer to see if that makes a difference.
>
> tl;dr - it looks like resource exhaustion during startup, but needs more time to confirm if that is the case here.

If I remove all services except bitcoin, stacks, and the miner, I don't see any performance increase on my end, on either the MacBook or the desktop.

EmbeddedAndroid commented 1 year ago

This is not a Stacks issue. Rather, Clarinet warms up both chains before deployment, which is why the sync seems instant: it is.