stellar / stellar-core

Reference implementation for the peer-to-peer agent that manages the Stellar network.
https://www.stellar.org

Add a flag to pause execution before catchup starts #4303

Closed ThomasBrady closed 1 week ago

ThomasBrady commented 2 months ago

Currently, when profiling catchup, we need to manually connect Tracy at just the right time so that the capture excludes the setup steps (e.g. downloading buckets, trimming history, etc.). It would be useful to have a flag (e.g. --pause-before-catchup) that pauses execution at the point when the node is ready to catch up, so that we can connect the Tracy client and then continue execution.

MonsieurNicolas commented 2 months ago

Doesn't Tracy allow you to delete data before a certain point (or only select a specific timeframe)? From the trace it's pretty easy to see when transaction application starts.

That being said, in the past what I did was just wait for tx application to start before triggering "perf" or other profilers, if I missed a few ledgers it didn't really matter as the trace was over a longer period of time anyways (to amplify trends).

ThomasBrady commented 2 months ago

> Doesn't Tracy allow you to delete data before a certain point (or only select a specific timeframe)?

You can discard the entire trace while it is being captured, but it's not clear to me how to do this for just a portion of a previously captured trace (either during a live capture or with a saved capture). That would be useful to cut down on RAM and disk usage, but it would still require active monitoring to keep resource usage sensible.

> From the trace it's pretty easy to see when transaction application starts. That being said, in the past what I did was just wait for tx application to start before triggering "perf" or other profilers, if I missed a few ledgers it didn't really matter as the trace was over a longer period of time anyways (to amplify trends).

Agreed, it's easy to see when transaction application starts, but it is a manual process. I've found myself watching the logs/trace, waiting to press reconnect/discard right as txn application starts so that I can get the interesting data without filling up my RAM. I don't mind missing a few ledgers; it's more that I have to watch ~10-20 mins of logs or capture and be ready to connect on time so that I don't miss the "interesting" work entirely.

MonsieurNicolas commented 2 months ago

Oh, I see. You don't need to "wait for the right time" if you first run catchup X/0 (which applies buckets and a few ledgers, and is relatively slow) followed by catchup Y (fast, as it only needs to download checkpoints). The range you want to profile is then X..Y.
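
A minimal sketch of the two-step approach above, written as a small Python helper that builds the two command lines. The helper name `two_step_catchup_commands` and the example ledger numbers are hypothetical. The comment writes the second step simply as "catchup Y"; since the catchup command takes a DESTINATION-LEDGER/LEDGER-COUNT argument, the sketch assumes a count of Y - X for that step, which may not match what was intended.

```python
import shlex
from typing import List


def two_step_catchup_commands(x: int, y: int, binary: str = "stellar-core") -> List[List[str]]:
    """Build argv lists for the two-step workaround (hypothetical helper).

    Step 1 does the slow setup up to ledger X; step 2 covers X..Y and is
    the run worth profiling.
    """
    if not 0 < x < y:
        raise ValueError("expected 0 < x < y")
    # Step 1: "catchup X/0" as given in the thread (bucket apply, slow setup).
    setup = [binary, "catchup", f"{x}/0"]
    # Step 2: ASSUMPTION: the thread only says "catchup Y"; a count of
    # Y - X is filled in here to cover the range X..Y.
    profiled = [binary, "catchup", f"{y}/{y - x}"]
    return [setup, profiled]


if __name__ == "__main__":
    # Example ledger numbers are placeholders, not recommendations.
    for argv in two_step_catchup_commands(1000, 2000):
        print(shlex.join(argv))
```

With this split, the profiler can be attached after the first command finishes and before the second one starts, so there is no need to watch the logs for the right moment.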

marta-lokhova commented 1 week ago

Closing this; I think you should be able to get the same result by running two catchup commands.