Closed: wileyj closed this issue 1 year ago.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
bump
History: because of how the explorer works, it can potentially track millions of paths for latency and other metrics. Since we're using a pull-based metrics model, the dataset scraped from the explorer every n seconds can be extremely large. What was happening was that, with metrics enabled for the explorer, the explorer's latency would spike every n seconds while the metrics were pulled. On top of that, the per-path latency measurements are essentially unusable, since each path is different.
To resolve this for the API, @zone117x did some work to store similar paths into dedicated buckets. For example, all /block/xxxx requests would be stored in a blocks bucket, etc.
I'll look for the PR in that repo that enabled this and link here shortly.
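To illustrate the path-condensing idea described above (this is a hypothetical sketch, not the actual API implementation; the function name and route patterns are made up for the example), concrete request paths can be normalized onto a fixed set of route patterns before being used as metric labels, which keeps metric cardinality bounded:

```typescript
// Hypothetical sketch: collapse concrete request paths into a fixed set of
// route-pattern buckets so the number of metric label values stays bounded.
const ROUTE_PATTERNS: Array<[RegExp, string]> = [
  [/^\/block\/[^/]+$/, "/block/:hash"],           // all /block/xxxx -> one bucket
  [/^\/address\/[^/]+$/, "/address/:principal"],  // all address pages -> one bucket
  [/^\/txid\/[^/]+$/, "/txid/:txid"],
];

function normalizePath(path: string): string {
  for (const [pattern, bucket] of ROUTE_PATTERNS) {
    if (pattern.test(path)) return bucket;
  }
  return path; // static paths keep their own label
}

console.log(normalizePath("/block/0xabc123")); // "/block/:hash"
console.log(normalizePath("/status"));         // "/status"
```

With this in place, every /block/xxxx request increments the same time series instead of creating a new one per block hash.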
Thanks @wileyj 🙏 Assigning to @aulneau so he can provide some input as well for anyone who might tackle this issue.
I can start looking at this if @aulneau can provide a bit more context?
Sure -- so we need to do a few things.
We are no longer using the custom server found here: https://github.com/blockstack/explorer/blob/main/server/index.js
We need to start using it again, and it needs to be updated in a few ways:
Basically, these changes make each type of path report its metrics into a single respective bucket: rather than one metrics entry per individual address that is navigated to, all of those metrics are combined into a single address bucket.
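Concretely, the effect on the scraped series would look something like this (illustrative Prometheus exposition output; the label names and values here are made up for the example):

```
# Before: one time series per concrete address page -> unbounded cardinality
http_request_duration_seconds_count{path="/address/SP1AAA"} 1
http_request_duration_seconds_count{path="/address/SP2BBB"} 1

# After: every address page reports into one route-pattern series
http_request_duration_seconds_count{path="/address/:principal"} 2
```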
@zone117x might be able to provide further guidance
The API PR first implementing the route condensing is actually https://github.com/blockstack/stacks-blockchain-api/pull/412
With the promster library, there are some defaults set that are capping our data: https://github.com/tdeekens/promster
For the sample metric http_request_duration_seconds_bucket, they clearly state this in the readme:
http_request_duration_seconds: a Prometheus histogram with request time buckets in milliseconds (defaults to [ 0.05, 0.1, 0.3, 0.5, 0.8, 1, 1.5, 2, 3, 5, 10 ])
which corresponds with the data we've seen. It appears it's possible to modify this, but it would require a code change (in the API and in any other repo where these defaults are not what we want): https://github.com/tdeekens/promster/blob/main/packages/metrics/src/create-metric-types/create-metric-types.ts#L173-L184
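To see why the default bounds cap the data, here is a minimal sketch (not promster code; the helper below is hypothetical) of how a Prometheus cumulative histogram assigns an observation to buckets. Any latency above the largest bound (10 with the defaults) only lands in the implicit +Inf bucket, so all slower requests become indistinguishable from one another:

```typescript
// Hypothetical sketch of Prometheus cumulative histogram bucketing.
// Observations above the largest configured bound are only counted in +Inf.
const DEFAULT_BUCKETS = [0.05, 0.1, 0.3, 0.5, 0.8, 1, 1.5, 2, 3, 5, 10];

function observe(counts: Map<number, number>, value: number): void {
  // A cumulative histogram increments every bucket whose upper bound ("le")
  // is >= the observed value, plus the implicit +Inf bucket.
  for (const le of DEFAULT_BUCKETS) {
    if (value <= le) counts.set(le, (counts.get(le) ?? 0) + 1);
  }
  counts.set(Infinity, (counts.get(Infinity) ?? 0) + 1);
}

const counts = new Map<number, number>();
observe(counts, 0.4); // lands in the 0.5 bucket and every larger one
observe(counts, 25);  // above the largest bound: only +Inf grows
console.log(counts.get(10), counts.get(Infinity)); // 1 2
```

This is why widening (or replacing) the default bucket bounds requires the code change mentioned above: the bounds are fixed at metric-creation time.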
for the stacks-blockchain-api repo, this was addressed in this commit: https://github.com/blockstack/stacks-blockchain-api/commit/735874e45c1e198724e7d01ca9e4eec4d108706c
@wileyj Should I assign this issue to you? I'm not exactly sure where it stands in relation to the changes made by @fbwoolf with https://github.com/blockstack/explorer/pull/491
That PR was put on hold and this issue was unassigned from me because I started it prematurely, before we could focus on performance/metrics. It needed further development/testing to get working.
the comment I made was informational only based on something we learned from the stacks-blockchain-api
@fbwoolf do you have enough context on this issue already to t-shirt it? I presume it's "1+ weeks dev" as it currently stands?
@wileyj could you specify precisely what data is currently not available for the Explorer via Prometheus given where things stand today (i.e. just what data this issue would give us once executed)?
I do think 1 week makes sense, just because there is some research involved in understanding what we need to do for this and how to test it. Where are the new labels? Am I supposed to apply it here?
Didn't mean to close this!
@MrHeidar Any chance we can revitalize this issue? It would be instrumental in learning more about how the Explorer functions internally.
We can review for prioritization again. @CharlieC3 any additional context re: the types of data points this will give us in particular (per https://github.com/hirosystems/explorer/issues/329#issuecomment-968726346)?
@markmhx I don't believe any custom data points were added in Explorer code, so it should at least provide the promster defaults if not more: https://github.com/tdeekens/promster#garbage-collection https://github.com/tdeekens/promster#http-timings-hapi-express-marblejs-and-fastify
I'll close this issue for now
Describe the bug Port 9153, used to emit metrics, is no longer open.
Expected behavior Port 9153 is open, and the path /metrics reports TSDB data.