Closed: wileyj closed this issue 1 year ago.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
bump
History: because of how the explorer works, it can potentially track millions of paths for latency and other metrics. Since we're using a pull-based metrics model, the dataset scraped from the explorer every n seconds can be extremely large. What was happening was that, with metrics enabled for the explorer, the explorer's latency would spike every n seconds while the metrics were pulled. On top of that, the per-path latency measurements are essentially unusable, since each path is different.
To resolve this for the API, @zone117x did some work to store similar paths into dedicated buckets. For example, all /block/xxxx requests would be stored in a blocks bucket, etc.
I'll look for the PR in that repo that enabled this and link here shortly.
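To illustrate the path-condensing idea described above (this is a hypothetical sketch, not the actual API implementation; the function name and route patterns are made up for the example), concrete request paths can be normalized onto a fixed set of route patterns before being used as metric labels, which keeps metric cardinality bounded:

```typescript
// Hypothetical sketch: collapse concrete request paths into a fixed set of
// route-pattern buckets so the number of metric label values stays bounded.
const ROUTE_PATTERNS: Array<[RegExp, string]> = [
  [/^\/block\/[^/]+$/, "/block/:hash"],           // all /block/xxxx -> one bucket
  [/^\/address\/[^/]+$/, "/address/:principal"],  // all address pages -> one bucket
  [/^\/txid\/[^/]+$/, "/txid/:txid"],
];

function normalizePath(path: string): string {
  for (const [pattern, bucket] of ROUTE_PATTERNS) {
    if (pattern.test(path)) return bucket;
  }
  return path; // static paths keep their own label
}

console.log(normalizePath("/block/0xabc123")); // "/block/:hash"
console.log(normalizePath("/status"));         // "/status"
```

With this in place, every /block/xxxx request increments the same time series instead of creating a new one per block hash.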
Thanks @wileyj 🙏 Assigning to @aulneau so he can provide some input as well for anyone who might tackle this issue.
I can start looking at this if @aulneau can provide a bit more context?
Sure -- so we need to do a few things.
We are no longer using the custom server found here: https://github.com/blockstack/explorer/blob/main/server/index.js
We need to start using it again, and it needs to be updated in a few ways:
Basically, these changes make each type of path report its metrics into a single respective bucket: rather than one metrics entry per individual address that is navigated to, all of those metrics are combined into a single address bucket.
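Concretely, the effect on the scraped series would look something like this (illustrative Prometheus exposition output; the label names and values here are made up for the example):

```
# Before: one time series per concrete address page -> unbounded cardinality
http_request_duration_seconds_count{path="/address/SP1AAA"} 1
http_request_duration_seconds_count{path="/address/SP2BBB"} 1

# After: every address page reports into one route-pattern series
http_request_duration_seconds_count{path="/address/:principal"} 2
```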
@zone117x might be able to provide further guidance
The API PR first implementing the route condensing is actually https://github.com/blockstack/stacks-blockchain-api/pull/412
With the promster library, there are some defaults set that are capping our data: https://github.com/tdeekens/promster
For the sample metric http_request_duration_seconds_bucket, they clearly state this in the readme:
http_request_duration_seconds: a Prometheus histogram with request time buckets in milliseconds (defaults to [ 0.05, 0.1, 0.3, 0.5, 0.8, 1, 1.5, 2, 3, 5, 10 ])
which corresponds with the data we've seen. It appears it's possible to modify this, but it would require a code change (in the API and in any other repo where these defaults are not what we want): https://github.com/tdeekens/promster/blob/main/packages/metrics/src/create-metric-types/create-metric-types.ts#L173-L184
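To see why the default bounds cap the data, here is a minimal sketch (not promster code; the helper below is hypothetical) of how a Prometheus cumulative histogram assigns an observation to buckets. Any latency above the largest bound (10 with the defaults) only lands in the implicit +Inf bucket, so all slower requests become indistinguishable from one another:

```typescript
// Hypothetical sketch of Prometheus cumulative histogram bucketing.
// Observations above the largest configured bound are only counted in +Inf.
const DEFAULT_BUCKETS = [0.05, 0.1, 0.3, 0.5, 0.8, 1, 1.5, 2, 3, 5, 10];

function observe(counts: Map<number, number>, value: number): void {
  // A cumulative histogram increments every bucket whose upper bound ("le")
  // is >= the observed value, plus the implicit +Inf bucket.
  for (const le of DEFAULT_BUCKETS) {
    if (value <= le) counts.set(le, (counts.get(le) ?? 0) + 1);
  }
  counts.set(Infinity, (counts.get(Infinity) ?? 0) + 1);
}

const counts = new Map<number, number>();
observe(counts, 0.4); // lands in the 0.5 bucket and every larger one
observe(counts, 25);  // above the largest bound: only +Inf grows
console.log(counts.get(10), counts.get(Infinity)); // 1 2
```

This is why widening (or replacing) the default bucket bounds requires the code change mentioned above: the bounds are fixed at metric-creation time.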
for the stacks-blockchain-api repo, this was addressed in this commit: https://github.com/blockstack/stacks-blockchain-api/commit/735874e45c1e198724e7d01ca9e4eec4d108706c
@wileyj Should I assign this issue to you? I'm not exactly sure where it stands in relation to the changes made by @fbwoolf with https://github.com/blockstack/explorer/pull/491
That PR was put on hold and this issue was unassigned from me because I started it prematurely, before we could focus on performance/metrics. It needed further development/testing to get working.
the comment I made was informational only based on something we learned from the stacks-blockchain-api
@fbwoolf do you have enough context on this issue already to t-shirt it? I presume it's "1+ weeks dev" as it currently stands?
@wileyj could you specify precisely what data is currently not available for the Explorer via Prometheus given where things stand today (i.e. just what data this issue would give us once executed)?
I do think 1 week makes sense, just because there is some research involved in understanding what we need to do for this and how to test it. Where are the new labels? Am I supposed to apply it here?
Didn't mean to close this!
@MrHeidar Any chance we can revitalize this issue? It would be instrumental in learning more about how the Explorer functions internally.
We can review for prioritization again. @CharlieC3 any additional context re: the types of data points this will give us in particular (per https://github.com/hirosystems/explorer/issues/329#issuecomment-968726346)?
@markmhx I don't believe any custom data points were added in Explorer code, so it should at least provide the promster defaults if not more: https://github.com/tdeekens/promster#garbage-collection https://github.com/tdeekens/promster#http-timings-hapi-express-marblejs-and-fastify
I'll close this issue for now
Describe the bug Port 9153, used to emit metrics, is no longer open.
Expected behavior Port 9153 is open, and the path /metrics reports TSDB data.