onflow / flow-evm-gateway

FlowEVM Gateway implements an Ethereum-equivalent JSON-RPC API for EVM clients to use
https://developers.flow.com/evm/about
Apache License 2.0

Implementing metrics #125

Closed sideninja closed 1 week ago

sideninja commented 6 months ago

We need comprehensive metrics to measure the performance and resource usage of our APIs. This will help us understand the performance of different API methods and track various states and errors.

## Performance
- [x] Measure end-to-end request/response time by method call (track the time taken from the start of a request to the return of the response for each method call to understand relative performance and user experience using percentiles).
- [ ] Monitor the time it takes for a Flow transaction to be submitted and its result returned, along with the transaction status
- [x] API requests per time interval metric
- [x] API calls by API endpoint (most used to least used calls)

Performance should be measured using tracing, so that sub-calls are measured as well. Ideally, every network call and every API call appears as a sub-trace. Tracing should be enabled with a flag and off by default. In addition, each API response should submit a simple metric recording how long the request took to process.

Be careful to also include metrics for WebSocket requests and responses.
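As a rough illustration of the per-method timing, request counts, and percentiles described above, here is a minimal sketch using only the Go standard library. The `MethodStats` type and its methods are hypothetical names, not part of the gateway. A real setup would more likely feed a Prometheus histogram labelled by method than keep raw samples in memory.

```go
package main

import (
	"math"
	"sort"
	"sync"
	"time"
)

// MethodStats is a hypothetical in-memory recorder of request durations,
// keyed by JSON-RPC method name (for example "eth_call").
type MethodStats struct {
	mu        sync.Mutex
	durations map[string][]time.Duration
}

// NewMethodStats returns an empty recorder.
func NewMethodStats() *MethodStats {
	return &MethodStats{durations: make(map[string][]time.Duration)}
}

// Observe records one completed request for the given method.
func (s *MethodStats) Observe(method string, d time.Duration) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.durations[method] = append(s.durations[method], d)
}

// Timed runs handler and records its end-to-end duration under method,
// whether or not the handler returned an error.
func (s *MethodStats) Timed(method string, handler func() error) error {
	start := time.Now()
	err := handler()
	s.Observe(method, time.Since(start))
	return err
}

// Count returns how many requests were recorded for method.
func (s *MethodStats) Count(method string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.durations[method])
}

// Percentile returns the nearest-rank p-th percentile (0 < p <= 100)
// of the recorded durations for method, or 0 if there are no samples.
func (s *MethodStats) Percentile(method string, p float64) time.Duration {
	s.mu.Lock()
	samples := append([]time.Duration(nil), s.durations[method]...)
	s.mu.Unlock()

	n := len(samples)
	if n == 0 {
		return 0
	}
	sort.Slice(samples, func(i, j int) bool { return samples[i] < samples[j] })

	rank := int(math.Ceil(p / 100 * float64(n)))
	if rank < 1 {
		rank = 1
	}
	if rank > n {
		rank = n
	}
	return samples[rank-1]
}
```

Wrapping each handler in `Timed` gives both the call count per endpoint and the latency distribution from a single code path, which is why the sketch stores them together.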

## State
- [x] Ingestion index health is a boolean value that is set to false if the latest indexed EVM height falls behind the latest EVM height by X
- [x] Execution EVM traces index health is a boolean value that should be set to false if there are any traces that failed to download
- [x] API errors should be submitted to a counter metric
- [ ] Report fees paid on Flow and EVM side as a metric
- [x] Metric for the EVM contract addresses that users call
- [x] Database size (folder size)

## Ingestion
- [x] EVM height should be submitted as a value on event ingestion
- [x] Trace download failures should be recorded

We should use Prometheus and OpenTelemetry to collect the traces and metrics.

m-Peter commented 5 months ago

For JSON-RPC endpoints that are served over WebSocket, such as subscriptions and filtering of entities, we should add some dedicated metrics as well, e.g. active connections etc.
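A minimal sketch of what an "active connections" metric could look like, using only the standard library. `ConnGauge` and `Track` are hypothetical names, and the sketch assumes the WebSocket handler calls the returned function when a connection closes. In practice the value would be exported as a Prometheus gauge.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// ConnGauge is a hypothetical gauge of currently open WebSocket connections.
type ConnGauge struct {
	active atomic.Int64
}

// Track marks one connection as open and returns a function that marks it
// as closed. The returned function is safe to call more than once; only
// the first call decrements the gauge.
func (g *ConnGauge) Track() (done func()) {
	g.active.Add(1)
	var once sync.Once
	return func() {
		once.Do(func() { g.active.Add(-1) })
	}
}

// Active returns the number of connections currently open.
func (g *ConnGauge) Active() int64 {
	return g.active.Load()
}
```

A handler would typically call `done := gauge.Track()` right after the upgrade succeeds and `defer done()` so the gauge is decremented on every exit path.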

franklywatson commented 5 months ago

@m-Peter also suggested tracking DB size over time as well as DB query time
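For the DB size part, one simple approach is to sum the file sizes under the database directory and report the result periodically. The sketch below assumes the database lives in a single folder on disk; `DirSize` is a hypothetical helper, not an existing gateway function.

```go
package main

import (
	"io/fs"
	"path/filepath"
)

// DirSize returns the total size in bytes of all regular files under root.
// It stops and returns the first error encountered while walking.
func DirSize(root string) (int64, error) {
	var total int64
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.Type().IsRegular() {
			info, err := d.Info()
			if err != nil {
				return err
			}
			total += info.Size()
		}
		return nil
	})
	return total, err
}
```

A database that compacts or deletes files while the walk is running may cause `d.Info()` to fail for a file that has just disappeared, so a production version would probably tolerate that error rather than abort.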

sideninja commented 2 months ago

Add metrics for index health. Trace index health depends on trace download success: if any download fails, the index becomes unhealthy. Transaction index health depends on how far the latest ingested event is behind the latest height on the network: if it is too far behind, the index is unhealthy.
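The two health rules can be sketched as follows. All names here are hypothetical, and the lag threshold (`maxLag`) is an assumption because the issue leaves the value unspecified. The sketch also assumes that a lag exactly equal to the threshold still counts as healthy, and that trace health stays false once any download has failed.

```go
package main

import "sync"

// IndexHealth is a hypothetical tracker for the two health signals.
type IndexHealth struct {
	mu            sync.Mutex
	maxLag        uint64 // allowed height lag; the actual value is an assumption
	latestIndexed uint64
	latestNetwork uint64
	traceFailures uint64
}

// NewIndexHealth returns a tracker that tolerates a lag of up to maxLag heights.
func NewIndexHealth(maxLag uint64) *IndexHealth {
	return &IndexHealth{maxLag: maxLag}
}

// SetHeights records the latest indexed height and the latest network height.
func (h *IndexHealth) SetHeights(indexed, network uint64) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.latestIndexed = indexed
	h.latestNetwork = network
}

// TraceDownloadFailed records one failed trace download.
func (h *IndexHealth) TraceDownloadFailed() {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.traceFailures++
}

// TransactionIndexHealthy reports whether the indexed height is within
// maxLag of the network height.
func (h *IndexHealth) TransactionIndexHealthy() bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.latestNetwork <= h.latestIndexed {
		return true // avoids unsigned underflow when the index is ahead or equal
	}
	return h.latestNetwork-h.latestIndexed <= h.maxLag
}

// TraceIndexHealthy reports whether no trace download has failed so far.
func (h *IndexHealth) TraceIndexHealthy() bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	return h.traceFailures == 0
}
```

Keeping the failure count rather than only a boolean means the same state can back both the health flag and a trace download failure counter.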

sideninja commented 1 month ago

Another high priority metric is: https://github.com/onflow/flow-evm-gateway/issues/384

j1010001 commented 1 week ago

First set of metrics is implemented and Grafana Dashboard created: https://flowfoundation.grafana.net/d/PkvVJj4Mz/mainnet-general?from=now-24h&to=now&timezone=America%2FVancouver