Open volovyks opened 1 week ago

Description

Such behavior was explored on Testnet and Mainnet. It can lead to failures in all protocols.

PS, it is a modified Dashboard, I will add it soon.
So I did notice this started when we moved our node over. I'm not sure whether it matters that our node has technically been running for a shorter timeframe than the others, since we destroyed our node and rebuilt it. I attributed the discrepancy to that, so perhaps it is the way the metric is exported.
Just for clarity's sake, this node has the exact same machine size, disk size, and networking configuration as the rest of the partner nodes. I mirrored the environment from Pagoda 1-for-1 just to avoid any issues.
Here's my theory:
This line of code increments that metric counter:

```rust
crate::metrics::PROTOCOL_ITER_CNT
    .with_label_values(&[my_account_id.as_str()])
    .inc();
```
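For context, `PROTOCOL_ITER_CNT` is a Prometheus counter, i.e. a monotonically increasing value that only goes back to zero when the process restarts. Below is a minimal sketch of how such a counter is typically registered with the `prometheus` crate; the metric name, help text, and label name are assumptions for illustration, not the repo's actual definition:

```rust
use once_cell::sync::Lazy;
use prometheus::{register_int_counter_vec, IntCounterVec};

// Hypothetical registration of the counter incremented above. The metric
// name ("protocol_iter_cnt") and label name ("node_account_id") are assumed.
pub static PROTOCOL_ITER_CNT: Lazy<IntCounterVec> = Lazy::new(|| {
    register_int_counter_vec!(
        "protocol_iter_cnt",
        "Number of main protocol loop iterations",
        &["node_account_id"]
    )
    .expect("failed to register protocol iteration counter")
});
```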
I hypothesize that Grafana calculates the rate per hour (`increase()`) by dividing the total count by 60 minutes. Since our node is "newer" than the other nodes, there would be a significant difference between the total number of iterations on the other nodes and on this node: there are months of iterations on the other nodes, while we only have about 27 days' worth.
That is also the reason the other nodes are not exactly aligned with each other, since it took about a week for all of our partners to update.
Let's see how it will behave after the release. I hope `increase()` means how many new iterations happened in the last hour.
That is what the docs say it means, so maybe we do have an issue. I am not sure what it may be, though.
https://prometheus.io/docs/prometheus/latest/querying/functions/#increase
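For reference, the linked docs describe `increase()` as the growth of the counter within the query window (adjusted for counter resets and extrapolated to the window), not the lifetime total divided by the window length. Here is a rough sketch of that behaviour with made-up sample values; it ignores extrapolation and is not the actual Prometheus implementation:

```rust
/// Rough illustration of the documented `increase()` semantics: it estimates
/// how much the counter grew inside the window, so the counter's absolute
/// value (i.e. the node's age) should not matter. Reset handling is
/// simplified and extrapolation is omitted.
fn approximate_increase(samples: &[f64]) -> f64 {
    let mut total = 0.0;
    for pair in samples.windows(2) {
        let (prev, next) = (pair[0], pair[1]);
        // A counter reset (e.g. a rebuilt node) shows up as a drop; Prometheus
        // treats the value after the reset as new growth.
        total += if next >= prev { next - prev } else { next };
    }
    total
}

fn main() {
    // Hypothetical samples over one hour (values are made up).
    // Long-running node: counter is already in the millions.
    let old_node = [1_000_000.0, 1_000_900.0, 1_001_800.0];
    // Rebuilt node: counter restarted from zero ~27 days ago.
    let new_node = [38_000.0, 38_900.0, 39_800.0];
    println!("old node increase: {}", approximate_increase(&old_node)); // 1800
    println!("new node increase: {}", approximate_increase(&new_node)); // 1800
}
```

Under those semantics, a node rebuilt 27 days ago should report the same hourly increase as a node that has been running for months, as long as both loop at the same speed.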