Currently, I check the status of my subgraph nodes with the following Prometheus query:
(ethereum_chain_head_number{} - ignoring(deployment, job, network, shard) deployment_head{} > 100) and ignoring(deployment, job, network, shard)(deriv(deployment_head{}[10m]) * 60 <= 0)
In simple terms, the query computes the difference between the ethereum_chain_head_number metric and the deployment_head metric. If the difference is more than 100 blocks AND deployment_head has not advanced over a 10-minute window (its per-minute rate of change is zero or negative), the node is flagged as out of sync. The two metrics do not have the same labels, hence the ignoring(...) clauses in the query.
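The same rule can be written out in ordinary code to make the two conditions explicit. The sketch below is only an illustration in Python, not how Prometheus evaluates the query: the function name and the sample format are hypothetical, and a simple first-to-last slope stands in for deriv(), which fits a least-squares regression over the window.

```python
from typing import Sequence, Tuple

LAG_THRESHOLD_BLOCKS = 100  # same threshold as "> 100" in the query


def is_out_of_sync(chain_head: float,
                   head_samples: Sequence[Tuple[float, float]]) -> bool:
    """Return True when the deployment lags and has stopped advancing.

    head_samples holds (unix_seconds, deployment_head) pairs covering the
    10-minute window, oldest first. It needs at least two samples with
    different timestamps. Hypothetical helper, for illustration only.
    """
    (t_first, h_first), (t_last, h_last) = head_samples[0], head_samples[-1]
    lag = chain_head - h_last
    # Simplified stand-in for deriv(): blocks per second across the window.
    slope = (h_last - h_first) / (t_last - t_first)
    # "* 60" converts to blocks per minute, mirroring the query.
    return lag > LAG_THRESHOLD_BLOCKS and slope * 60 <= 0
```

A deployment that is 200 blocks behind and flat for ten minutes is flagged; one that is 150 blocks behind but still advancing is not.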
This query works fine for nodes hosting one or more subgraphs on different networks. However, when a second subgraph on the same network but with a different subgraphName is deployed to a node, the only way to tell the two apart is the deployment label, which is a hash that changes with every update and is therefore hard to manage. This causes a problem: once the deployment label is ignored, the two deployment_head series have identical label sets, so deployment_head shows up as a duplicate metric.
Sample:
Assuming the two metrics below belong to two subgraphs (sepolia-1 and sepolia-2), I can't tell them apart, and unless I filter by deployment, they are treated as a duplicate metric when the query is evaluated.
deployment_head{deployment=<HASH IPFS deployment number 1>, instance=<node_url>, job="mymetrics", network="sepolia", shard="primary"}
deployment_head{deployment=<HASH IPFS deployment number 2>, instance=<node_url>, job="mymetrics", network="sepolia", shard="primary"}
Proposal:
Add subgraphName and subgraphVersion to the BlockStreamMetrics struct, with subgraphName as the top priority.
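To illustrate why a stable name would help, the sketch below mimics how ignoring(...) builds the matching key for each series: drop the ignored labels and compare what is left. It is a simplified Python model, not Prometheus or graph-node code. The hash and instance values are placeholders, the helper names are made up, and the subgraphName label on deployment_head is the hypothetical outcome of this proposal.

```python
from collections import defaultdict

# Labels dropped by ignoring(...) in the query above.
IGNORED = {"deployment", "job", "network", "shard"}


def match_key(labels: dict) -> tuple:
    """Labels left after ignoring(...), as a sorted, hashable tuple."""
    return tuple(sorted((k, v) for k, v in labels.items() if k not in IGNORED))


def group_by_match_key(series: list) -> dict:
    """Group series by matching key; a group of 2+ means duplicates."""
    groups = defaultdict(list)
    for labels in series:
        groups[match_key(labels)].append(labels)
    return dict(groups)


# Today's deployment_head series for two subgraphs on the same network.
current = [
    {"deployment": "hash-1", "instance": "node-a", "job": "mymetrics",
     "network": "sepolia", "shard": "primary"},
    {"deployment": "hash-2", "instance": "node-a", "job": "mymetrics",
     "network": "sepolia", "shard": "primary"},
]

# Both series collapse to the same key, so the match is ambiguous.
duplicates = {k: v for k, v in group_by_match_key(current).items() if len(v) > 1}

# Hypothetical: the same series carrying the proposed subgraphName label.
proposed = [dict(s, subgraphName=name)
            for s, name in zip(current, ["sepolia-1", "sepolia-2"])]

# Selecting by name yields exactly one series, whatever the current hash is.
selected = [s for s in proposed if s["subgraphName"] == "sepolia-1"]
```

A selector on the name would keep working after a redeploy changes the deployment hash, which is the main point of the request.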
Are you aware of any blockers that must be resolved before implementing this feature? If so, which? Link to any relevant GitHub issues.
No response