jjo opened 3 months ago
> I'm hesitant about replacing the `instance` label with `node`, as this couples our definition of "machine" with "machine provisioned by Kubernetes". Since we want to make this more agnostic of how the machine was provisioned, I think that `node` wouldn't be the right choice here.
>
> That being said, this does make sense as an optional config argument, as it can then be used for convenience for a big (and possibly majority) use-case: managed Kubernetes.
>
> I think this brings up an interesting point, since the `instance` label will often be replaced by Prometheus scrapers and rewritten as `exported_instance` (as the instance of the job running cloudcost-exporter is already using the `instance` label). Thoughts on:
>
> - a config value for the label meaning "name of machine"?
> - renaming the `instance` label to something agnostic of how it was provisioned, like `machine_name`?
+1 on moving away from `instance`; I like `machine_name` as the default value for such a setting, e.g. `--compute.vm-instance-label=LABEL_NAME` (or alike, naming-is-hard is always haunting us).
> I am opposed to having labels defined via command line arguments for the following reasons:
>
> - It implies we got the names of labels incorrect and we're masking that to ease PromQL creation

It's not only about easing PromQL: it actually decreases the load on the TSDB, by avoiding an arbitrary renaming that will otherwise likely always be needed (should we keep `instance` as its name).

> - Potential of running many collectors with different labels at the same time which means we can't guarantee the use of a metric with a specific query

(?) Sure, as an operator you could always shoot yourself in the foot; nevertheless, top->down label setting currently only allows one entry point for it.

> - Exposing us to having a configuration for every label and having a massive config struct to run the application

I never proposed that -- it's this particular label I'm concerned about, as we do know [*] we have disjoint sets of metrics (ec2, kube) that inevitably use either `instance` (sorry for that one, as it'll always be `exported_instance` unless changed at scrape time) xor `node`.

> I'd really like for us to get alignment on what the root problem is that we're trying to address (in this case, `instance` doesn't appear to be the appropriate label, it should be `node`), and then agree on a solution. Whichever path we take is going to require us to rewrite our internal PromQL queries and do a migration of the metrics.

Already replied in the previous paragraph [*].
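To make that disjointness concrete, a PromQL sketch (the `cloudcost_.+` metric-name pattern is an assumption about the exporter's prefix; the `kube_.+` query matches the one in the appendix below):

```
# Compute metrics end up keyed by `exported_instance` after scraping:
count by (__name__) ({__name__=~"cloudcost_.+", exported_instance!=""})

# Kubernetes metrics are keyed by `node` instead:
count by (__name__) ({__name__=~"kube_.+", node!=""})
```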
Indeed, one aspect where I'd really like `cloudcost-exporter` to shine compared with other existing solutions is being as "friendly" as possible to the TSDB, also requiring fewer recording rules / scraping artifacts (relabelling and building new derived metrics), for obvious reasons.
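As a sketch of what that friendliness buys (assuming the compute metrics exposed a `node` label natively; the cost metric name is illustrative), the join discussed below would reduce to a plain `on (node)` match, with no relabelling and no recording rules:

```
# Direct join: attach Kubernetes node metadata to a cost series.
# kube_node_labels has a constant value of 1, so the multiplication
# keeps the cost value unchanged.
  cloudcost_aws_ec2_instance_total_usd_per_hour
* on (node) group_left ()
  kube_node_labels
```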
What

Either change the compute modules' `instance` label to `node`, or support e.g. `--compute.vm-instance-label=LABEL_NAME`.

Why

Couple main reasons:
1) When using `cloudcost-exporter` metrics in Kubernetes, to be able to join against `kube_.*` metrics (like `kube_node_labels`, `kube_node_annotations`, `kube_node_status_capacity`, `kube_node_status_allocatable`, ... see the full list below), you are then forced to relabel `instance` to `node` with a construct like the sketch below.
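A minimal PromQL sketch of that construct; the cost metric name is illustrative, and `label_node_kubernetes_io_instance_type` stands in for whatever node label you want to pull in:

```
# `instance` has already been renamed to `exported_instance` at scrape
# time; copy it into a `node` label so kube_.* metrics can be joined:
  label_replace(
    cloudcost_aws_ec2_instance_total_usd_per_hour,
    "node", "$1", "exported_instance", "(.*)"
  )
# kube_node_labels has a constant value of 1, so multiplying attaches
# the node's Kubernetes labels without changing the cost value:
* on (node) group_left (label_node_kubernetes_io_instance_type)
  kube_node_labels
```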
The above needs to be done for every `<CSP>` you may run (especially the case if you have a centralized TSDB); note also that you need to visit every timeseries to perform this relabelling (adding to TSDB pressure).

2) Wisely, `cloudcost-exporter` already chose `cluster_name` instead of `cluster` (the latter being a popular label-name choice, thus avoiding the scrape-time rename to `exported_cluster`), so why keep `instance`, which scraping agents will by design always rename to `exported_instance` (hence the example relabelling query above)?

Appendix
Pasting the full list of Kubernetes metrics using `node`, from this query:

```
count by (__name__) ({__name__=~"kube_.+", node!=""})
```