Hi @xp-1000, as stated in https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/kubeletstatsreceiver#metric-groups, `volume` metrics are not collected by default by the `kubeletstats` receiver (only `container`, `pod`, and `node` metrics are collected). You'll need to explicitly override the `metric_groups` option with:

```yaml
metric_groups:
  - node
  - pod
  - container
  - volume
```

Can you try making the above change to your `kubeletstats` receiver config?
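For context, a minimal sketch of where this sits in a full `kubeletstats` receiver config (the `collection_interval`, `auth_type`, and `endpoint` values below are illustrative defaults, not taken from your setup):

```yaml
receivers:
  kubeletstats:
    collection_interval: 10s
    auth_type: serviceAccount
    endpoint: "${K8S_NODE_NAME}:10250"  # illustrative; point this at your node's kubelet
    metric_groups:
      - node
      - pod
      - container
      - volume
```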
If you tried the https://github.com/signalfx/splunk-otel-collector-chart, I believe we currently do not have an option to turn these metrics on in the chart. I've created https://github.com/signalfx/splunk-otel-collector-chart/issues/111 for this.
In terms of parity, the `k8scluster` receiver collects the metrics collected by the `kubernetes-cluster` monitor, and the `kubeletstats` receiver collects the metrics collected by the `kubelet-metrics` and `kubernetes-volumes` monitors.
Hello @asuresh4, thanks for your answer.

Adding `volume` to `metric_groups` is the first thing I tried; sorry, I forgot to mention that.
Oh, my bad: I just checked again in the SignalFx metric explorer, and indeed some metrics are present in my OTel test env; they just changed names:

- `kubernetes.volume_available_bytes` -> `k8s.volume.available`
- `kubernetes.volume_capacity_bytes` -> `k8s.volume.capacity`

But I still don't see the inode-related metrics (under either the new or the old names) when I use the OTel Collector, even though they seem to be supported: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/kubeletstatsreceiver/kubelet/volume.go#L29. I don't see any errors in the logs, but I confess I don't know how to troubleshoot metric gathering the way signalfx-agent's `tap-dps` made possible, for example.
In any case, I am still worried that we cannot configure `authType` on `kubeletAPI`; this makes using the Smart Agent monitors impossible, and I am pretty sure we will meet other cases/metrics without full parity between the Smart Agent and the OTel Collector. For example, after volumes (which are critical), I would like to configure `kubernetes-events`, which will have the same problem (and this time there is no equivalent in the OTel Collector).
@xp-1000 - you should be able to use the old or new names interchangeably, i.e., even though the collector sends new metrics, you should be able to search for old metrics in the UI. If this is not the case please open a support ticket with information such as realm/org and we will be able to help address that.
By default the OTel Collector only emits metrics that are classified as default by the respective Smart Agent monitor. For monitors that have already been ported to OpenTelemetry, such as the `hostmetrics`, `k8scluster`, and `kubeletstats` receivers, this is currently controlled by the `signalfx` exporter. To include metrics that are non-default you can make use of the `include_metrics` option on the exporter:

```yaml
include_metrics:
  - metric_names: [k8s.volume.inodes, k8s.volume.inodes.free, k8s.volume.inodes.used]
```
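For instance, a sketch of where this goes in the exporter config (the `access_token` and `realm` values are placeholders for your own):

```yaml
exporters:
  signalfx:
    access_token: "${SFX_TOKEN}"  # placeholder
    realm: us0                    # placeholder
    include_metrics:
      - metric_names: [k8s.volume.inodes, k8s.volume.inodes.free, k8s.volume.inodes.used]
```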
I will get back to you on configuring `authType` using the `smartagent` receiver after some more investigation.
Hello @asuresh4, thank you very much for your help.
I am sorry, I must have missed this information about `extraMetrics` (it was one of my questions). I will test now and come back to confirm, but it makes sense!

If a newbie like me reads this and tries to replace the Smart Agent with the OTel Collector, I think the documentation available at https://docs.splunk.com/Observability/get-started/migrate/migrate-to-otel.html#nav-Replace-the-SignalFx-Smart-Agent-with-the-Splunk-Distribution-of-OpenTelemetry-Collector may not be enough (at least it wasn't for me), so I recommend checking the following good docs on GitHub:
The last one contains everything I missed until now (thanks @asuresh4).
OK, so I confirm it works fine with the following configuration:
```yaml
config:
  receivers:
    receiver_creator:
      receivers:
        smartagent/nginx:
          rule: type == "port" && pod.name matches "nginx" && port == 80
          config:
            type: collectd/nginx
    kubeletstats:
      metric_groups:
        - node
        - pod
        - volume
  exporters:
    signalfx:
      include_metrics:
        - metric_names:
            - k8s.job.desired_successful_pods  # kubernetes.job.completions
            - k8s.job.active_pods              # kubernetes.job.active
            - k8s.job.successful_pods          # kubernetes.job.succeeded
            - k8s.statefulset.ready_pods       # kubernetes.stateful_set.ready
            - k8s.statefulset.desired_pods     # kubernetes.stateful_set.desired
            - k8s.hpa.max_replicas             # kubernetes.hpa.spec.max_replicas
            - k8s.hpa.desired_replicas         # kubernetes.hpa.status.desired_replicas
            - k8s.volume.inodes.free           # kubernetes.volume_inodes_free
            - k8s.volume.inodes                # kubernetes.volume_inodes
  service:
    pipelines:
      metrics:
        receivers:
          - kubeletstats
          - receiver_creator
  extensions:
    zpages:
      endpoint: 0.0.0.0:55679
```
I see the inode-related metrics in the metric finder under the new OpenTelemetry naming. The finder cannot find the metrics under the old SignalFx naming (which is why I did not find them in the first place), BUT you are right @asuresh4: the old SignalFx metric names still work if you set them manually in SignalFlow (they are just not listed in the metric finder).

Now, thanks to your help, I think I have only one remaining problem, about dimension parity: as you can see, some Kubernetes metrics have the "old" SignalFx dimension `kubernetes_cluster` in addition to the new one, `k8s.cluster.name`, but this is not the case for the volume metrics, for example.

This is a problem for us because we maintain a set of Terraform modules of "template" SignalFx detectors, and we rely on metric names but also on their dimensions. Since the metric names are translated, we can hope to keep this base of detectors working with both the Smart Agent and the OTel Collector to avoid a disruptive migration, but we also need dimension parity.

Should I try to use `dimensionClients` to keep the old dimensions? I confess this part is not fully clear to me.

Thanks!
OK, I added:

```yaml
- action: rename_dimension_keys
  mapping:
    k8s.cluster.name: kubernetes_cluster
```

to the `signalfx` exporter configuration, and now I get my metrics with the "old" dimension `kubernetes_cluster`.
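For completeness, the rule goes under the exporter's `translation_rules` list; a sketch (note: as far as I understand, overriding `translation_rules` replaces the exporter's default translation set, so the defaults may need to be copied in alongside this rule):

```yaml
exporters:
  signalfx:
    translation_rules:
      - action: rename_dimension_keys
        mapping:
          k8s.cluster.name: kubernetes_cluster
```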
With this setup, all of our detectors will work for both the Smart Agent AND the OTel Collector without any change to them (e.g. for Kubernetes: https://github.com/claranet/terraform-signalfx-detectors/tree/master/modules/smart-agent_kubernetes-common).
I think the other modules will be easier as long as we use the original Smart Agent monitors (and not a new OTel component, as for hostmetrics or Kubernetes).
Now I will try to make `kubernetes-events` work, but I am afraid it will have the same problem as `kubernetes-volumes` with the kube API configuration (that said, this is less critical given that it is not a dependency for detectors/alerts).
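For reference, the monitor config I plan to try through the smartagent receiver, sketched from the Smart Agent docs (the whitelisted events below are just examples):

```yaml
receivers:
  smartagent/kubernetes-events:
    type: kubernetes-events
    whitelistedEvents:  # example event filters; adjust to your needs
      - reason: Created
        involvedObjectKind: Pod
      - reason: Unhealthy
        involvedObjectKind: Pod
```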
@xp-1000 - I did a quick test and it appears both the new and the old dimensions are available. However, the old dimension does not surface in the suggestions, similar to the issue you saw with renamed metrics in the metric finder, but you should be able to provide the filter using SignalFlow. You shouldn't need the additional dimension remapping on the collector. On checking with the relevant team, I've learnt that this is a known limitation. I would recommend you also open a support ticket highlighting that you've run into this.
I've opened #345, which I believe will fix the issue you're seeing with the `kubeletAPI` config.
@xp-1000 thank you for the feedback on the gaps in docs for migration. I realize that we don't have information around monitors that have already been ported over to OTel (such as hostmetrics, k8scluster, kubeletstats) for advanced configurations that give parity with the respective agent monitors. I'll work on adding some of this information. Please also let us know of other difficulties you've run into while trying to replace the Smart Agent. cc @rmfitzpatrick
> I did a quick test and it appears both the new and the old dimensions are available. However, the old dimension does not surface in the suggestions, similar to the issue you saw with renamed metrics in the metric finder

Oh indeed, my bad again ^^ It seemed that applying my "translation rule" fixed the suggestion filter, but in truth the dimension was already there; you are right.
> I've opened #345, which I believe will fix the issue you're seeing with the `kubeletAPI` config.

Awesome, thanks, I will test it (it will probably unblock me for the `kubernetes-events` monitor).
> Please also let us know of other difficulties you've run into while trying to replace the Smart Agent.

It will be a pleasure. I am afraid I will not be able to contribute pull requests as I did on the Smart Agent until I get a better grip on the OTel Collector, which is still pretty new to me, but I can at least report issues ;)
Your help is highly appreciated, thanks! I think you can close this issue.
Hello,
I am trying to replace the SignalFx Smart Agent with the OTel Collector since its deprecation. I started with a simple configuration for host, Kubernetes, and NGINX metrics.
It is not clear whether the native `kubeletstatsreceiver` and `k8sclusterreceiver` from the OTel Collector fully replace all the `kube*` Smart Agent monitors (e.g. `kubernetes-events` or `kubernetes-volumes`...). For volumes, according to the documentation (https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/kubeletstatsreceiver#metric-groups), the receiver supports volume metrics.

I see most of the metrics from `kubernetes-cluster` and `kubelet-metrics` coming into Splunk Observability except for the volume metrics: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/kubeletstatsreceiver/kubelet/volume.go#L27. However, as you can see in the configuration, `metric_groups` is not defined, so it should fetch all metrics, including volumes.

Not a big deal, given that this receiver is still in beta and the goal of https://github.com/signalfx/splunk-otel-collector/tree/main/internal/receiver/smartagentreceiver is also to keep existing, working Smart Agent monitors while waiting for full replacement by the OpenTelemetry Collector. So I tried to configure the original https://docs.signalfx.com/en/latest/integrations/agent/monitors/kubernetes-volumes.html monitor.
Sadly, with the previously shared configuration I got the following error:

So I updated the kubernetes-volumes configuration fragment to:

But now I get the following errors:
According to https://docs.signalfx.com/en/latest/integrations/agent/monitors/kubernetes-volumes.html#configuration, `authType` is a valid option of the `kubeletAPI` nested block, so my configuration seems correct. So maybe this monitor is an "exception" that is not compatible with the smartagent receiver?

Thanks for your help!