splunk / splunk-add-on-microsoft-azure

Splunk Add-on for Microsoft Azure
Apache License 2.0

Why did the connector suddenly stop retrieving metrics through Azure Metrics for "microsoft.servicebus/namespaces" namespace? #47

Open AndrewTrobec opened 1 year ago

AndrewTrobec commented 1 year ago

Hello,

I was successfully retrieving metrics through Azure Metrics for "microsoft.servicebus/namespaces", in particular the two metrics "NamespaceCpuUsage" and "NamespaceMemoryUsage". Setup went without issue last week. This week we noticed that on Sunday the connector changed the list of metrics it retrieves for the 10 Service Bus resources. It now retrieves everything except "NamespaceCpuUsage" and "NamespaceMemoryUsage", which happen to be the only two metrics I want.

Here are two screenshots that show the number of metrics that are retrieved over time for one servicebus resource and the corresponding log data from the connector where the two metrics that I need are no longer visible.

image

image

When I make the same call with Postman, I get the metrics successfully, so for now I am ruling out an issue with the Azure configuration:

image
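For readers who want to reproduce this kind of check outside Postman, here is a minimal sketch of how the request URL for the Azure Monitor metrics endpoint could be assembled. The resource ID, the API version, and the helper name `build_metrics_url` are assumptions for illustration and are not taken from the screenshot above; the call also needs a valid bearer token, which is not shown.

```python
from typing import List, Optional
from urllib.parse import urlencode

# Assumption: API version chosen for illustration; use the one your tenant supports.
API_VERSION = "2018-01-01"


def build_metrics_url(resource_id: str,
                      metric_names: List[str],
                      timespan: Optional[str] = None) -> str:
    """Build a metrics query URL for a single Azure resource (hypothetical helper)."""
    params = {
        "api-version": API_VERSION,
        # The endpoint expects a comma-separated list of metric names.
        "metricnames": ",".join(metric_names),
    }
    if timespan:
        params["timespan"] = timespan
    base = ("https://management.azure.com" + resource_id
            + "/providers/Microsoft.Insights/metrics")
    return base + "?" + urlencode(params)


# Hypothetical resource ID; replace with a real Service Bus namespace ID.
EXAMPLE_RESOURCE_ID = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/example-rg"
    "/providers/Microsoft.ServiceBus/namespaces/example-ns"
)

url = build_metrics_url(EXAMPLE_RESOURCE_ID,
                        ["NamespaceCpuUsage", "NamespaceMemoryUsage"])
print(url)
```

Requesting only the two metrics of interest keeps the response small and makes it obvious whether Azure still serves them for the resource.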

Any help is appreciated.

Regards,

Andrew

JasonConger commented 1 year ago

It looks like Microsoft may have removed those metric definitions.

The add-on caches metric definitions every 30 days (reference). It does this by calling the Metrics Definitions REST API from Microsoft for the resource. Using the "Try It" button on the documentation page (see screenshot below), it looks like NamespaceCpuUsage and NamespaceMemoryUsage are no longer returned as valid metrics. However, NamespaceCpuUsage and NamespaceMemoryUsage are shown on the list of supported metrics, so I'm not sure why those metric definitions are not being returned by the Metrics Definitions REST API.

Screenshot 2023-05-19 at 2 37 51 PM
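To make the check described above repeatable, the following sketch compares the metric names found in a metric definitions response against the metrics you expect. The sample payload is hand-written and heavily trimmed for illustration (it is not real API output), and the function names are hypothetical; the only assumption about the real response is that each entry under `value` carries its name in `name.value`.

```python
from typing import Iterable, List, Set


def metric_names_from_definitions(payload: dict) -> Set[str]:
    """Collect metric names from a metric definitions response body."""
    return {item["name"]["value"] for item in payload.get("value", [])}


def missing_metrics(payload: dict, wanted: Iterable[str]) -> List[str]:
    """Return the wanted metric names that the definitions do not list."""
    available = metric_names_from_definitions(payload)
    return sorted(set(wanted) - available)


# Illustrative, trimmed payload: NOT copied from a real response.
sample_payload = {
    "value": [
        {"name": {"value": "IncomingMessages", "localizedValue": "Incoming Messages"}},
        {"name": {"value": "ActiveMessages", "localizedValue": "Active Messages"}},
    ]
}

wanted = ["NamespaceCpuUsage", "NamespaceMemoryUsage", "IncomingMessages"]
print(missing_metrics(sample_payload, wanted))
```

If the two namespace metrics show up as missing for a resource where they are expected, any cache built from that response would drop them as well, which matches the symptom reported in this issue.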

I also do not see those metric definitions in the Azure Portal.

Screenshot 2023-05-19 at 5 14 04 PM

Recommendation: open a ticket with Microsoft to determine why NamespaceCpuUsage and NamespaceMemoryUsage are not returned from the Metrics Definitions REST API.

Additional information: the list of cached metrics can be seen on the Splunk side by navigating to "Troubleshooting" > "View Checkpoints" from the add-on. The cached metrics for Service Bus will have a key of microsoft.servicebus_namespaces.
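As a rough illustration of the caching behavior described in this comment, the sketch below derives a checkpoint key from a resource namespace and decides whether a cached entry is older than 30 days. This is not the add-on's actual code: the key rule (lowercase, "/" replaced by "_") is inferred from the single example key above, and the function names and the way the timestamp is stored are assumptions.

```python
from datetime import datetime, timedelta, timezone

# From the thread: metric definitions are cached for 30 days.
CACHE_TTL = timedelta(days=30)


def checkpoint_key(resource_namespace: str) -> str:
    """Derive a checkpoint key from a namespace (assumed rule, inferred from one example)."""
    return resource_namespace.lower().replace("/", "_")


def is_stale(cached_at: datetime, now: datetime, ttl: timedelta = CACHE_TTL) -> bool:
    """Return True when the cached definitions are older than the TTL."""
    return now - cached_at >= ttl


key = checkpoint_key("microsoft.servicebus/namespaces")
print(key)

now = datetime(2023, 5, 19, tzinfo=timezone.utc)
print(is_stale(datetime(2023, 4, 1, tzinfo=timezone.utc), now))   # older than 30 days
print(is_stale(datetime(2023, 5, 10, tzinfo=timezone.utc), now))  # still fresh
```

Under this model, a refresh that happens when the definitions API omits a metric would leave that metric out of the cache until the next refresh, so inspecting the cached entry under this key is a useful first diagnostic step.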

Temporary workaround: the list of cached metrics can be updated with SPL by using inputlookup and outputlookup. Email me (jconger@splunk.com) if you want an example of the SPL. I'm hesitant to share it here as you can really mess up your other checkpoints if done incorrectly.

AndrewTrobec commented 1 year ago

Thanks @JasonConger!

I am following up on your notes and suggestions, very much appreciate your support. Will let you know.

I know that NamespaceCpuUsage and NamespaceMemoryUsage are premium metrics. Could this have anything to do with it?

Regards,

Andrew

JasonConger commented 1 year ago

@AndrewTrobec - thanks for pointing out the premium note. After testing with a premium Service Bus, I do see NamespaceCpuUsage and NamespaceMemoryUsage metric definitions. Are you seeing something different with a premium Service Bus?

AndrewTrobec commented 1 year ago

Hi @JasonConger, good morning!

To answer your question: with Postman I can see the premium metrics, but the Splunk add-on no longer retrieves them. For some reason they stopped on April 23rd at around 9 AM UTC.

If I request an invalid metric via Postman, the call fails and returns a list of valid metrics, which includes NamespaceCpuUsage and NamespaceMemoryUsage:

image

Please let me know your thoughts.

Thank you and best regards,

Andrew

Edit: confirming that the Service Bus namespace is premium tier:

image