oracle / oci-grafana-metrics

Grafana datasource plugin for OCI metrics
https://grafana.com/grafana/plugins/oci-metrics-datasource
Universal Permissive License v1.0

TestConnectivity fails with "Expensive list request" error #291

Open sumangecu opened 2 months ago

sumangecu commented 2 months ago

I am using the oci-grafana-metrics plugin, version 5.5.1. Recently, the plugin started returning the error below during TestConnectivity when authenticating with InstancePrincipal.

logger=plugin.oci-metrics-datasource t=2024-09-03T10:29:14.430391255Z level=error msg=client GetSubscribedRegionstakey="fetching the subscribed region for tenancy OCID: ocid1.tenancy.oc1..aaaaaaaaicizkujlsheu5bvqalzlzelugpn67vcrmvdg62c57awrl3c4o3zq"
logger=plugin.oci-metrics-datasource t=2024-09-03T10:29:14.542073759Z level=error msg=client GetSubscribedRegionstakey="fetching the subscribed region for regioname: ap-mumbai-1"
logger=plugin.oci-metrics-datasource t=2024-09-03T10:29:14.542142469Z level=error msg=client GetSubscribedRegionstakey="fetching the subscribed region for regioname: eu-paris-1"
logger=plugin.oci-metrics-datasource t=2024-09-03T10:29:14.542189028Z level=error msg=client GetSubscribedRegionstakey="fetching the subscribed region for regioname: sa-saopaulo-1"
logger=plugin.oci-metrics-datasource t=2024-09-03T10:29:14.54221607Z level=error msg=client GetSubscribedRegionstakey="fetching the subscribed region for regioname: us-ashburn-1"
logger=plugin.oci-metrics-datasource t=2024-09-03T10:29:14.542234666Z level=error msg=client GetSubscribedRegionstakey="fetching the subscribed region for regioname: us-phoenix-1"
logger=plugin.oci-metrics-datasource t=2024-09-03T10:29:21.569159461Z level=error msg=client GetCompartments="fetching the sub-compartments for tenancy: "
logger=plugin.oci-metrics-datasource t=2024-09-03T10:29:21.569224405Z level=error msg=GetTenancyAccessKey Validtakey=DEFAULT/
logger=plugin.oci-metrics-datasource t=2024-09-03T10:29:21.569247169Z level=warn msg=client GetCompartments="getting the data from cache"
logger=plugin.oci-metrics-datasource t=2024-09-03T10:29:49.068656168Z level=error msg=client GetNamespaceWithMetricNames="fetching the metric names along with namespaces under compartment: ocid1.compartment.oc1..aaaaaaaajflypppm2hceucglyjkepcdva7f7ilw5agiuxaccgjxmhxfemefa"
logger=plugin.oci-metrics-datasource t=2024-09-03T10:29:49.06875621Z level=error msg=GetTenancyAccessKey Validtakey=DEFAULT/
logger=plugin.oci-metrics-datasource t=2024-09-03T10:29:49.068790695Z level=error msg=client.utils listMetricsMetadataPerRegion="Data fetch start by calling list metrics API for a particular regions"
logger=plugin.oci-metrics-datasource t=2024-09-03T10:29:49.135137756Z level=error msg=client.utils listMetrics="Error returned by Monitoring Service. Http Status Code: 400. Error Code: InvalidParameter. Opc request id: 09a45932b7aca52f43ef3093a77e021d/4B2F82940F780AA1CE179E8300BAA20B/D522D9FE518E30C6B4F5D17DEF8F2EE4. Message: Expensive list request. Please narrow down your search parameters.\nOperation Name: ListMetrics\nTimestamp: 2024-09-03 10:29:49 +0000 GMT\nClient Version: Oracle-GoSDK/65.60.0\nRequest Endpoint: POST https://telemetry.us-phoenix-1.oraclecloud.com/20180401/metrics/actions/listMetrics?compartmentId=ocid1.compartment.oc1..aaaaaaaajflypppm2hceucglyjkepcdva7f7ilw5agiuxaccgjxmhxfemefa&compartmentIdInSubtree=false\nTroubleshooting Tips: See https://docs.oracle.com/iaas/Content/API/References/apierrors.htm#apierrors_400__400_invalidparameter for more information about resolving this error.\nAlso see https://docs.oracle.com/iaas/api/#/en/monitoring/20180401/Metric/ListMetrics for details on this operation's requirements.\nTo get more info on the failing request, you can set OCI_GO_SDK_DEBUG env var to info or higher level to log the request/response details.\nIf you are unable to resolve this Monitoring issue, please contact Oracle support and provide them this full error message."

The issue seems to be due to a change in the OCI Monitoring service.

mamorett commented 2 months ago

Hi. That's interesting, I have never seen the "Expensive list request" error from listMetrics before. Most probably the tenancy has so many metrics that an unfiltered listMetrics call exceeds a service-side limit. I need to check whether I can intercept this kind of situation in the code. For now, you can ignore the message; I will try to figure out whether I can handle it in the code for the next release.
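One way such an interception could look is sketched below, purely as an illustration and not as the plugin's actual code. It assumes the only reliable signal is the message text shown in the log above; `isExpensiveListMessage` and `classifyListMetricsError` are hypothetical helper names. The reasoning is that an HTTP 400 with "Expensive list request" means the request was authenticated and reached the Monitoring service, so a connectivity test could arguably treat it differently from an authentication or network failure.

```go
package main

import "strings"

// expensiveListMarker is the message fragment taken from the log in this issue.
const expensiveListMarker = "Expensive list request"

// isExpensiveListMessage reports whether msg looks like the Monitoring
// service's "Expensive list request" rejection. Hypothetical helper.
func isExpensiveListMessage(msg string) bool {
	return strings.Contains(msg, expensiveListMarker)
}

// classifyListMetricsError maps a listMetrics error to a coarse outcome.
// Hypothetical helper: "too-broad" means the service was reachable but
// refused the query as too expensive.
func classifyListMetricsError(err error) string {
	switch {
	case err == nil:
		return "ok"
	case isExpensiveListMessage(err.Error()):
		return "too-broad"
	default:
		return "failed"
	}
}
```

Matching on message text is brittle if the service rewords the error, so this is a stopgap rather than a fix; narrowing the request itself is the more robust option.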

sumangecu commented 2 months ago

The interesting part is:

While adding OCI data-source using instance principal, the listMetrics call is made (as part of TestConnectivity). It results in the above error.

Because of this, I am unable to use the OCI data source.

This flow was working a couple of weeks ago for the same tenancy (the list of metrics is also the same), and it has only now started returning this error.

mamorett commented 2 months ago

@sumangecu: could it be that some additional metric namespaces were added to your tenancy during these weeks (maybe by UMA)?

mamorett commented 2 months ago

By the way, in the next release I will add `Limit: common.Int(25)` to the listMetrics request used by the test functions. I hope this will restrict the list of metrics returned and solve your issue too.
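To make the effect of that change concrete, here is a rough sketch of what the request URL might look like with a limit applied. It uses only the Go standard library rather than the OCI SDK, and it assumes the SDK's `Limit` field is sent as a `limit` query parameter alongside the `compartmentId` and `compartmentIdInSubtree` parameters visible in the logged endpoint. `listMetricsURL` is a hypothetical helper, and the compartment OCID in the usage note is a placeholder.

```go
package main

import (
	"net/url"
	"strconv"
)

// listMetricsURL builds a listMetrics endpoint URL in the shape seen in the
// error log, optionally adding a page-size limit. Hypothetical helper; the
// real plugin goes through the OCI Go SDK instead of building URLs by hand.
func listMetricsURL(region, compartmentID string, limit int) string {
	q := url.Values{}
	q.Set("compartmentId", compartmentID)
	q.Set("compartmentIdInSubtree", "false")
	if limit > 0 {
		// Assumption: the SDK's Limit field maps to this query parameter.
		q.Set("limit", strconv.Itoa(limit))
	}
	u := url.URL{
		Scheme:   "https",
		Host:     "telemetry." + region + ".oraclecloud.com",
		Path:     "/20180401/metrics/actions/listMetrics",
		RawQuery: q.Encode(), // Encode sorts parameters by key
	}
	return u.String()
}
```

Calling `listMetricsURL("us-phoenix-1", "ocid1.compartment.oc1..example", 25)` yields the logged endpoint shape with `&limit=25` appended. Whether a page-size limit alone is enough to avoid the "Expensive list request" rejection is not confirmed in this thread.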