tomkerkhove / promitor

Bringing Azure Monitor metrics where you need them.
https://promitor.io
MIT License

Provide support for scraping Azure Database for PostgreSQL Flexible server & Hyperscale #1870

Closed: LukaszSzalek-TomTom closed this issue 2 years ago

LukaszSzalek-TomTom commented 2 years ago

Proposal

I would like to be able to select multiple Flexible servers for monitoring, either by using a regex in the resource URI or through a resourceDiscoveryGroup. Example resources:

Component

Resource Discovery, Scraper

Contact Details

lukasz.szalek@tomtom.com

tomkerkhove commented 2 years ago

Adding this to the next shipping cycle, but you can already achieve this today with our Generic scraper: https://docs.promitor.io/configuration/v2.x/metrics/generic-azure-resource

What you have as an example should actually work (without discovery though): resourceUri: Microsoft.DBForPostgreSql/flexibleServers/{.myproject.}

tomkerkhove commented 2 years ago

Let's introduce a server type as per https://docs.microsoft.com/en-us/azure/postgresql/flexible-server/concepts-compare-single-server-flexible-server

This should be something like:

name: azure_postgre_sql_cpu_percent
description: "The CPU percentage on the server"
resourceType: PostgreSql
scraping:
  schedule: "0 */2 * ? * *"
azureMetricConfiguration:
  metricName: cpu_percent
  aggregation:
    type: Average
    interval: 00:01:00
resources: # Optional, required when no resource discovery is configured
- serverName: Promitor-1
  type: Flexible # Options: Single (default), Flexible

LukaszSzalek-TomTom commented 2 years ago

Can I use regex here?

resources: # Optional, required when no resource discovery is configured

LukaszSzalek-TomTom commented 2 years ago

Hi, setting resourceType: PostgreSql makes Promitor stop working:

[11:27:07 INF] Booting up Promitor v2.5.0 - Thank you for using Promitor!
[11:27:07 INF] Using configuration folder '/config/'
[11:27:23 FTL] Promitor Scraper Agent has encountered an unexpected error. Please open an issue at https://github.com/tomkerkhove/promitor/issues to let us know about it.
System.ArgumentNullException: [scraping] cannot be Null. (Parameter 'scraping')
   at GuardNet.Guard.For[TException](Func`1 predicate, TException exception)
   at GuardNet.Guard.NotNull[TParam,TException](TParam param, TException exception)
   at GuardNet.Guard.NotNull[TParam](TParam param, String paramName, String message)
   at GuardNet.Guard.NotNull[TParam](TParam param, String paramName)
   at Promitor.Core.Scraping.Configuration.Model.Metrics.ScrapeDefinition`1..ctor(AzureMetricConfiguration azureMetricConfiguration, PrometheusMetricDefinition prometheusMetricDefinition, Scraping scraping, TResourceDefinition resource, String subscriptionId, String resourceGroupName) in /src/Promitor.Core.Scraping/Model/Metrics/ScrapeDefinition.cs:line 34
   at Promitor.Core.Scraping.Configuration.Model.Metrics.MetricDefinition.CreateScrapeDefinition(IAzureResourceDefinition resource, AzureMetadata azureMetadata) in /src/Promitor.Core.Scraping/Model/Metrics/MetricDefinition.cs:line 74
   at Microsoft.Extensions.DependencyInjection.SchedulingExtensions.ScheduleResourceScraping(IAzureResourceDefinition resource, AzureMetadata azureMetadata, MetricDefinition metric, AzureMonitorClientFactory azureMonitorClientFactory, MetricSinkWriter metricSinkWriter, IAzureScrapingPrometheusMetricsCollector azureScrapingPrometheusMetricCollector, IConfiguration configuration, IOptions`1 azureMonitorLoggingConfiguration, ILoggerFactory loggerFactory, ILogger`1 logger, IServiceCollection services) in /src/Promitor.Agents.Scraper/SchedulingExtensions.cs:line 69
   at Microsoft.Extensions.DependencyInjection.SchedulingExtensions.ScheduleMetricScraping(IServiceCollection services) in /src/Promitor.Agents.Scraper/SchedulingExtensions.cs:line 34
   at Promitor.Agents.Scraper.Startup.ConfigureServices(IServiceCollection services) in /src/Promitor.Agents.Scraper/Startup.cs:line 56
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at Microsoft.AspNetCore.Hosting.ConfigureServicesBuilder.InvokeCore(Object instance, IServiceCollection services)

tomkerkhove commented 2 years ago

This is not implemented yet and you can only use the Generic scraper for now. See https://docs.promitor.io/configuration/v2.x/metrics/generic-azure-resource

tomkerkhove commented 2 years ago

Can I use regex here?
resources: # Optional, required when no resource discovery is configured
- serverName: .testproject.
  type: Flexible # Options: Single (default), Flexible

Where do you want to use regex? We don't support that today.

LukaszSzalek-TomTom commented 2 years ago

Regex in the server name. Is there any way to select multiple Flexible servers without specifying a URI for each one?

tomkerkhove commented 2 years ago

No, not today.

Resource discovery will be your best bet once this feature is implemented.

LukaszSzalek-TomTom commented 2 years ago

Is it possible to add Azure tags to the scraped metrics? For example:

flexible_server_cpu_average{.....instance_name="Microsoft.DBForPostgreSql/flexibleServers/master-smds-dblab2-dev-99", owner="szalek"}

Here, owner is the owner tag on the Flexible server. I tried:

metrics:
- name: flexible_server_cpu_average
  description: "flexible_server_cpu_average"
  resourceType: Generic
  azureTags:
    include:
      - rolegroup
      - owner
  azureMetricConfiguration:
    metricName: cpu_percent
    aggregation:
      type: Average
  resources:
  - resourceUri: Microsoft.DBForPostgreSql/flexibleServers/master-smds-dblab2-dev-99
    resourceGroupName: RG-weu-vnet-tomtom-default

but it does not seem to work.

tomkerkhove commented 2 years ago

You can find the full YAML docs at https://docs.promitor.io/configuration/v2.x/metrics/#metrics. To answer your question: no, Azure tags are not supported yet, but this is tracked in https://github.com/tomkerkhove/promitor/issues/599

LukaszSzalek-TomTom commented 2 years ago

Any plans to make tags available? This ticket is still open, so I assume it is under development, right?

tomkerkhove commented 2 years ago

It is being tracked and will eventually come.

tomkerkhove commented 2 years ago

I have a WIP PR open which currently produces the following result:

# HELP promitor_demo_postgresql_discovered Availability (%) of promitor.io measured in Azure Application Insights
# TYPE promitor_demo_postgresql_discovered gauge
promitor_demo_postgresql_discovered{tenant_id="e0372f7f-a362-47fb-9631-74a5c4ba8bbf",subscription_id="63c590b6-4947-4898-92a3-cae91a31b5e4",resource_uri="subscriptions/63c590b6-4947-4898-92a3-cae91a31b5e4/resourceGroups/promitor-testing-infrastructure-us/providers/Microsoft.DBforPostgreSQL/serverGroupsv2/fezfze",resource_group="promitor-testing-infrastructure-us",instance_name="fezfze-Hyperscale",geo="china",environment="dev"} 3 1639034648742
promitor_demo_postgresql_discovered{tenant_id="e0372f7f-a362-47fb-9631-74a5c4ba8bbf",subscription_id="63c590b6-4947-4898-92a3-cae91a31b5e4",resource_uri="subscriptions/63c590b6-4947-4898-92a3-cae91a31b5e4/resourceGroups/promitor-testing-infrastructure-us/providers/Microsoft.DBforPostgreSQL/servers/fezfze3",resource_group="promitor-testing-infrastructure-us",instance_name="fezfze3-Single",geo="china",environment="dev"} 0.215 1639034647817
promitor_demo_postgresql_discovered{tenant_id="e0372f7f-a362-47fb-9631-74a5c4ba8bbf",subscription_id="63c590b6-4947-4898-92a3-cae91a31b5e4",resource_uri="subscriptions/63c590b6-4947-4898-92a3-cae91a31b5e4/resourceGroups/promitor-testing-infrastructure-us/providers/Microsoft.DBforPostgreSQL/flexibleServers/fezfze2",resource_group="promitor-testing-infrastructure-us",instance_name="fezfze2-Flexible",geo="china",environment="dev"} 8.528 1639034648868
# HELP promitor_demo_postgresql_flex Availability (%) of promitor.io measured in Azure Application Insights
# TYPE promitor_demo_postgresql_flex gauge
promitor_demo_postgresql_flex{tenant_id="e0372f7f-a362-47fb-9631-74a5c4ba8bbf",subscription_id="63c590b6-4947-4898-92a3-cae91a31b5e4",resource_uri="subscriptions/63c590b6-4947-4898-92a3-cae91a31b5e4/resourceGroups/promitor-testing-infrastructure-us/providers/Microsoft.DBforPostgreSQL/flexibleServers/fezfze2",resource_group="promitor-testing-infrastructure-us",instance_name="fezfze2-Flexible",geo="china",environment="dev"} 8.528 1639034648903
# HELP promitor_demo_postgresql_simple Availability (%) of promitor.io measured in Azure Application Insights
# TYPE promitor_demo_postgresql_simple gauge
promitor_demo_postgresql_simple{tenant_id="e0372f7f-a362-47fb-9631-74a5c4ba8bbf",subscription_id="63c590b6-4947-4898-92a3-cae91a31b5e4",resource_uri="subscriptions/63c590b6-4947-4898-92a3-cae91a31b5e4/resourceGroups/promitor-testing-infrastructure-us/providers/Microsoft.DBforPostgreSQL/servers/fezfze3",resource_group="promitor-testing-infrastructure-us",instance_name="fezfze3-Single",geo="china",environment="dev"} 0.215 1639034646949

This is based on the current configuration approach:

metrics:
  - name: promitor_demo_postgresql_simple
    description: "Availability (%) of promitor.io measured in Azure Application Insights"
    resourceType: PostgreSql
    azureMetricConfiguration:
      metricName: cpu_percent
      aggregation:
        type: Average
    resources:
      # Application Insights with data in the service itself (classic, deprecated)
    - serverName: fezfze3
  - name: promitor_demo_postgresql_flex
    description: "Availability (%) of promitor.io measured in Azure Application Insights"
    resourceType: PostgreSql
    azureMetricConfiguration:
      metricName: cpu_percent
      aggregation:
        type: Average
    resources:
      # Application Insights with data in the service itself (classic, deprecated)
    - serverName: fezfze2
      type: Flexible
  - name: promitor_demo_postgresql_hyperscale
    description: "Availability (%) of promitor.io measured in Azure Application Insights"
    resourceType: PostgreSql
    azureMetricConfiguration:
      metricName: cpu_percent
      aggregation:
        type: Average
    resources:
      # Application Insights with data in the service itself (classic, deprecated)
    - serverName: fezfze
      type: Hyperscale
  - name: promitor_demo_postgresql_discovered
    description: "Availability (%) of promitor.io measured in Azure Application Insights"
    resourceType: PostgreSql
    azureMetricConfiguration:
      metricName: cpu_percent
      aggregation:
        type: Average
    resourceDiscoveryGroups:
      - name: postgres-databases
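For reference, the resource_uri labels in the WIP output above suggest how each server type maps to an Azure Resource Manager path: Single to servers, Flexible to flexibleServers, and Hyperscale to serverGroupsv2. The Python sketch below reproduces that mapping under the assumption that those three paths are the only ones involved; PROVIDER_PATHS and build_resource_uri are hypothetical names for illustration, not Promitor's implementation.

```python
# Hypothetical helper, not Promitor code. The provider paths are taken from the
# resource_uri labels in the WIP output for each PostgreSQL flavor.
PROVIDER_PATHS = {
    "Single": "Microsoft.DBforPostgreSQL/servers",
    "Flexible": "Microsoft.DBforPostgreSQL/flexibleServers",
    "Hyperscale": "Microsoft.DBforPostgreSQL/serverGroupsv2",
}


def build_resource_uri(subscription_id: str, resource_group: str,
                       server_name: str, server_type: str = "Single") -> str:
    """Compose an ARM resource URI for a PostgreSQL server.

    server_type defaults to "Single", mirroring the default in the proposed YAML.
    """
    try:
        provider_path = PROVIDER_PATHS[server_type]
    except KeyError:
        # Reject anything other than the three flavors seen in the output.
        raise ValueError(f"Unknown server type: {server_type!r}") from None
    return (f"subscriptions/{subscription_id}/resourceGroups/{resource_group}"
            f"/providers/{provider_path}/{server_name}")


if __name__ == "__main__":
    print(build_resource_uri("63c590b6-4947-4898-92a3-cae91a31b5e4",
                             "promitor-testing-infrastructure-us",
                             "fezfze2", "Flexible"))
```

Called with the subscription, resource group, and fezfze2/Flexible values from the output, it yields the same resource_uri as the promitor_demo_postgresql_flex series.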

A few things to note:

Would this fit your needs?

One thing I have doubts about is using the same scraper for all flavors. It keeps things simple, but with resource discovery you cannot filter on the server type, which can be a pain given that the available metrics are not identical across all flavors.

That said, we can add resource type exclusion later on as we're introducing that for resource discovery in the future.

Thoughts?

LukaszSzalek-TomTom commented 2 years ago

Hi Tom

Please tell me: for Flexible servers, does the code search for the server named fezfze2 only, or will it also find servers like fezfze2-3, my-fezfze2-98, etc. (so any name containing the fezfze2 string)?

With discovery, it will search all PostgreSQL databases in the subscription, right? Does anything else need to be enabled to use it?

Best regards

Łukasz Szałek / Expert Software Engineer / CPP CMR / DBLAB-SOJUZ

tomkerkhove commented 2 years ago

Does it search for the server named fezfze2 only, or will it also find servers like fezfze2-3, my-fezfze2-98, etc. (so any name containing the fezfze2 string)?

It will only scrape fezfze2 as specified.
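To illustrate the difference being asked about: a static serverName behaves like an exact comparison, whereas the requested behaviour would be a pattern match. The Python sketch below is purely illustrative; exact_match and regex_match are hypothetical functions, not how Promitor selects resources, and regex selection is not supported today.

```python
import re

# Server names taken from the question; purely illustrative.
servers = ["fezfze2", "fezfze2-3", "my-fezfze2-98", "other-server"]


def exact_match(names, server_name):
    """Static YAML-style selection: only the exact name is kept."""
    return [name for name in names if name == server_name]


def regex_match(names, pattern):
    """Requested (unsupported) behaviour: keep every name the regex matches anywhere."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.search(name)]


print(exact_match(servers, "fezfze2"))   # ['fezfze2']
print(regex_match(servers, r"fezfze2"))  # ['fezfze2', 'fezfze2-3', 'my-fezfze2-98']
```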

It will search all databases of Postgres in subscription, right?

Yes, all flavors of PostgreSQL

LukaszSzalek-TomTom commented 2 years ago

If it only scrapes a specific name and not a regex, it does not cover my case and will not help me. Is discovery the only way?

tomkerkhove commented 2 years ago

If it only scrapes a specific name and not a regex, it does not cover my case and will not help me. Is discovery the only way?

You can use resource discovery to automatically discover resources. Everything that is declared in YAML is static, because it's using metrics as code.

You can learn about the ways to discover on https://docs.promitor.io/configuration/v2.x/resource-discovery but basically it supports the following for now:

Name filters (i.e. regex) are another option that could be considered, but they are not supported today. Azure tags are typically the best approach.

That said, your current question is beyond the scope of this issue, which is purely about scraping this service and not about how to find its resources, so I suggest opening a discussion.

LukaszSzalek-TomTom commented 2 years ago

I need guidance

resourceDiscoveryGroups:

What do I need to put in name/type for Flexible servers?

LukaszSzalek-TomTom commented 2 years ago

I tried:

version: v1
azureMetadata:
  tenantId: *
  subscriptionId:
  resourceGroupName: promitor
  cloud: Global
metricDefaults:
  azureTags:
    include:

Result:

Failed to scrape resource collection flexes_rdg: Name does not resolve

tomkerkhove commented 2 years ago

I think it's best to go through the documentation as you seem to be mixing the configuration for our scraper & resource discovery agent.

https://promitor.io/concepts/ https://docs.promitor.io/

This feature is not complete yet (which is why the issue is still open), so you won't be able to run it yourself yet.

LukaszSzalek-TomTom commented 2 years ago

Hi, I am trying to set up Promitor discovery.

And I see:

Azure Landscape           │ Success │ Everything is well-configured.
Resource Discovery Groups │ Success │ Everything is well-configured.
Azure Authentication      │ Failed  │ Validation failed:
                          │         │ Azure authentication is not configured
                          │         │ correctly - No identity secret was
                          │         │ configured for service principle
                          │         │ authentication


What is the “key” to set up the secret? I tried this style:

azureAuthentication:
  appId: **
  appKey: ***

No success.

This did not work either:

azureAuthentication:
  appId: "" # [Deprecated] Prefer identity.id
  appKey: "" # [Deprecated] Prefer identity.key
  mode: "ServicePrincipal"
  identity:
    id: ""
    key: ""
    binding: ""

Thanks for any suggestions.

tomkerkhove commented 2 years ago

Can you please open a Q&A discussion as this is unrelated to this issue?