Connecting Grafana openid with Azure AD

rancher / opni

Multi Cluster Observability with AIOps

https://opni.io

Apache License 2.0


Connecting Grafana openid with Azure AD #1028

Open Kapsztajn opened 1 year ago

Kapsztajn commented 1 year ago

Hi, I'm currently trying to migrate Opni's Grafana from noauth to the openid configuration, but I'm having some difficulty accessing cluster information in Grafana. Here is the values file I pass to Helm:

gateway:
  hostname: opni.hidden.tech
  auth:
    provider: "openid"
    openid:
      discovery:
        path: "/.well-known/openid-configuration"
        issuer: "https://login.microsoftonline.com/6c9a8372-9a58-49c4-bc1f-f9c74378615b/v2.0"
      identifyingClaim: "email"
      clientID: "Hidden"
      clientSecret: "Hidden"
      scopes: ["openid", "profile", "email"]
      roleAttributePath: "admin"
opni-agent:
  kube-prometheus-stack:
    enabled: true
opni-prometheus-crd:
  enabled: false

I have already added a cluster to Opni with monitoring enabled, and it works with noauth.

I have configured Roles and RoleBindings.

Still, when I log in to Grafana I get errors and cannot see anything.

Am I doing something wrong, or did I miss something? Also, I'm not really sure what roleAttributePath does. What value should I set there to get the highest permissions?

Thanks for this tool and your time.

kralicky commented 1 year ago

Hi @Kapsztajn! The roleAttributePath field is what allows Grafana to translate the claims in ID tokens issued by your OpenID provider into Grafana roles (Admin, Editor, Viewer). It's a JMESPath expression and a bit cryptic, but the Grafana docs cover it. The Grafana pod logs should also show any relevant error messages.

kralicky commented 1 year ago

Regarding what to put in roleAttributePath: one strategy is to configure your identity provider to attach a custom claim to the ID tokens it issues to clients. For example, if you set the claim grafana_role to an array containing the allowed Grafana roles (based on identity-provider-specific configuration), your roleAttributePath could be something like:

roleAttributePath: "contains(grafana_role[*], 'Admin') && 'Admin' || contains(grafana_role[*], 'Editor') && 'Editor' || 'Viewer'"
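The precedence in that expression (Admin first, then Editor, then the fallback Viewer) can be sketched in plain Python. This emulates only that one expression, assuming grafana_role arrives as an array claim; it is not Grafana's JMESPath evaluator, and the helper names are illustrative. It also shows why a bare value like roleAttributePath: "admin" grants no role: in JMESPath a bare identifier is a lookup of a claim named admin, not the literal string 'Admin'.

```python
# Plain-Python emulation of ONE specific roleAttributePath expression:
#   contains(grafana_role[*], 'Admin') && 'Admin'
#     || contains(grafana_role[*], 'Editor') && 'Editor' || 'Viewer'
# Not Grafana's JMESPath evaluator; "grafana_role" is the example claim name.

def role_from_claims(claims: dict) -> str:
    # Assumption: the claim is an array of role names. A missing claim is
    # treated as an empty list here; real JMESPath may handle that differently.
    roles = claims.get("grafana_role") or []
    if "Admin" in roles:    # first && / || branch
        return "Admin"
    if "Editor" in roles:   # second branch
        return "Editor"
    return "Viewer"         # final fallback literal


def bare_identifier(claims: dict, name: str):
    # A bare JMESPath identifier such as `admin` is a field lookup, not a role
    # literal: it yields the claim named "admin", or None (null) if absent.
    return claims.get(name)


if __name__ == "__main__":
    print(role_from_claims({"grafana_role": ["Viewer", "Admin"]}))  # Admin
    print(role_from_claims({"grafana_role": ["Editor"]}))           # Editor
    print(bare_identifier({"email": "user@example.com"}, "admin"))  # None
```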

Kapsztajn commented 1 year ago

Hi @kralicky, I got this error from the Grafana pod:

logger=context userId=2 orgId=1 uname=kamil.kwiaton@hostersi.pl t=2023-02-06T22:18:13.216315857Z level=error msg="Internal server error" error="[plugin.downstreamError] failed to query data: received empty response from prometheus" remote_addr=10.1.0.6 traceID=
logger=context userId=2 orgId=1 uname=kamil.kwiaton@hostersi.pl t=2023-02-06T22:18:13.21646746Z level=error msg="Request Completed" method=POST path=/api/ds/query status=500 remote_addr=10.1.0.6 time_ms=204 duration=204.862654ms size=116 referer="https://grafana.hidden.tech/d/1e83e204be502391f69d3a826675d3df/infrastructure-overview?orgId=1&refresh=10s" handler=/api/ds/query

kralicky commented 1 year ago

Try turning on Grafana debug logs, then inspect the ID token it obtains from your OpenID server. To change the log level, edit the MonitoringCluster object created by Opni and set:

spec:
  grafana:
    config:
      log:
        level: debug

The debug logs should show information about the authentication decisions Grafana is making, as well as the ID tokens (in plaintext, so redact any secrets before sharing).
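If you want to inspect a raw ID token yourself rather than through the Grafana logs, its claims are simply the base64url-encoded middle segment of the JWT. Below is a minimal standard-library sketch; the function name is illustrative, and it deliberately does NOT verify the signature, so it is only suitable for debugging.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Return the claims of a JWT WITHOUT verifying its signature (debug only)."""
    try:
        _header, payload, _signature = token.split(".")
    except ValueError:
        raise ValueError("expected a JWT with three dot-separated segments")
    # JWT segments use unpadded base64url; restore the padding before decoding.
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

Once decoded, compare the value of the claim named by identifyingClaim (email, oid, sub, ...) with the subjects in your role bindings.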

Also, check the Opni Gateway logs for anything unusual when you log in to Grafana.

Kapsztajn commented 1 year ago

@kralicky Nothing unusual in the Opni Gateway logs. I enabled the debug log level and got more info from Grafana:

logger=tsdb.prometheus t=2023-02-07T01:36:48.977274558Z level=debug msg="Sending query" start=2023-02-07T00:36:48.321Z end=2023-02-07T01:36:48.321Z step=15s query="label_replace(sum by(namespace, __tenant_id__) (node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate), \"cluster_id\", \"$1\", \"__tenant_id__\", \"(.*)\") * on(cluster_id) group_left(friendly_name) group without(pod, instance) (opni_cluster_info)"
logger=tsdb.prometheus t=2023-02-07T01:36:49.057392077Z level=error msg="Instant query failed" query="label_replace(sum by(namespace, __tenant_id__) (node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate), \"cluster_id\", \"$1\", \"__tenant_id__\", \"(.*)\") * on(cluster_id) group_left(friendly_name) group without(pod, instance) (opni_cluster_info)" err="client_error: client error: 401"
logger=context userId=2 orgId=1 uname=kamil.kwiaton@hostersi.pl t=2023-02-07T01:36:49.057625283Z level=info msg="Request Completed" method=POST path=/api/ds/query status=400 remote_addr=10.1.0.6 time_ms=199 duration=199.445077ms size=62 referer="https://grafana.hidden.tech/d/1e83e204be502391f69d3a826675d3df/infrastructure-overview?orgId=1&refresh=10s" handler=/api/ds/query
logger=tsdb.prometheus t=2023-02-07T01:36:49.057841389Z level=error msg="Range query failed" query="sum(rate(kubelet_runtime_operations_errors_total{job=\"kubelet\",}[2m15s])) by (__tenant_id__, operation_type) * on(__tenant_id__) group_left(friendly_name) label_replace(group without(pod, instance) (opni_cluster_info), \"__tenant_id__\", \"$1\", \"cluster_id\", \"(.*)\")" err="client_error: client error: 401"
logger=context userId=2 orgId=1 uname=kamil.kwiaton@hostersi.pl t=2023-02-07T01:36:49.057919091Z level=info msg="Request Completed" method=POST path=/api/ds/query status=400 remote_addr=10.1.0.6 time_ms=301 duration=301.419074ms size=62 referer="https://grafana.hidden.tech/d/1e83e204be502391f69d3a826675d3df/infrastructure-overview?orgId=1&refresh=10s" handler=/api/ds/query
logger=tsdb.prometheus t=2023-02-07T01:36:49.065603494Z level=debug msg="Sending query" start=2023-02-07T00:36:48.322Z end=2023-02-07T01:36:48.322Z step=15s query="label_replace(sum by(namespace, __tenant_id__) (node_namespace_pod_container:container_memory_rss), \"cluster_id\", \"$1\", \"__tenant_id__\", \"(.*)\") * on(cluster_id) group_left(friendly_name) group without(pod, instance) (opni_cluster_info)"
logger=tsdb.prometheus t=2023-02-07T01:36:49.067280439Z level=error msg="Instant query failed" query="label_replace(sum by(namespace, __tenant_id__) (node_namespace_pod_container:container_memory_rss), \"cluster_id\", \"$1\", \"__tenant_id__\", \"(.*)\") * on(cluster_id) group_left(friendly_name) group without(pod, instance) (opni_cluster_info)" err="client_error: client error: 401"
logger=context userId=2 orgId=1 uname=kamil.kwiaton@hostersi.pl t=2023-02-07T01:36:49.067361541Z level=info msg="Request Completed" method=POST path=/api/ds/query status=400 remote_addr=10.1.0.6 time_ms=160 duration=160.599048ms size=62 referer="https://grafana.hidden.tech/d/1e83e204be502391f69d3a826675d3df/infrastructure-overview?orgId=1&refresh=10s" handler=/api/ds/query

These 401 errors are curious to me. Is there a way to configure Grafana so that all users get the Admin role by default instead of Viewer? When I try to edit the configmap directly, the change doesn't stick. Maybe the issue is that I can't get roleAttributePath configured.

Is the RoleBinding set up correctly with my email if identifyingClaim is email?

kralicky commented 1 year ago

Yeah, it looks like it could be an RBAC issue. If RBAC is set up such that a given user has access to no clusters, querying metrics might return a 401 error.

If identifyingClaim is email, check that the Subject in your role binding is kamil.kwiaton@hostersi.pl. You can exec into the gateway pod and run opni access-matrix to show the full permissions table for all defined users.
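The reasoning above (the claim named by identifyingClaim must match a role binding subject exactly, and a user who can reach no clusters may get a 401) can be modelled in a few lines. This is a hypothetical, simplified sketch for illustration, not Opni's RBAC code: the Role/RoleBinding shapes, the role name, and the 401-on-no-clusters rule are assumptions. The claim values are the ones from the id_token logged in this thread.

```python
from dataclasses import dataclass

# Hypothetical model of the RBAC check discussed above; not Opni's implementation.


@dataclass
class Role:
    name: str
    cluster_ids: set[str]


@dataclass
class RoleBinding:
    role_name: str
    subjects: list[str]


def authorize_query(claims: dict, identifying_claim: str,
                    roles: list[Role], bindings: list[RoleBinding]) -> tuple[int, set[str]]:
    """Return (HTTP status, cluster IDs the user may query)."""
    # The user's identity is whatever value the identifying claim holds.
    subject = claims.get(identifying_claim)
    if subject is None:
        return 401, set()

    roles_by_name = {role.name: role for role in roles}
    allowed: set[str] = set()
    for binding in bindings:
        # A binding applies only if the subject string matches exactly.
        if subject in binding.subjects and binding.role_name in roles_by_name:
            allowed |= roles_by_name[binding.role_name].cluster_ids

    # Access to zero clusters is modelled as a 401, per the comment above.
    if not allowed:
        return 401, set()
    return 200, allowed


if __name__ == "__main__":
    # Claim values as they appear in the id_token logged in this thread.
    claims = {
        "email": "kamil.kwiaton@hostersi.pl",
        "oid": "76dcd1c7-e589-489c-9bca-f0552fcf2175",
        "sub": "d75bDesL3N2B_LM-OaP_5AyrAv5k4Gl8t9K8hHj63q8",
    }
    roles = [Role("all-clusters", {"2991968b-91b7-4b33-9566-4469a5f494a0"})]
    bindings = [RoleBinding("all-clusters", ["kamil.kwiaton@hostersi.pl"])]
    print(authorize_query(claims, "email", roles, bindings)[0])  # 200
    print(authorize_query(claims, "sub", roles, bindings)[0])    # 401
```

In this model, switching identifyingClaim to sub changes the subject to the opaque sub value, so a binding written for the email address (or the oid) no longer matches.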

kralicky commented 1 year ago

You might also need to adjust roleAttributePath, but technically it shouldn't stop you from viewing metrics if you aren't an admin.

Kapsztajn commented 1 year ago

bash-5.1# opni access-matrix
               TENANT ID               76dcd1c7-e589-489c-9bca-f0552fcf2175  kamil.kwiaton@hostersi.pl 
 0d0d1d66-a94b-40b4-90d4-8c2af0082d21                   ✅                               ✅        
 2991968b-91b7-4b33-9566-4469a5f494a0                   ✅                               ✅       
 40ea968b-86a0-4e26-af63-b5f3a0df04a7                   ✅                               ✅     
 4cc83bd0-4a8d-4b97-a40c-03eda158c32e                   ✅                               ✅

I checked the access matrix, and my email has all clusters assigned. I also tried 76dcd1c7-e589-489c-9bca-f0552fcf2175, my account identifier in Azure AD, after changing values.yaml to identifyingClaim: "oid", but I get the same 401 error in Grafana.

grafana logger=tsdb.prometheus t=2023-02-10T13:20:48.164082341Z level=error msg="Range query failed" query="1 - (:node_memory_MemAvailable_bytes:sum / on(__tenant_id__) sum by(__tenant_id__) (node_memory_MemTotal_bytes)) * on(__tenant_id__) group_left(friendly_name) label_replace(group without(pod, instance) (opni_cluster_info), \"__tenant_id__\", \"$1\", \"cluster_id\", \"(.*)\")" err="client_error: client error: 401"
grafana logger=auth t=2023-02-10T13:20:48.164133442Z level=debug msg="token needs rotation" tokenId=2 authTokenSeen=true rotatedAt=2023-02-10T13:10:48Z
grafana logger=tsdb.prometheus t=2023-02-10T13:20:48.177085407Z level=debug msg="Sending query" start=2023-02-10T12:20:47.597Z end=2023-02-10T13:20:47.597Z step=2m0s query="sum(rate(kubelet_runtime_operations_errors_total{job=\"kubelet\",}[2m15s])) by (__tenant_id__, operation_type) * on(__tenant_id__) group_left(friendly_name) label_replace(group without(pod, instance) (opni_cluster_info), \"__tenant_id__\", \"$1\", \"cluster_id\", \"(.*)\")"
grafana logger=auth t=2023-02-10T13:20:48.178365643Z level=debug msg="auth token rotated" affected=1 auth_token_id=2 userId=2
grafana logger=context userId=2 orgId=1 uname=kamil.kwiaton@hostersi.pl t=2023-02-10T13:20:48.178461545Z level=info msg="Request Completed" method=POST path=/api/ds/query status=400 remote_addr=10.1.0.5 time_ms=422 duration=422.060876ms size=112 referer="https://grafana.hidden.tech/d/1e83e204be502391f69d3a826675d3df/infrastructure-overview?orgId=1&refresh=10s" handler=/api/ds/query
grafana logger=tsdb.prometheus t=2023-02-10T13:20:48.178873957Z level=error msg="Range query failed" query="sum(rate(kubelet_runtime_operations_errors_total{job=\"kubelet\",}[2m15s])) by (__tenant_id__, operation_type) * on(__tenant_id__) group_left(friendly_name) label_replace(group without(pod, instance) (opni_cluster_info), \"__tenant_id__\", \"$1\", \"cluster_id\", \"(.*)\")" err="client_error: client error: 401"
grafana logger=auth t=2023-02-10T13:20:48.178933259Z level=debug msg="token needs rotation" tokenId=2 authTokenSeen=true rotatedAt=2023-02-10T13:10:48Z
grafana logger=context userId=2 orgId=1 uname=kamil.kwiaton@hostersi.pl t=2023-02-10T13:20:48.179211566Z level=info msg="Request Completed" method=POST path=/api/ds/query status=400 remote_addr=10.1.0.5 time_ms=122 duration=122.822254ms size=62 referer="https://grafana.hidden.tech/d/1e83e204be502391f69d3a826675d3df/infrastructure-overview?orgId=1&refresh=10s" handler=/api/ds/query
grafana logger=auth t=2023-02-10T13:20:48.183561189Z level=debug msg="auth token rotated" affected=0 auth_token_id=2 userId=2

To me, it looks like identifyingClaim: "email" (or any other value here) is not working correctly with Azure AD. Which OpenID provider did you test this with? Maybe I will switch from Azure AD to that.

Kapsztajn commented 1 year ago

Some more logs from logging in to Grafana:

grafana logger=oauth.generic_oauth t=2023-02-10T15:10:19.512778773Z level=debug msg="Getting user info"
grafana logger=oauth.generic_oauth t=2023-02-10T15:10:19.512785973Z level=debug msg="Extracting user info from OAuth token"
grafana logger=oauth.generic_oauth t=2023-02-10T15:10:19.513107373Z level=debug msg="Received id_token" raw_json="{\"aud\":\"0d70811a-01a8-4908-8066-9e74edcc2f59\",\"iss\":\"https://login.microsoftonline.com/HIDDEN/v2.0\",\"iat\":1676041519,\"nbf\":1676041519,\"exp\":1676045419,\"aio\":\"AeQAG/8TAAAAO0oBDzXhPRggyL2Ij6raCr2T06IBqJAEtRZuLM/7gFw9aR+F8O2NicMXUwTiCubtmjdDcJDz1Y3UcCPr6kG2RbR815ZEKTu7RK1dBw2cYcA5xbFYbyGNP3SoqeLf+UMj4rCJsfFi5U0stvVvoqZQwol1Nci6cqc43ODeRGQcbO+ynda/oF1LOqHZZvxEOpiga5PZTYlAJX42TrVJES6n3Cr44Kod5wjG7JYyH8uNJMFLEixGRfCw8qyigw7KwgWgE7tRNPscV2sKow5xYeIEb4M7/l4QJSDBkXoQUgOYqTQ=\",\"email\":\"kamil.kwiaton@hostersi.pl\",\"idp\":\"https://sts.windows.net/b37f6912-3cfa-4041-b867-5ff20368f029/\",\"name\":\"Kamil Kwiaton\",\"oid\":\"76dcd1c7-e589-489c-9bca-f0552fcf2175\",\"preferred_username\":\"kamil.kwiaton@hostersi.pl\",\"rh\":\"0.AXkAiUIh5jPdcEyDPIUi4d_s0xqBcA2oAQhJgGaedO3ML1l5AKc.\",\"sub\":\"d75bDesL3N2B_LM-OaP_5AyrAv5k4Gl8t9K8hHj63q8\",\"tid\":\"e6214289-dd33-4c70-833c-8522e1dfecd3\",\"uti\":\"d8kVk9waS0-PL8MuCkQUAA\",\"ver\":\"2.0\"}" data="Name: Kamil Kwiaton, Displayname: , Login: , Username: , Email: kamil.kwiaton@hostersi.pl, Upn: , Attributes: map[]"
grafana logger=oauth.generic_oauth t=2023-02-10T15:10:19.513133173Z level=debug msg="Getting user info from API"
grafana logger=oauth.generic_oauth t=2023-02-10T15:10:19.588388838Z level=debug msg="HTTP GET" url=https://graph.microsoft.com/oidc/userinfo status="200 OK" response_body="{\"sub\":\"d75bDesL3N2B_LM-OaP_5AyrAv5k4Gl8t9K8hHj63q8\",\"name\":\"Kamil Kwiaton\",\"family_name\":\"Kwiaton\",\"given_name\":\"Kamil\",\"picture\":\"https://graph.microsoft.com/v1.0/me/photo/$value\",\"email\":\"kamil.kwiaton@hostersi.pl\"}"
grafana logger=oauth.generic_oauth t=2023-02-10T15:10:19.588453138Z level=debug msg="Received user info response from API" raw_json="{\"sub\":\"d75bDesL3N2B_LM-OaP_5AyrAv5k4Gl8t9K8hHj63q8\",\"name\":\"Kamil Kwiaton\",\"family_name\":\"Kwiaton\",\"given_name\":\"Kamil\",\"picture\":\"https://graph.microsoft.com/v1.0/me/photo/$value\",\"email\":\"kamil.kwiaton@hostersi.pl\"}" data="Name: Kamil Kwiaton, Displayname: , Login: , Username: , Email: kamil.kwiaton@hostersi.pl, Upn: , Attributes: map[]"
grafana logger=oauth.generic_oauth t=2023-02-10T15:10:19.588464738Z level=debug msg="Processing external user info" source=token data="Name: Kamil Kwiaton, Displayname: , Login: , Username: , Email: kamil.kwiaton@hostersi.pl, Upn: , Attributes: map[]"
grafana logger=oauth.generic_oauth t=2023-02-10T15:10:19.588475638Z level=debug msg="Setting user info name from name field"
grafana logger=oauth.generic_oauth t=2023-02-10T15:10:19.588482438Z level=debug msg="Set user info email from extracted email" email=kamil.kwiaton@hostersi.pl
grafana logger=oauth.generic_oauth t=2023-02-10T15:10:19.588564638Z level=warn msg="No valid role found. Skipping role sync. In Grafana 10, this will result in the user being assigned the default role and overriding manual assignment. If role sync is not desired, set oauth_skip_org_role_update_sync to false"
grafana logger=oauth.generic_oauth t=2023-02-10T15:10:19.588575338Z level=debug msg="Processing external user info" source=API data="Name: Kamil Kwiaton, Displayname: , Login: , Username: , Email: kamil.kwiaton@hostersi.pl, Upn: , Attributes: map[]"
grafana logger=oauth.generic_oauth t=2023-02-10T15:10:19.588612438Z level=warn msg="No valid role found. Skipping role sync. In Grafana 10, this will result in the user being assigned the default role and overriding manual assignment. If role sync is not desired, set oauth_skip_org_role_update_sync to false"
grafana logger=oauth.generic_oauth t=2023-02-10T15:10:19.588620938Z level=debug msg="Defaulting to using email for user info login" email=kamil.kwiaton@hostersi.pl

As you can see, the attributes from Azure AD (sub, email, etc.) are passed through correctly.

Logs from the Opni Gateway when I log in to Grafana:

2023-02-10T15:25:18.570Z DEBUG x17 api fwd/forwarder.go:100 => {"method": "POST", "path": "/api/prom/api/v1/query", "to": "127.0.0.1:40807 (plugin_metrics)", "for": "10.244.2.63", "host": "opni-internal.opni.svc:8080", "scheme": "https"}

kralicky commented 1 year ago

Grafana appears to be working so far. What does your auth configuration look like? Look for a section like this in the opni-gateway configmap:

auth:
  provider: openid
  openid:
    discovery:
      issuer: https://xxx/
    identifyingClaim: email
    clientID: xxx
    clientSecret: xxx
    scopes: ["openid", "profile", "email"]
    roleAttributePath: "contains(opni_grafana_role[*], 'Admin') && 'Admin' || contains(opni_grafana_role[*], 'Editor') && 'Editor' || 'Viewer'"

Kapsztajn commented 1 year ago

I think the opni-gateway configmap only has this part:

    ---
    apiVersion: v1beta1
    kind: AuthProvider
    metadata:
      name: openid
    spec:
      options:
        discovery:
          issuer: https://login.microsoftonline.com/tenant_id/v2.0
          path: /.well-known/openid-configuration
        identifyingClaim: sub
      type: openid

I have set identifyingClaim to sub again for testing; I can switch it back to email.

Kapsztajn commented 1 year ago

Auth appears in one more place in that configmap, but only as a single line:

Data
====
config.yaml:
----
apiVersion: v1beta1
kind: GatewayConfig
spec:
  alerting:
    Namespace: opni
    configMap: alertmanager-config
    controllerNodeService: opni-alerting-controller
    controllerStatefulSet: opni-alerting-controller-internal
    workerNodeService: opni-alerting
    workerStatefulSet: opni-alerting-internal
  authProvider: openid
  certs:

kralicky commented 1 year ago

To confirm: if you go to https://login.microsoftonline.com/<your_id>/v2.0/.well-known/openid-configuration, does everything look OK there?
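Beyond eyeballing the document, two exact-match rules from the OpenID Connect specs are easy to check offline: Discovery 1.0 requires the issuer in the discovery document to be identical to the issuer URL used to fetch it, and Core 1.0 requires an ID token's iss claim to exactly match that issuer. The sketch below performs only those string comparisons; fetching the document over HTTP is left out, and the function names and sample values are hypothetical (this is not a claim about what Azure AD actually returns).

```python
# Offline sketch of two exact-match rules from the OpenID Connect specs:
#  - Discovery 1.0: the discovery document's "issuer" must be identical to the
#    issuer URL used to build the discovery URL.
#  - Core 1.0: an ID token's "iss" claim must exactly match that issuer.
# HTTP fetching is omitted; function names are illustrative.


def discovery_url(issuer: str) -> str:
    # Any trailing "/" is removed before appending the well-known path.
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def issuer_problems(configured_issuer: str, discovery_doc: dict,
                    token_claims: dict) -> list[str]:
    problems: list[str] = []
    doc_issuer = discovery_doc.get("issuer")
    if doc_issuer != configured_issuer:
        problems.append(
            f"discovery issuer {doc_issuer!r} != configured issuer {configured_issuer!r}")
    token_iss = token_claims.get("iss")
    if token_iss != doc_issuer:
        problems.append(f"token iss {token_iss!r} != discovery issuer {doc_issuer!r}")
    return problems


if __name__ == "__main__":
    issuer = "https://idp.example.com/tenant-a/v2.0"  # hypothetical issuer
    print(discovery_url(issuer))
    print(issuer_problems(issuer, {"issuer": issuer}, {"iss": issuer}))  # []
```

The comparisons are deliberately exact: both specs require identical strings, so a trailing slash or a template placeholder in the issuer is enough for a strict client to reject it.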

kralicky commented 1 year ago

Also, yeah, your configmap looks correct; I copied the wrong one earlier. The section I pasted belongs in the Gateway custom resource, and only some of its fields are copied into the configmap (only the ones needed to verify ID tokens).

Kapsztajn commented 1 year ago

I think so? This is the default link Microsoft provides: https://login.microsoftonline.com/common/v2.0/.well-known/openid-configuration

Theoretically, I can configure that myself in wellKnownConfiguration if you think that would help.

kralicky commented 1 year ago

Looks correct to me. Can you check the logs for the cortex-querier pods for any auth related errors?

kralicky commented 1 year ago

You can also get some additional status info by running opni metrics admin status, opni metrics admin list-clusters, and opni metrics admin storage-info <cluster id> from a shell inside the opni-gateway pod.

Kapsztajn commented 1 year ago

I don't see a cortex-querier pod at all, only cortex-all-0.

kralicky commented 1 year ago

If you install metrics in standalone mode, you'll only get one pod; that's normal. Do you see any interesting logs in Cortex when Grafana sends queries?

Kapsztajn commented 1 year ago

I decided to reinstall Opni entirely because I had an old version (0.6.3) and didn't want to bother with the upgrade. The problem persists.

opni metrics admin status

bash-5.1# opni metrics admin status
 Cortex Services                                                                                                                                                                                                
                compactor  distributor-service  ingester-service  memberlist-kv  querier  query-frontend  query-frontend-tripperware  ring     ruler    runtime-config  server   store-gateway  store-queryable 
     Compactor  Running    Running              Running           Running        Running  Running         Running                     Running  Running  Running         Running  Running        Running         
   Distributor  Running    Running              Running           Running        Running  Running         Running                     Running  Running  Running         Running  Running        Running         
      Ingester  Running    Running              Running           Running        Running  Running         Running                     Running  Running  Running         Running  Running        Running         
        Purger  Running    Running              Running           Running        Running  Running         Running                     Running  Running  Running         Running  Running        Running         
       Querier  Running    Running              Running           Running        Running  Running         Running                     Running  Running  Running         Running  Running        Running         
         Ruler  Running    Running              Running           Running        Running  Running         Running                     Running  Running  Running         Running  Running        Running         
 Store Gateway  Running    Running              Running           Running        Running  Running         Running                     Running  Running  Running         Running  Running        Running         

 Ingester Ring                                                          
 ID            STATE   ADDRESS            TIMESTAMP                     
 cortex-all-0  ACTIVE  10.244.2.117:9095  2023-02-13 12:12:39 +0000 UTC 

 Ruler Ring                                                             
 ID            STATE   ADDRESS            TIMESTAMP                     
 cortex-all-0  ACTIVE  10.244.2.117:9095  2023-02-13 12:12:39 +0000 UTC

opni metrics admin list-clusters

bash-5.1# opni metrics admin list-clusters
 ID                                    LABELS                                                             CAPABILITIES  STATUS   NUM SERIES  SAMPLE RATE  RULE RATE
 2991968b-91b7-4b33-9566-4469a5f494a0  opni.io/name=hidden,opni.io/agent-version=v2                       metrics       Healthy  62736       2160.5/s     30.7/s
 40ea968b-86a0-4e26-af63-b5f3a0df04a7  opni.io/name=dev-test-aks,cluster=hidden,opni.io/agent-version=v2  metrics       Healthy  123271      12277.2/s    54.4/s

opni metrics admin storage-info

bash-5.1# opni metrics admin storage-info 2991968b-91b7-4b33-9566-4469a5f494a0

bash-5.1# opni metrics admin storage-info 40ea968b-86a0-4e26-af63-b5f3a0df04a7
 NAMESPACE  CLUSTER  BLOCKS

Logs from opni-gateway when I log in:

2023-02-13T12:50:04Z DEBUG apiext management/extensions.go:236 handling http request {"method": "GetClusterStatus", "path": "/status"}
2023-02-13T12:50:29Z DEBUG x16 api fwd/forwarder.go:100 => {"method": "POST", "path": "/api/prom/api/v1/query", "to": "127.0.0.1:32895 (plugin_metrics)", "for": "10.244.2.118", "host": "opni-internal.opni.svc:8080", "scheme": "https"}
2023-02-13T12:50:36Z DEBUG x17 api fwd/forwarder.go:100 => {"method": "POST", "path": "/api/prom/api/v1/query_range", "to": "127.0.0.1:32895 (plugin_metrics)", "for": "10.244.2.118", "host": "opni-internal.opni.svc:8080", "scheme": "h
2023-02-13T12:50:40Z DEBUG gateway.sync gateway/sync.go:86 sending sync request to agent {"agentId": "2991968b-91b7-4b33-9566-4469a5f494a0", "capabilities": []}
2023-02-13T12:50:45Z INFO plugin.logging.opensearch-manager gateway/admin_v2.go:971 waiting for k8s object
2023-02-13T12:50:45Z INFO plugin.modeltraining gateway/system.go:67 waiting for k8s object
2023-02-13T12:50:46Z DEBUG x17 api fwd/forwarder.go:100 => {"method": "POST", "path": "/api/prom/api/v1/query", "to": "127.0.0.1:32895 (plugin_metrics)", "for": "10.244.2.118", "host": "opni-internal.opni.svc:8080", "scheme": "https"}

There are no logs in cortex-all when I log in to Grafana, only some POST entries related to opni-alerting-controller.

Is there another identity provider you have tested and confirmed working that I could try? Maybe there is some issue with Azure AD and Opni together?

kralicky commented 1 year ago

I've tested auth0 recently and confirmed that works. Unfortunately I don't have access to any Azure AD setups so I can't test that, but if there is a bug in opni preventing it from working I want to make sure we fix it. Theoretically there shouldn't be anything preventing it from working, as long as it conforms to the openid standards.

If you want, you can join the rancher-users slack (link) and I could help you troubleshoot over a call. Otherwise there are a few other things you can try:

- Run opni metrics admin query --clusters=all "any promql query" (try "up").
- Try switching Cortex to HA mode. It's not enabled in the UI, but you can configure HA with Azure Blob Storage from the CLI via opni metrics ops configure --mode=HighlyAvailable --storage.backend=azure --storage.azure.account-key=xxx --storage.azure.account-name=xxx --storage.azure.container-name=xxx

kralicky commented 1 year ago

After working with @Kapsztajn to debug this issue, we discovered that Azure AD might not be OIDC compliant. Will follow up in this thread: https://github.com/MicrosoftDocs/azure-docs/issues/38427