grafana-toolbox / grafana-wtf

Grep through all Grafana entities in the spirit of git-wtf.
GNU Affero General Public License v3.0

Add subcommands `plugins {list,status}`, to inquire plugins #91

Closed amotl closed 9 months ago

amotl commented 9 months ago

About

At https://github.com/panodata/grafana-client/pull/110, @bhks added a few more API wrapper functions for Grafana. Thanks! This patch wraps them once more into the command line interface of grafana-wtf.

Synopsis

# Explore plugins.
grafana-wtf plugins list
grafana-wtf plugins status

amotl commented 9 months ago

Problem

@bhks: Even with the most recent Grafana 10.1.2 release, grafana-wtf plugins status does not work well on my machine. It fails to query the corresponding health check and metrics endpoints.

2023-09-20 23:09:47,850 [grafana_wtf.core                    ] INFO   : Health check failed: Server Error 503: Plugin unavailable
2023-09-20 23:09:47,895 [grafana_client.elements.plugin      ] INFO   : Got error in fetching metrics for plugin satellogic-3d-globe-panel and error = Server Error 503: Plugin unavailable
2023-09-20 23:09:47,896 [grafana_wtf.core                    ] INFO   : Metrics inquiry failed: get_plugin_metrics returned nothing
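For illustration, the tolerant behavior visible in the log above, where a failing probe is logged and skipped instead of aborting the whole run, could be sketched roughly like this. The `probe` helper and the `fetch` callables are hypothetical names made up for this sketch; they are not the actual code of grafana-wtf or grafana-client.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def probe(label: str, fetch: Callable[[], Any]) -> Optional[Any]:
    """Run one probe and return its result, or None when it fails.

    `fetch` is a hypothetical zero-argument callable that performs the
    actual request, e.g. against a plugin's health or metrics endpoint.
    """
    try:
        result = fetch()
    except Exception as ex:
        # Log the failure, e.g. "Server Error 503: Plugin unavailable",
        # and carry on with the next plugin.
        logger.info("%s failed: %s", label, ex)
        return None
    if not result:
        logger.info("%s failed: returned nothing", label)
        return None
    return result


def unavailable() -> Any:
    # Stand-in for a request that the server answers with an error.
    raise RuntimeError("Server Error 503: Plugin unavailable")


health_failed = probe("Health check", unavailable)
health_ok = probe("Health check", lambda: {"message": "", "status": "OK"})
```

With this shape, a plugin without a backend yields `None` for the failed probe, while the remaining plugins are still inspected.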

Thoughts

Q&A

You can install the package including this feature directly from the corresponding branch using this pip command, in order to check if it works on your end. I will be happy to hear back about the outcome.

pip install --upgrade 'git+https://github.com/panodata/grafana-wtf@list-plugins'

bhks commented 9 months ago

Thanks for pinging me into this thread.

A panel plugin does not run a backend server that could provide health or metrics information. Also, not all plugin servers implement these endpoints.

The newer plugin SDK more or less forces plugin developers to implement the protobuf model from version 1 onwards, so I don't know whether all marketplace plugins of type datasource or app follow it.

bhks commented 9 months ago

I can try this out when I get a chance and see what works and what not.

bhks commented 9 months ago

Not all servers/plugins have implemented the metrics endpoint. I believe it was introduced with the new SDK, which helps to build external plugins.

Also, core plugins do not have a server, because they are built into Grafana's Go code, so they have no health or metrics endpoint. We need to filter them out as well.

amotl commented 9 months ago

Thanks for your response. According to grafana-wtf plugins list, a representation looks like this.

{
    "name": "Alert list",
    "type": "panel",
    "id": "alertlist",
    "enabled": true,
    "pinned": false,
    "info": {
        "author": {
            "name": "Grafana Labs",
            "url": "https://grafana.com"
        },
        "description": "Shows list of alerts and their current status",
        "links": null,
        "logos": {
            "small": "public/app/plugins/panel/alertlist/img/icn-singlestat-panel.svg",
            "large": "public/app/plugins/panel/alertlist/img/icn-singlestat-panel.svg"
        },
        "build": {},
        "screenshots": null,
        "version": "",
        "updated": ""
    },
    "dependencies": {
        "grafanaDependency": "",
        "grafanaVersion": "*",
        "plugins": []
    },
    "latestVersion": "",
    "hasUpdate": false,
    "defaultNavUrl": "/grafana",
    "category": "",
    "state": "",
    "signature": "internal",
    "signatureType": "",
    "signatureOrg": ""
}

Also, core plugins do not have a server, because they are built into Grafana's Go code, so they have no health or metrics endpoint. We need to filter them out as well.

Would skipping all plugins having "signature": "internal" on health and metrics inquiry be a good option to proceed with here?

bhks commented 9 months ago

Would skipping all plugins having "signature": "internal" on health and metrics inquiry be a good option to proceed with here?

Exactly that, and the following one as well:

"type": "panel",

amotl commented 9 months ago

Maybe just including the items with "type": "datasource" would be the right choice, not bothering about skipping certain others like "signature": "internal" and "type": "panel" at all?

amotl commented 9 months ago

I've amended the patch to only use if item.type == "datasource" at this spot, significantly reducing unnecessary probes, and I think it works well so far. Thank you very much.

bhks commented 9 months ago

But internal/core plugins like CloudWatch, Prometheus, or RDS will not be able to respond, although they are of type datasource.

Also, app plugins do respond on these endpoints, for example api/plugins/aws-datasource-provisioner-app/health

{
  "message": "",
  "status": "OK"
}

Similarly for metrics, at api/plugins/aws-datasource-provisioner-app/metrics

# HELP go_sync_mutex_wait_total_seconds_total Approximate cumulative time goroutines have spent blocked on a sync.Mutex or sync.RWMutex. This metric is useful for identifying global changes in lock contention. Collect a mutex or block profile using the runtime/pprof package for more detailed contention data.
# TYPE go_sync_mutex_wait_total_seconds_total counter
go_sync_mutex_wait_total_seconds_total 0
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 8
# HELP grpc_server_msg_received_total Total number of RPC stream messages received on the server.
# TYPE grpc_server_msg_received_total counter
grpc_server_msg_received_total{grpc_method="CollectMetrics",grpc_service="pluginv2.Diagnostics",grpc_type="unary"} 1
grpc_server_msg_received_total{grpc_method="StreamStdio",grpc_service="plugin.GRPCStdio",grpc_type="server_stream"} 1
# HELP grpc_server_started_total Total number of RPCs started on the server.
# TYPE grpc_server_started_total counter
grpc_server_started_total{grpc_method="CollectMetrics",grpc_service="pluginv2.Diagnostics",grpc_type="unary"} 1
grpc_server_started_total{grpc_method="StartStream",grpc_service="plugin.GRPCBroker",grpc_type="bidi_stream"} 1
grpc_server_started_total{grpc_method="StreamStdio",grpc_service="plugin.GRPCStdio",grpc_type="server_stream"} 1
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 2.7
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 65535
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 13
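
The metrics response above is plain text in the Prometheus exposition format. As a minimal sketch, assuming samples carry no trailing timestamp, it could be turned into a dictionary like this. The `parse_metrics` function is a hypothetical helper for illustration and not part of grafana-wtf or grafana-client.

```python
def parse_metrics(text: str) -> dict:
    """Parse Prometheus text exposition lines into {sample_name: value}.

    Assumption: each sample line has the shape `name{labels} value`
    without a trailing timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and `# HELP` / `# TYPE` comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token. Everything before
        # it is the metric name, including any `{...}` label set.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


sample = """\
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 8
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 2.7
grpc_server_started_total{grpc_method="CollectMetrics",grpc_service="pluginv2.Diagnostics",grpc_type="unary"} 1
"""

metrics = parse_metrics(sample)
print(metrics["go_threads"])
```

For anything beyond a quick check, a dedicated parser for the exposition format is likely the better choice, since this sketch ignores timestamps and escaping rules.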
amotl commented 9 months ago

Thanks. I've added if item.type == "datasource" and item.signature != "internal" to my working tree, and it reduces unnecessary probes significantly further.

bhks commented 9 months ago

Do you mind updating that if condition to the following:

if item.type != "panel" and item.signature != "internal"

amotl commented 9 months ago

Do you mind updating that if condition to the following:

if item.type != "panel" and item.signature != "internal"

Can you convince me why this is better? To me, it sounds less adequate, because we are mostly talking about "datasource" plugins here. Are there any other types of plugins which yield sensible responses on their metrics or health endpoints?

If so, can you provide a sample (plugin id) of that kind, so I can use it in a corresponding software test case? Thanks!

bhks commented 9 months ago

Can you convince me why this is better? To me, it sounds less adequate, because we are mostly talking about "datasource" plugins here. Are there any other types of plugins which yield sensible responses on their metrics or health endpoints?

If so, can you provide a sample (plugin id) of that kind, so I can use it in a corresponding software test case?

Apologies about being too specific here.

Maybe the example plugin id in this comment can help: https://github.com/panodata/grafana-wtf/pull/91#issuecomment-1728543881

I was referring to a third type, the app plugin type, which bundles data source as well as panel plugins. An example is aws-datasource-provisioner-app.

Here are multiple app type plugins we can explore. I am not 100% sure if they all have these endpoints: https://grafana.com/grafana/plugins/app-plugins/
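
To make the difference between the two conditions discussed in this thread concrete, here is a minimal sketch. The `Plugin` class and both function names are hypothetical and only mirror the `type` and `signature` fields from the plugin representation shown earlier. The `prometheus` entry is assumed to carry "signature": "internal", following the statement that core data sources are internal, and `example-external-datasource` is an invented plugin id.

```python
from dataclasses import dataclass


@dataclass
class Plugin:
    id: str
    type: str
    signature: str


def is_external_datasource(item: Plugin) -> bool:
    # Narrow variant: probe only non-internal data source plugins.
    return item.type == "datasource" and item.signature != "internal"


def is_probe_candidate(item: Plugin) -> bool:
    # Broader variant: probe everything except panels and internal plugins,
    # which also lets app plugins through.
    return item.type != "panel" and item.signature != "internal"


plugins = [
    Plugin("alertlist", "panel", "internal"),
    Plugin("prometheus", "datasource", "internal"),  # assumed signature
    Plugin("aws-datasource-provisioner-app", "app", "valid"),
    Plugin("example-external-datasource", "datasource", "valid"),  # invented id
]

narrow = [p.id for p in plugins if is_external_datasource(p)]
broad = [p.id for p in plugins if is_probe_candidate(p)]
print(narrow)
print(broad)
```

In this sample, only the broader variant selects aws-datasource-provisioner-app, while both variants skip the internal panel and the internal data source.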

amotl commented 9 months ago

aws-datasource-provisioner-app works well, and provides both health and metrics. Thank you!


[
    {
        "name": "AWS Data Sources",
        "type": "app",
        "id": "aws-datasource-provisioner-app",
        "enabled": false,
        "category": "",
        "version": "1.13.0",
        "signature": "valid",
        "health": {
            "message": "",
            "status": "OK"
        },
        "metrics": "# HELP go_cgo_go_to_c_calls_calls_total Count of calls made from Go to C by the current process.\n# TYPE go_cgo_go_to_c_calls_calls_total counter\ngo_cgo_go_to_c_calls_calls_total 0\n# HELP go_cpu_classes_gc_mark_assist_cpu_seconds_total Estimated total CPU time goroutines spent performing GC tasks to assist the GC and prevent it from falling behind the application. This metric is an overestimate, and not directly ..."
    }
]
amotl commented 9 months ago