This repository is the primary public codecollection for the RunWhen platform. It contains codebundles that can be used in SLIs, SLOs, and TaskSets.
Please see the contributing guide and code of conduct for details on adding your contributions to this project.
Documentation for each codebundle is maintained in the README.md alongside the robot code and is published at https://docs.runwhen.com/public/v/codebundles/. Please see the readme howto for details on crafting a codebundle readme that can be indexed.
Name | Supported Integrations | Tasks | Documentation |
---|---|---|---|
Kubernetes Namespace Healthcheck | Kubernetes , AKS , EKS , GKE , OpenShift |
Get Event Count and Score , Get Container Restarts and Score , Get NotReady Pods , Generate Namspace Score |
This SLI uses kubectl to score namespace health. Produces a value between 0 (completely failing the test) and 1 (fully passing the test). Looks for container restarts, events, and pods not ready. Docs |
Kubernetes Namespace Troubleshoot | Kubernetes , AKS , EKS , GKE , OpenShift |
Trace Namespace Errors , Fetch Unready Pods , Triage Namespace , Object Condition Check , Namespace Get All |
This taskset runs general troubleshooting checks against all applicable objects in a namespace, checks error events, and searches pod logs for error entries. Docs |
Kubernetes Run Shell Command | Kubernetes , AKS , EKS , GKE , OpenShift |
Running Kubectl And Adding Stdout To Report |
This codebundle runs an arbitrary kubectl command and writes the stdout to a report. Typically used in conjunction with other codebundles. Docs |
Kubernetes Synthetic PVC Test | Kubernetes , AKS , EKS , GKE , OpenShift |
Run Canary Job |
Creates an ad hoc one-shot job that mounts a PVC as a canary test and is polled for success before being torn down. Docs |
Kubernetes Workload Metric | Kubernetes , AKS , EKS , GKE , OpenShift |
Running Kubectl get and push the metric |
This codebundle runs a kubectl get command that produces a value and pushes the metric. Uses jmespath for filtering and allows calculations such as count, sum, avg on specified fields. Docs |
argocd-healthcheck-sli | argocd |
ArgoCD Health Check |
Check the health of the ArgoCD platform by checking the availability of its underlying Deployments and StatefulSets. Docs |
artifactory-ok-sli | artifactory |
Check If Artifactory Endpoint Is Healthy |
Checks an Artifactory instance health endpoint to determine its operational status. The response is parsed to determine if the service is healthy, resulting in a metric of 1 if it is, or 0 if not. Docs |
aws-account-limit-sli | aws |
Get Count Of AWS Accounts In Organization |
Retrieve the count of all AWS accounts in an organization. Docs |
aws-account-limit-taskset | aws , iam |
Get The Recently Created AWS Accounts |
Retrieve all recently created AWS accounts. Docs |
aws-billing-costsacrosstags-taskset | aws , billing , costexplorer |
Get All Billing Sliced By Tags |
Creates a report of AWS line item costs filtered to a list of tagged resources. Docs |
aws-billing-tagcosts-sli | aws , billing , costexplorer |
Get All Billing Sliced By Tags |
Monitors AWS cost and usage data for the latest billing period. Accepts one tag for continuous monitoring. Docs |
aws-cloudformation-stackevents-count-sli | aws , cloudformation |
Fetch CloudFormation Stack Events |
Retrieve the number of detected AWS CloudFormation stack events over a given history. Docs |
aws-cloudformation-triage-taskset | aws , cloudformation |
Get All Recent Stack Events |
Triage and troubleshoot various issues with AWS CloudFormation. Docs |
aws-cloudwatch-logquery-rowcount-zeroerror-sli | aws , cloudwatch |
Running CloudWatch Log Query And Pushing 1 If No Results Found |
Retrieve binary result from an AWS CloudWatch Insights query. Pushes 0 (success) if logs are found (activity) or 1 if no logs were found in the time window. Docs |
aws-cloudwatch-logquery-sli | aws , cloudwatch |
Running CloudWatch Log Query And Pushing The Count Of Results |
Retrieve number of results from an AWS CloudWatch Insights query. Docs |
aws-cloudwatch-metricquery-dashboard-taskset | aws , cloudwatch |
Get CloudWatch MetricQuery Insights URL |
Creates a URL to an AWS CloudWatch metrics dashboard with a running query. Docs |
aws-cloudwatch-metricquery-sli | aws , cloudwatch |
Running CloudWatch Metric Query And Pushing The Result |
Retrieve the result of an AWS CloudWatch Metrics Insights query. Docs |
aws-cloudwatch-tagmetricquery-sli | aws , cloudwatch |
Run CloudWatch Metric Query Across Set Of IDs And Push Metric |
Retrieve aggregate results from multiple AWS CloudWatch Metrics Insights queries run against tagged resources. This codebundle fetches a list of instance IDs filtered by tags, uses them to run a set of AWS metric queries against the CloudWatch Metrics Insights API, and pushes an aggregated/transformed value provided by the API as a metric. Docs |
aws-ec2-securitycheck-taskset | aws , ec2 , cloudwatch |
Check For Untagged instances , Check For Dangling Volumes , Check For Open Routes , Check For Overused Instances , Check For Underused Instances , Check For Underused Volumes , Check For Overused Volumes |
Performs a suite of security checks against a set of AWS EC2 instances. Checks include untagged instances, dangling volumes, and open routes. Docs |
aws-s3-stalecheck-taskset | aws , s3 , bucket |
Create Report For Stale Buckets |
Identify stale AWS S3 buckets, based on last modified object timestamp. Docs |
aws-vm-triage-taskset | aws , ec2 , cloudwatch |
Get Max VM CPU Utilization In Last 3 Hours , Get Lowest VM CPU Credits In Last 3 Hours , Get Max VM CPU Credit Usage In Last 3 hours , Get Max VM Memory Utilization In Last 3 Hours , Get Max VM Volume Usage In Last 3 Hours |
Triage and troubleshoot performance and usage of an AWS EC2 instance. Docs |
cert-manager-expirations-sli | cert |
Inspect Certification Expiration Dates |
Retrieve the number of TLS certificates managed by cert-manager that expire within a given window. The metric pushed is the number of certs within the configured expiration window. Docs |
cert-manager-healthcheck-sli | cert |
Health Check cert-manager Pods |
Check the health of pods deployed by cert-manager. Docs |
curl-generic-sli | curl |
Run Curl Command and Push Metric |
A curl SLI for querying and extracting data from a generic curl call. Supports jq. Should produce a single metric. Docs |
curl-generic-taskset | curl |
Run Curl Command and Add to Report |
A curl TaskSet for querying and extracting data from a generic curl call. Supports jq. Adds results to the report. Docs |
datadog-metricquery-sli | datadog |
Query Datadog Metrics |
Fetch the results of a Datadog metric timeseries and push the extracted value as an SLI metric. Docs |
datadog-system-load-sli | datadog |
Check Datadog System Load |
Retrieve a Datadog instance's "System Load" metric. Docs |
discord-sendmessage-taskset | discord |
Send Chat Message |
Sends a static Discord message via webhook. Contains optional configuration for including runsession info. Docs |
dns-latency-sli | dns |
Check DNS latency for Google Resolver |
Check DNS latency for Google Resolver. Docs |
elasticsearch-health-sli | elasticsearch |
Check Elasticsearch Cluster Health |
Check Elasticsearch cluster health. Docs |
gcp-gcloudcli-generic-sli | gcp |
Run Gcloud CLI Command and Push metric |
Run arbitrary gcloud commands and parse their output (such as JSON) for a value to be submitted as a metric. Docs |
gcp-gcloudcli-generic-taskset | gcp |
Run Gcloud CLI Command and Push metric |
Run arbitrary gcloud commands and capture the stdout in a report. Docs |
gcp-opssuite-logquery-dashboard-taskset | gcp |
Get GCP Log Dashboard URL For Given Log Query |
Generate a link to the GCP Log Explorer. Docs |
gcp-opssuite-logquery-sli | gcp |
Running GCE Logging Query And Pushing Result Count Metric |
Retrieve the number of results of a GCP Log Explorer query. Docs |
gcp-opssuite-metricquery-sli | gcp |
Running GCP OpsSuite Metric Query |
Performs a metric query using a Google MQL statement on the Ops Suite API and pushes the result as an SLI metric. Docs |
gcp-opssuite-promql-sli | gcp |
Run Prometheus Instant Query Against Google Prom API Endpoint |
Performs a metric query using a PromQL statement on the Ops Suite API and pushes the result as an SLI metric. Docs |
gcp-serviceshealth-sli | gcp |
Get Number of GCP Incidents Effecting My Workspace |
This codebundle sets up a monitor for a specific region and GCP Product, which is then periodically checked for ongoing incidents based on the history available at https://status.cloud.google.com/incidents.json filtered based on severity level. Docs |
github-actions-workflowtiming-sli | github |
Get Average Run Time For Workflow |
Monitors the average timing of a GitHub Actions workflow file within a repo and returns the average runtime in minutes. Docs |
github-get-repos-latency-sli | github |
Check GitHub Latency With Get Repos |
Check GitHub latency by getting a list of repo names. Docs |
github-get-repos-latency-taskset | github |
Check Latency When Creating a New GitHub Issue |
Create a new issue in GitHub Issues. Docs |
github-status-components-sli | github |
Get Availability of GitHub or Individual GitHub Components |
Check the status of the GitHub platform (https://www.githubstatus.com/) for a specified set of GitHub service components. The metric supplied is an aggregated percentage indicating the availability of the components, with 1 = 100% available. Docs |
github-status-incidents-sli | github |
Get Number of Incidents Affecting GitHub |
Check for unresolved incidents related to GitHub services and provide a count of ongoing incidents as a metric. Docs |
github-status-maintenances-sli | github |
Get Scheduled and Active GitHub Maintenance Windows |
Retrieve the number of upcoming GitHub platform maintenances over a given window. Docs |
gitlab-availability-sli | gitlab |
Check GitLab Server Status |
Check availability of a GitLab server. Docs |
gitlab-availability-taskset | gitlab |
Check GitLab Server Status |
Troubleshoot issues with GitLab server availability. Docs |
gitlab-get-repos-latency-sli | gitlab |
Check GitLab Latency With Get Repos |
Check GitLab latency by getting a list of repo names. Docs |
googlechat-sendmessage-taskset | googlechat |
Send Chat Message |
Sends a static Google Chat message via webhook. Contains optional configuration for including runsession info. Docs |
grafana-health-sli | grafana |
Check Grafana Server Health |
Check Grafana server health. Docs |
grpc-grpcurl-unary-sli | grpc |
Run gRPCurl Command and Push Metric |
A gRPC curl SLI for querying and extracting data from a generic grpcurl call. Docs |
grpc-grpcurl-unary-taskset | grpc |
Run gRPCurl Command and Show Output |
A gRPC curl taskset for querying data from a generic grpcurl call and presenting the output. Docs |
hello-world-taskset | hello |
Hello World , Add One String To Report , Add Form Values To Report |
Basic Hello-World TaskSet. Docs |
http-latency-sli | http |
Check HTTP Latency to Well Known URL |
Measure HTTP latency against a given URL. The returned metric is the number of seconds the request took as a float value. Docs |
http-ok-sli | http |
Checking HTTP URL Is Available And Timely |
Check if an HTTP request against a URL fails or exceeds a given latency window. A return of 1 is considered a success, while a 0 is a failure. Docs |
jira-search-issues-latency-sli | jira |
Search Jira Issues By Current User |
Check Jira latency when searching issues by current user. Docs |
jira-search-issues-latency-taskset | jira |
Create a new Jira Issue |
Create an issue in Jira. Docs |
k8s-cortexmetrics-ingestor-health-sli | k8s |
Determine Cortex Ingester Ring Health |
Uses kubectl to query the state of an ingester ring and determine if it's healthy. Returns 1 if healthy, 0 if unhealthy. Docs |
k8s-cortexmetrics-ingestor-health-taskset | k8s |
Fetch Ingestor Ring Member List and Status |
Uses kubectl to query the state of an ingester ring. Returns JSON containing each ingester's ID, status, and timestamp. Docs |
k8s-daemonset-healthcheck-sli | k8s |
Health Check Daemonset |
Checks that the current state of a daemonset is healthy and returns a score of either 1 (healthy) or 0 (unhealthy). Docs |
k8s-decommission-workloads-taskset | k8s |
Generate Decomission Commands |
Searches a namespace for matching objects and provides the commands to decommission them. Docs |
k8s-kubectl-apiserverhealth-sli | k8s |
Running Kubectl Check Against API Server |
Check the health of a Kubernetes API server using kubectl. Returns 1 when OK, or a 0 in the case of an unhealthy API server. Docs |
k8s-kubectl-eventquery-sli | k8s |
Get Number Of Matching Events |
Returns the number of events with matching messages as an SLI metric. Docs |
k8s-kubectl-sanitycheck-taskset | k8s |
Check Kubeconfig Secret Exists , Test Generic Shell Service Connectivity , Check Kubectl contexts , Test Command Chains , Test Kubectl Get Pods |
Used for troubleshooting the shellservice-based kubectl service. Docs |
k8s-kubectl-top-sli | k8s |
Running Kubectl Top And Extracting Metric Data |
Retrieve aggregate data via the kubectl top command. Docs |
k8s-patroni-healthcheck-sli | k8s |
Determine Patroni Health |
Uses kubectl (or equivalent) to query the state of a Patroni cluster and determine if it's healthy. Docs |
k8s-patroni-lag-sli | k8s |
Measure Patroni Member Lag |
Measures the maximum replica lag across a Patroni cluster. Docs |
k8s-patroni-lag-taskset | k8s |
Determine Patroni Health |
Detects and reinitializes laggy Patroni cluster members that are unable to catch up in replication, using kubectl and patronictl. Docs |
k8s-postgres-query-sli | k8s |
Run Postgres Query And Return Result As Metric |
Runs a postgres SQL query and pushes the returned query result as an SLI metric. During execution, the SQL query should be passed to a Kubernetes workload that has access to the psql binary. The workload will run the query and return the result from stdout. Docs |
k8s-postgres-query-taskset | k8s |
Run Postgres Query And Results to Report |
Runs a postgres SQL query and pushes the returned result into a report. During execution, the SQL query should be passed to a Kubernetes workload that has access to the psql binary. The workload will run the query and return the results from stdout. Docs |
k8s-postgres-triage-taskset | k8s |
Get Standard Resources , Describe Custom Resources , Get Pod Logs & Events , Get Pod Resource Utilization , Get Running Configuration , Get Patroni Output , Run DB Queries |
Runs multiple Kubernetes and psql commands to report on the health of a postgres cluster. Docs |
k8s-triage-deploymentreplicas-taskset | k8s |
Fetch Logs , Get Related Events , Check Deployment Replicas |
Triages issues related to a deployment's replicas. Docs |
k8s-triage-patroni-taskset | k8s |
Get Patroni Status , Get Pods Status , Fetch Logs |
Taskset to triage issues related to Patroni. Docs |
k8s-triage-statefulset-taskset | k8s |
Check StatefulSets Replicas Ready , Get Events For The StatefulSet , Get StatefulSet Logs , Get StatefulSet Manifests Dump |
A taskset for troubleshooting issues for StatefulSets and their related resources. Docs |
k8s-troubleshoot-deployment-taskset | k8s |
Troubleshoot Resourcing , Troubleshoot Events , Troubleshoot PVC , Troubleshoot Pods |
A taskset for troubleshooting general issues associated with typical Kubernetes deployment resources. Supports API interactions via both the API client and the kubectl binary through RunWhen Shell Services. Docs |
kong-ingress-health-gcp-promql-sli | kong |
Get Access Token , Get HTTP Error Rate , Get Upstream Health , Get Request Latency Rate , Generate Kong Ingress Score |
Uses PromQL on the Ops Suite API to determine the health of a Kong-managed ingress resource and pushes the result as an SLI metric. Produces a 1 for a healthy resource, or 0 for an unhealthy resource. Docs |
mongodb-health-gcp-promql-sli | mongodb |
Get Access Token , Get Instance Status , Get Connection Utilization Rate , Get MongoDB Member State Health , Get MongoDB Replication Lag , Get MongoDB Queue Size , Get Assertion Rate , Generate MongoDB Score |
Uses PromQL on the Ops Suite API to determine the health of a MongoDB database instance and pushes the result as an SLI metric. Produces a 1 for a healthy resource, or 0 for an unhealthy resource. Docs |
msteams-send-message-taskset | msteams |
Send a Message to an MS Teams Channel |
Send a message to an MS Teams channel. Docs |
opsgenie-alert-taskset | opsgenie |
Get Opsgenie System Info , Create An Alert |
Create an alert in Opsgenie. Docs |
ping-host-availability-sli | ping |
Ping host and collect packet lost percentage |
Ping a host and retrieve packet loss percentage. Docs |
pingdom-health-sli | pingdom |
Check Pingdom Health |
Check health of Pingdom platform. Docs |
prometheus-queryinstant-transform-sli | prometheus |
Querying Prometheus Instance And Pushing Aggregated Data |
Run a PromQL query against Prometheus instant query API, perform a provided transform, and return the result. Docs |
prometheus-queryrange-transform-sli | prometheus |
Querying Prometheus Instance And Pushing Aggregated Data |
Run a PromQL query against Prometheus range query API, perform a provided transform, and return the result. Docs |
remote-http-ok-sli | remote |
Checking HTTP URL Is Available And Timely |
Check that an HTTP endpoint is healthy and responds within a target latency. Docs |
rest-basicauth-sli | rest |
Request Data From Rest Endpoint |
A general purpose REST SLI for querying and extracting data from a REST endpoint that uses a basic auth flow. Docs |
rest-explicitoauth2-basicauth-sli | rest |
Request Data From Rest Endpoint |
A REST SLI for querying and extracting data from a REST endpoint that needs an explicit OAuth2 flow, where token acquisition is handled using basic auth. Docs |
rest-explicitoauth2-tokenheader-sli | rest |
Request Data From Rest Endpoint |
A REST SLI for querying and extracting data from a REST endpoint that needs an explicit OAuth2 flow, where an access token must be acquired with a bearer token. Docs |
rest-generic-sli | rest |
Request Data From Rest Endpoint |
A general purpose REST SLI for querying and extracting data from a REST endpoint that uses an implicit OAuth2 flow. Docs |
rocketchat-sendmessage-taskset | rocketchat |
Send Chat Message |
Sends a static Rocketchat message via webhook. Contains optional configuration for including runsession info. Docs |
slack-sendmessage-taskset | slack |
Send Chat Message |
Sends a static Slack message via webhook. Contains optional configuration for including runsession info. Docs |
sli-alert-threshold-sli | sli |
Check If SLI Within Incident Threshold |
An SLI that monitors another SLI submitting a 0-1 health score and immediately triggers a taskset when that score falls below a threshold. When this SLI detects a rate below the threshold, it submits a 1 to denote that a signal was sent, then returns to 0 once the monitored SLI is healthy. Docs |
sysdig-monitor-metric-sli | sysdig |
Query Sysdig Metric Data And Pushing Metric |
Queries the Sysdig data API to fetch metric data. Docs |
sysdig-monitor-promqlmetric-sli | sysdig |
Querying PromQL Endpoint And Pushing Metric Data |
Queries the Sysdig data API with a PromQL query to fetch metric data. Docs |
twitter-query-tweets-sli | twitter |
Query Twitter |
Queries Twitter to count the number of tweets within a specified time range for a specific user handle. Docs |
twitter-query-tweets-taskset | twitter |
Query Twitter |
Queries Twitter to fetch tweets within a specified time range for a specific user handle and add them to a report. Docs |
uptimecom-component-ok-sli | uptimecom |
Check If Vault Endpoint Is Healthy |
Check the status of an Uptime.com component for a given site. It compares the operational state of the component with the list of allowed states, resulting in a 1 when acceptable, and 0 when not. Docs |
vault-ok-sli | vault |
Check If Vault Endpoint Is Healthy |
Check the health of a Vault server. The response code is used to determine if the service is healthy, resulting in a metric of 1 if it is, or 0 if not. Docs |
web-triage-taskset | web |
Validate Platform Egress , Perform Inspection On URL |
Troubleshoot and triage a URL to inspect it for common issues such as an expired certificate, missing DNS records, etc. Docs |
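
Several SLIs in the table above (for example http-ok-sli) describe a binary metric: 1 for success, 0 for failure, based on whether a request succeeds within a latency window. The sketch below illustrates that scoring rule only; it is not the codebundle's actual implementation. The function name `http_ok_metric`, the choice to treat any 2xx status as success, and the 1.0 second default window are all assumptions made for illustration.

```python
from typing import Optional


def http_ok_metric(
    status_code: Optional[int],
    elapsed_seconds: float,
    max_latency_seconds: float = 1.0,  # hypothetical default window
) -> int:
    """Return 1 when the response is a 2xx inside the latency window, else 0."""
    # A status of None stands for a request that failed or timed out
    # before any response arrived (an assumption of this sketch).
    if status_code is None:
        return 0
    is_success = 200 <= status_code < 300
    is_timely = elapsed_seconds <= max_latency_seconds
    return 1 if (is_success and is_timely) else 0


if __name__ == "__main__":
    print(http_ok_metric(200, 0.25))  # fast, successful response
    print(http_ok_metric(200, 2.5))   # successful but too slow
    print(http_ok_metric(None, 0.0))  # request never completed
```

Keeping the scoring rule as a pure function, separate from the HTTP call itself, makes the threshold behavior easy to check without network access.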