RunWhen Public Codecollection

This repository is the primary public codecollection for use within the RunWhen platform. It contains codebundles that can be used in SLIs, SLOs, and TaskSets.

Please see the contributing guidelines and code of conduct for details on adding your contributions to this project.

Documentation for each codebundle is maintained in the README.md alongside the robot code and is published at https://docs.runwhen.com/public/v/codebundles/. Please see the README how-to for details on writing a codebundle README that can be indexed.

Codebundle Index

Name	Supported Integrations	Tasks	Documentation
Kubernetes Namespace Healthcheck	`Kubernetes`, `AKS`, `EKS`, `GKE`, `OpenShift`	`Get Event Count and Score`, `Get Container Restarts and Score`, `Get NotReady Pods`, `Generate Namspace Score`	This SLI uses kubectl to score namespace health. It produces a value between 0 (completely failing the test) and 1 (fully passing the test), based on container restarts, events, and pods that are not ready. Docs
Kubernetes Namespace Troubleshoot	`Kubernetes`, `AKS`, `EKS`, `GKE`, `OpenShift`	`Trace Namespace Errors`, `Fetch Unready Pods`, `Triage Namespace`, `Object Condition Check`, `Namespace Get All`	This taskset runs general troubleshooting checks against all applicable objects in a namespace, checks error events, and searches pod logs for error entries. Docs
Kubernetes Run Shell Command	`Kubernetes`, `AKS`, `EKS`, `GKE`, `OpenShift`	`Running Kubectl And Adding Stdout To Report`	This codebundle runs an arbitrary kubectl command and writes the stdout to a report. Typically used in conjunction with other codebundles. Docs
Kubernetes Synthetic PVC Test	`Kubernetes`, `AKS`, `EKS`, `GKE`, `OpenShift`	`Run Canary Job`	Creates an ad hoc, one-shot job that mounts a PVC as a canary test; the job is polled for success and then torn down. Docs
Kubernetes Workload Metric	`Kubernetes`, `AKS`, `EKS`, `GKE`, `OpenShift`	`Running Kubectl get and push the metric`	This codebundle runs a kubectl get command that produces a value and pushes the metric. Uses jmespath for filtering and allows calculations such as count, sum, avg on specified fields. Docs
argocd-healthcheck-sli	`argocd`	`ArgoCD Health Check`	Check the health of the ArgoCD platform by verifying the availability of its underlying Deployments and StatefulSets. Docs
artifactory-ok-sli	`artifactory`	`Check If Artifactory Endpoint Is Healthy`	Checks an Artifactory instance health endpoint to determine its operational status. The response is parsed to determine if the service is healthy, resulting in a metric of 1 if it is, or 0 if not. Docs
aws-account-limit-sli	`aws`	`Get Count Of AWS Accounts In Organization`	Retrieve the count of all AWS accounts in an organization. Docs
aws-account-limit-taskset	`aws`, `iam`	`Get The Recently Created AWS Accounts`	Retrieve all recently created AWS accounts. Docs
aws-billing-costsacrosstags-taskset	`aws`, `billing`, `costexplorer`	`Get All Billing Sliced By Tags`	Creates a report of AWS line item costs filtered to a list of tagged resources. Docs
aws-billing-tagcosts-sli	`aws`, `billing`, `costexplorer`	`Get All Billing Sliced By Tags`	Monitors AWS cost and usage data for the latest billing period. Accepts one tag for continuous monitoring. Docs
aws-cloudformation-stackevents-count-sli	`aws`, `cloudformation`	`Fetch CloudFormation Stack Events`	Retrieve the number of detected AWS CloudFormation stack events over a given history window. Docs
aws-cloudformation-triage-taskset	`aws`, `cloudformation`	`Get All Recent Stack Events`	Triage and troubleshoot various issues with AWS CloudFormation. Docs
aws-cloudwatch-logquery-rowcount-zeroerror-sli	`aws`, `cloudwatch`	`Running CloudWatch Log Query And Pushing 1 If No Results Found`	Retrieve a binary result from an AWS CloudWatch Logs Insights query. Pushes 0 (success) if logs are found (indicating activity), or 1 if no logs were found in the time window. Docs
aws-cloudwatch-logquery-sli	`aws`, `cloudwatch`	`Running CloudWatch Log Query And Pushing The Count Of Results`	Retrieve number of results from an AWS CloudWatch Insights query. Docs
aws-cloudwatch-metricquery-dashboard-taskset	`aws`, `cloudwatch`	`Get CloudWatch MetricQuery Insights URL`	Creates a URL to an AWS CloudWatch metrics dashboard with a running query. Docs
aws-cloudwatch-metricquery-sli	`aws`, `cloudwatch`	`Running CloudWatch Metric Query And Pushing The Result`	Retrieve the result of an AWS CloudWatch Metrics Insights query. Docs
aws-cloudwatch-tagmetricquery-sli	`aws`, `cloudwatch`	`Run CloudWatch Metric Query Across Set Of IDs And Push Metric`	Retrieve aggregate results from multiple AWS CloudWatch Metrics Insights queries run against tagged resources. This codebundle fetches a list of instance IDs filtered by tags, uses them to run a set of queries against the CloudWatch Metrics Insights API, and pushes the aggregated/transformed value returned by the API as a metric. Docs
aws-ec2-securitycheck-taskset	`aws`, `ec2`, `cloudwatch`	`Check For Untagged instances`, `Check For Dangling Volumes`, `Check For Open Routes`, `Check For Overused Instances`, `Check For Underused Instances`, `Check For Underused Volumes`, `Check For Overused Volumes`	Performs a suite of security checks against a set of AWS EC2 instances. Checks include untagged instances, dangling volumes, and open routes, as well as overused and underused instances and volumes. Docs
aws-s3-stalecheck-taskset	`aws`, `s3`, `bucket`	`Create Report For Stale Buckets`	Identify stale AWS S3 buckets, based on last modified object timestamp. Docs
aws-vm-triage-taskset	`aws`, `ec2`, `cloudwatch`	`Get Max VM CPU Utilization In Last 3 Hours`, `Get Lowest VM CPU Credits In Last 3 Hours`, `Get Max VM CPU Credit Usage In Last 3 hours`, `Get Max VM Memory Utilization In Last 3 Hours`, `Get Max VM Volume Usage In Last 3 Hours`	Triage and troubleshoot the performance and usage of an AWS EC2 instance. Docs
cert-manager-expirations-sli	`cert`	`Inspect Certification Expiration Dates`	Retrieve the number of TLS certificates managed by cert-manager that expire within a given window. The pushed metric is the number of certificates within the configured expiration window. Docs
cert-manager-healthcheck-sli	`cert`	`Health Check cert-manager Pods`	Check the health of pods deployed by cert-manager. Docs
curl-generic-sli	`curl`	`Run Curl Command and Push Metric`	A curl SLI for querying and extracting data from a generic curl call. Supports jq. Should produce a single metric. Docs
curl-generic-taskset	`curl`	`Run Curl Command and Add to Report`	A curl TaskSet for querying and extracting data from a generic curl call. Supports jq. Adds results to the report. Docs
datadog-metricquery-sli	`datadog`	`Query Datadog Metrics`	Fetch the results of a Datadog metric timeseries and push the extracted value as an SLI metric. Docs
datadog-system-load-sli	`datadog`	`Check Datadog System Load`	Retrieve a Datadog instance's "System Load" metric. Docs
discord-sendmessage-taskset	`discord`	`Send Chat Message`	Sends a static Discord message via webhook. Contains optional configuration for including runsession info. Docs
dns-latency-sli	`dns`	`Check DNS latency for Google Resolver`	Check DNS latency for Google Resolver. Docs
elasticsearch-health-sli	`elasticsearch`	`Check Elasticsearch Cluster Health`	Check Elasticsearch cluster health. Docs
gcp-gcloudcli-generic-sli	`gcp`	`Run Gcloud CLI Command and Push metric`	Run arbitrary gcloud commands and parse their output (for example, JSON) for values to be submitted as a metric. Docs
gcp-gcloudcli-generic-taskset	`gcp`	`Run Gcloud CLI Command and Push metric`	Run arbitrary gcloud commands and capture the stdout in a report. Docs
gcp-opssuite-logquery-dashboard-taskset	`gcp`	`Get GCP Log Dashboard URL For Given Log Query`	Generate a link to the GCP Log Explorer. Docs
gcp-opssuite-logquery-sli	`gcp`	`Running GCE Logging Query And Pushing Result Count Metric`	Retrieve the number of results of a GCP Log Explorer query. Docs
gcp-opssuite-metricquery-sli	`gcp`	`Running GCP OpsSuite Metric Query`	Performs a metric query using a Google MQL statement on the Ops Suite API and pushes the result as an SLI metric. Docs
gcp-opssuite-promql-sli	`gcp`	`Run Prometheus Instant Query Against Google Prom API Endpoint`	Performs a metric query using a PromQL statement on the Ops Suite API and pushes the result as an SLI metric. Docs
gcp-serviceshealth-sli	`gcp`	`Get Number of GCP Incidents Effecting My Workspace`	This codebundle sets up a monitor for a specific region and GCP product, which is periodically checked for ongoing incidents, filtered by severity level, based on the history available at https://status.cloud.google.com/incidents.json. Docs
github-actions-workflowtiming-sli	`github`	`Get Average Run Time For Workflow`	Monitors the run time of a GitHub Actions workflow file within a repo and returns the average runtime in minutes. Docs
github-get-repos-latency-sli	`github`	`Check GitHub Latency With Get Repos`	Check GitHub latency by getting a list of repo names. Docs
github-get-repos-latency-taskset	`github`	`Check Latency When Creating a New GitHub Issue`	Create a new issue in GitHub Issues. Docs
github-status-components-sli	`github`	`Get Availability of GitHub or Individual GitHub Components`	Check the status of the GitHub platform (https://www.githubstatus.com/) for a specified set of GitHub service components. The metric supplied is an aggregated percentage indicating the availability of the components, where 1 = 100% available. Docs
github-status-incidents-sli	`github`	`Get Number of Incidents Affecting GitHub`	Check for unresolved incidents related to GitHub services and provide a count of ongoing incidents as a metric. Docs
github-status-maintenances-sli	`github`	`Get Scheduled and Active GitHub Maintenance Windows`	Retrieve the number of upcoming GitHub platform maintenances over a given window. Docs
gitlab-availability-sli	`gitlab`	`Check GitLab Server Status`	Check availability of a GitLab server. Docs
gitlab-availability-taskset	`gitlab`	`Check GitLab Server Status`	Troubleshoot issues with GitLab server availability. Docs
gitlab-get-repos-latency-sli	`gitlab`	`Check GitLab Latency With Get Repos`	Check GitLab latency by getting a list of repo names. Docs
googlechat-sendmessage-taskset	`googlechat`	`Send Chat Message`	Sends a static Google Chat message via webhook. Contains optional configuration for including runsession info. Docs
grafana-health-sli	`grafana`	`Check Grafana Server Health`	Check Grafana server health. Docs
grpc-grpcurl-unary-sli	`grpc`	`Run gRPCurl Command and Push Metric`	A gRPC curl SLI for querying and extracting data from a generic grpcurl call. Docs
grpc-grpcurl-unary-taskset	`grpc`	`Run gRPCurl Command and Show Output`	A gRPC curl taskset for querying data from a generic grpcurl call and presenting the output. Docs
hello-world-taskset	`hello`	`Hello World`, `Add One String To Report`, `Add Form Values To Report`	Basic Hello-World TaskSet. Docs
http-latency-sli	`http`	`Check HTTP Latency to Well Known URL`	Measure HTTP latency against a given URL. The returned metric is the number of seconds the request took as a float value. Docs
http-ok-sli	`http`	`Checking HTTP URL Is Available And Timely`	Check whether an HTTP request against a URL fails or exceeds a given latency window. A return of 1 is considered a success, while 0 is a failure. Docs
jira-search-issues-latency-sli	`jira`	`Search Jira Issues By Current User`	Check Jira latency when searching issues by current user. Docs
jira-search-issues-latency-taskset	`jira`	`Create a new Jira Issue`	Create an issue in Jira. Docs
k8s-cortexmetrics-ingestor-health-sli	`k8s`	`Determine Cortex Ingester Ring Health`	Uses kubectl to query the state of an ingester ring and determine if it's healthy. Returns 1 if healthy, 0 if unhealthy. Docs
k8s-cortexmetrics-ingestor-health-taskset	`k8s`	`Fetch Ingestor Ring Member List and Status`	Uses kubectl to query the state of an ingester ring. Returns JSON containing each ingester's ID, status, and timestamp. Docs
k8s-daemonset-healthcheck-sli	`k8s`	`Health Check Daemonset`	Checks that the current state of a daemonset is healthy and returns a score of either 1 (healthy) or 0 (unhealthy). Docs
k8s-decommission-workloads-taskset	`k8s`	`Generate Decomission Commands`	Searches a namespace for matching objects and provides the commands to decommission them. Docs
k8s-kubectl-apiserverhealth-sli	`k8s`	`Running Kubectl Check Against API Server`	Check the health of a Kubernetes API server using kubectl. Returns 1 when OK, or a 0 in the case of an unhealthy API server. Docs
k8s-kubectl-eventquery-sli	`k8s`	`Get Number Of Matching Events`	Returns the number of events with matching messages as an SLI metric. Docs
k8s-kubectl-sanitycheck-taskset	`k8s`	`Check Kubeconfig Secret Exists`, `Test Generic Shell Service Connectivity`, `Check Kubectl contexts`, `Test Command Chains`, `Test Kubectl Get Pods`	Used for troubleshooting the shellservice-based kubectl service. Docs
k8s-kubectl-top-sli	`k8s`	`Running Kubectl Top And Extracting Metric Data`	Retrieve aggregate data via the kubectl top command. Docs
k8s-patroni-healthcheck-sli	`k8s`	`Determine Patroni Health`	Uses kubectl (or equivalent) to query the state of a patroni cluster and determine if it's healthy. Docs
k8s-patroni-lag-sli	`k8s`	`Measure Patroni Member Lag`	Measures the maximum replica lag across a Patroni cluster. Docs
k8s-patroni-lag-taskset	`k8s`	`Determine Patroni Health`	Detects and reinitializes lagging Patroni cluster members that are unable to catch up in replication, using kubectl and patronictl. Docs
k8s-postgres-query-sli	`k8s`	`Run Postgres Query And Return Result As Metric`	Runs a postgres SQL query and pushes the returned query result as an SLI metric. During execution, the SQL query should be passed to a Kubernetes workload that has access to the psql binary. The workload will run the query and return the result from stdout. Docs
k8s-postgres-query-taskset	`k8s`	`Run Postgres Query And Results to Report`	Runs a postgres SQL query and pushes the returned result into a report. During execution, the SQL query should be passed to a Kubernetes workload that has access to the psql binary. The workload will run the query and return the results from stdout. Docs
k8s-postgres-triage-taskset	`k8s`	`Get Standard Resources`, `Describe Custom Resources`, `Get Pod Logs & Events`, `Get Pod Resource Utilization`, `Get Running Configuration`, `Get Patroni Output`, `Run DB Queries`	Runs multiple Kubernetes and psql commands to report on the health of a postgres cluster. Docs
k8s-triage-deploymentreplicas-taskset	`k8s`	`Fetch Logs`, `Get Related Events`, `Check Deployment Replicas`	Triages issues related to a deployment's replicas. Docs
k8s-triage-patroni-taskset	`k8s`	`Get Patroni Status`, `Get Pods Status`, `Fetch Logs`	Taskset to triage issues related to patroni. Docs
k8s-triage-statefulset-taskset	`k8s`	`Check StatefulSets Replicas Ready`, `Get Events For The StatefulSet`, `Get StatefulSet Logs`, `Get StatefulSet Manifests Dump`	A taskset for troubleshooting issues for StatefulSets and their related resources. Docs
k8s-troubleshoot-deployment-taskset	`k8s`	`Troubleshoot Resourcing`, `Troubleshoot Events`, `Troubleshoot PVC`, `Troubleshoot Pods`	A taskset for troubleshooting general issues associated with typical Kubernetes deployment resources. Supports API interactions via both the API client and the kubectl binary through RunWhen Shell Services. Docs
kong-ingress-health-gcp-promql-sli	`kong`	`Get Access Token`, `Get HTTP Error Rate`, `Get Upstream Health`, `Get Request Latency Rate`, `Generate Kong Ingress Score`	Uses promql on the Ops Suite API to determine the health of a Kong managed ingress resource and pushes the result as an SLI metric. Produces a 1 for a healthy resource, or 0 for an unhealthy resource. Docs
mongodb-health-gcp-promql-sli	`mongodb`	`Get Access Token`, `Get Instance Status`, `Get Connection Utilization Rate`, `Get MongoDB Member State Health`, `Get MongoDB Replication Lag`, `Get MongoDB Queue Size`, `Get Assertion Rate`, `Generate MongoDB Score`	Uses promql on the Ops Suite API to determine the health of a MongoDB database instance and pushes the result as an SLI metric. Produces a 1 for a healthy resource, or 0 for an unhealthy resource. Docs
msteams-send-message-taskset	`msteams`	`Send a Message to an MS Teams Channel`	Send a message to an MS Teams channel. Docs
opsgenie-alert-taskset	`opsgenie`	`Get Opsgenie System Info`, `Create An Alert`	Create an alert in Opsgenie. Docs
ping-host-availability-sli	`ping`	`Ping host and collect packet lost percentage`	Ping a host and retrieve packet loss percentage. Docs
pingdom-health-sli	`pingdom`	`Check Pingdom Health`	Check health of Pingdom platform. Docs
prometheus-queryinstant-transform-sli	`prometheus`	`Querying Prometheus Instance And Pushing Aggregated Data`	Run a PromQL query against Prometheus instant query API, perform a provided transform, and return the result. Docs
prometheus-queryrange-transform-sli	`prometheus`	`Querying Prometheus Instance And Pushing Aggregated Data`	Run a PromQL query against Prometheus range query API, perform a provided transform, and return the result. Docs
remote-http-ok-sli	`remote`	`Checking HTTP URL Is Available And Timely`	Check that an HTTP endpoint is healthy and responds within a target latency. Docs
rest-basicauth-sli	`rest`	`Request Data From Rest Endpoint`	A general purpose REST SLI for querying and extracting data from a REST endpoint that uses a basic auth flow. Docs
rest-explicitoauth2-basicauth-sli	`rest`	`Request Data From Rest Endpoint`	A REST SLI for querying and extracting data from a REST endpoint that requires an explicit OAuth2 flow, where token acquisition is handled using basic auth. Docs
rest-explicitoauth2-tokenheader-sli	`rest`	`Request Data From Rest Endpoint`	A REST SLI for querying and extracting data from a REST endpoint that requires an explicit OAuth2 flow, where the access token must be acquired with a bearer token. Docs
rest-generic-sli	`rest`	`Request Data From Rest Endpoint`	A general purpose REST SLI for querying and extracting data from a REST endpoint that uses an implicit OAuth2 flow. Docs
rocketchat-sendmessage-taskset	`rocketchat`	`Send Chat Message`	Sends a static Rocketchat message via webhook. Contains optional configuration for including runsession info. Docs
slack-sendmessage-taskset	`slack`	`Send Chat Message`	Sends a static Slack message via webhook. Contains optional configuration for including runsession info. Docs
sli-alert-threshold-sli	`sli`	`Check If SLI Within Incident Threshold`	An SLI that monitors another SLI submitting a 0-1 health score and immediately triggers a taskset when that score falls below a threshold. When it detects a score below the threshold, it submits a 1 to denote that a signal was sent, returning to 0 once the monitored SLI is healthy. Docs
sysdig-monitor-metric-sli	`sysdig`	`Query Sysdig Metric Data And Pushing Metric`	Queries the Sysdig data API to fetch metric data. Docs
sysdig-monitor-promqlmetric-sli	`sysdig`	`Querying PromQL Endpoint And Pushing Metric Data`	Queries the Sysdig data API with a PromQL query to fetch metric data. Docs
twitter-query-tweets-sli	`twitter`	`Query Twitter`	Queries Twitter to count the number of tweets within a specified time range for a specific user handle. Docs
twitter-query-tweets-taskset	`twitter`	`Query Twitter`	Queries Twitter to fetch tweets within a specified time range for a specific user handle and adds them to a report. Docs
uptimecom-component-ok-sli	`uptimecom`	`Check If Vault Endpoint Is Healthy`	Check the status of an Uptime.com component for a given site. It compares the operational state of the component with the list of allowed states, resulting in a 1 when acceptable, and 0 when not. Docs
vault-ok-sli	`vault`	`Check If Vault Endpoint Is Healthy`	Check the health of a Vault server. The response code is used to determine if the service is healthy, resulting in a metric of 1 if it is, or 0 if not. Docs
web-triage-taskset	`web`	`Validate Platform Egress`, `Perform Inspection On URL`	Troubleshoot and triage a URL to inspect it for common issues such as an expired certificate, missing DNS records, etc. Docs
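To make the `sli-alert-threshold-sli` behavior in the index easier to follow, here is a minimal Python sketch of its decision logic. It is an illustration under stated assumptions, not the codebundle's actual robot code: the function name `alert_threshold_metric` and the `trigger_taskset` callback are hypothetical stand-ins for however the platform actually reads the monitored score and starts a taskset.

```python
from typing import Callable


def alert_threshold_metric(
    health_score: float,
    threshold: float,
    trigger_taskset: Callable[[], None],
) -> int:
    """Hypothetical sketch of the sli-alert-threshold-sli decision logic.

    Returns 1 when the monitored 0-1 health score is below the threshold
    (after firing the taskset), and 0 when the monitored SLI is healthy.
    `trigger_taskset` is an assumed callback, not a RunWhen API.
    """
    if not 0.0 <= health_score <= 1.0:
        raise ValueError("health_score must be within [0, 1]")
    if health_score < threshold:
        # Unhealthy: fire the taskset immediately and report that a signal was sent.
        trigger_taskset()
        return 1
    # Healthy: no signal is sent, so the metric returns to 0.
    return 0


if __name__ == "__main__":
    fired = []
    print(alert_threshold_metric(0.4, 0.8, lambda: fired.append("taskset")))   # 1
    print(alert_threshold_metric(0.95, 0.8, lambda: fired.append("taskset")))  # 0
    print(fired)  # ['taskset']
```

Passing the trigger in as a callback keeps the threshold comparison separate from how a taskset is launched, so the 0/1 metric can be checked on its own.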