redhat-cop / must-gather-operator

An operator to simplify the creation and upload of cluster diagnostics from the must-gather tool
Apache License 2.0

Must Gather Operator

The Must Gather operator helps collect must-gather information on a cluster and upload it to a case. To use the operator, a cluster administrator can create the following MustGather CR:

apiVersion: redhatcop.redhat.io/v1alpha1
kind: MustGather
metadata:
  name: example-mustgather
spec:
  caseID: '02527285'
  caseManagementAccountSecretRef:
    name: case-management-creds

This request collects the standard must-gather information and uploads it to case #02527285 using the credentials found in the case-management-creds secret (referenced by caseManagementAccountSecretRef).
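
When several cases need the same request, the manifest can be generated from a small shell function. This is a minimal sketch, not part of the operator: render_mustgather and its three arguments are hypothetical names chosen for illustration, and the output only mirrors the fields of the CR shown above.

```shell
# Hypothetical helper: print a minimal MustGather manifest.
#   $1 = resource name, $2 = case ID, $3 = name of the credentials secret
render_mustgather() {
  name="$1"
  case_id="$2"
  secret="$3"
  cat <<EOF
apiVersion: redhatcop.redhat.io/v1alpha1
kind: MustGather
metadata:
  name: ${name}
spec:
  caseID: '${case_id}'
  caseManagementAccountSecretRef:
    name: ${secret}
EOF
}

# Print the manifest; pipe it to "oc apply -f -" to create the object.
render_mustgather example-mustgather 02527285 case-management-creds
```

The single quotes around caseID keep the value a string, so the leading zero in 02527285 is preserved.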

A more complex example:

apiVersion: redhatcop.redhat.io/v1alpha1
kind: MustGather
metadata:
  name: full-mustgather
spec:
  caseID: '02527285'
  caseManagementAccountSecretRef:
    name: case-management-creds
  serviceAccountRef:
    name: must-gather-admin
  mustGatherImages:
  - quay.io/kubevirt/must-gather:latest
  - quay.io/ocs-dev/ocs-must-gather

This example uses a specific service account (which must have cluster-admin permissions, as must-gather requires) and specifies two additional must-gather images to run for the kubevirt and ocs subsystems. If not specified, serviceAccountRef.name defaults to default. The standard must-gather image, quay.io/openshift/origin-must-gather:latest, is always added.
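
The service account requirement can be met along the following lines. This is a minimal sketch, assuming the account lives in the namespace where the MustGather object is created; create_must_gather_sa is a hypothetical helper name, while the oc subcommands it wraps are the standard ones for creating a service account and granting it a cluster role.

```shell
# Hypothetical helper: create a service account and grant it cluster-admin.
#   $1 = service account name, $2 = namespace
create_must_gather_sa() {
  sa="$1"
  ns="$2"
  oc create serviceaccount "$sa" -n "$ns" &&
    oc adm policy add-cluster-role-to-user cluster-admin -z "$sa" -n "$ns"
}

# Example (requires a logged-in oc session with sufficient rights):
#   create_must_gather_sa must-gather-admin must-gather-operator
```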

Proxy Support

The MustGather operator supports using a proxy. The proxy setting can be specified in the MustGather object. If not specified, the cluster default proxy setting will be used. Here is an example:

apiVersion: redhatcop.redhat.io/v1alpha1
kind: MustGather
metadata:
  name: example-mustgather
spec:
  caseID: '02527285'
  caseManagementAccountSecretRef:
    name: case-management-creds
  proxyConfig:
    httpProxy: http://myproxy
    httpsProxy: https://my_http_proxy
    noProxy: master-api  
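
Before omitting proxyConfig, it can help to look at what the cluster-wide setting contains. This is a minimal sketch, assuming that the "cluster default proxy setting" corresponds to the OpenShift Proxy object named cluster; the source does not state how the operator reads the default, and show_cluster_proxy is a hypothetical helper name.

```shell
# Hypothetical helper: print the cluster-wide Proxy object for inspection.
# Assumption: this object is what the operator falls back to.
show_cluster_proxy() {
  oc get proxy cluster -o yaml
}

# Example (requires a logged-in oc session):
#   show_cluster_proxy
```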

Garbage collection

MustGather instances are cleaned up by the Must Gather operator about 6 hours after completion, regardless of whether they were successful. This is a way to prevent the accumulation of unwanted MustGather resources and their corresponding job resources.
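
To estimate when a finished MustGather will disappear, the 6-hour window can be added to its completion time. This is a minimal sketch, assuming GNU date and a completion timestamp supplied by the caller; gc_eta is a hypothetical helper, and since the cleanup happens "about" 6 hours after completion, the result is only an approximation.

```shell
# Hypothetical helper: print the approximate cleanup time for a MustGather
# that completed at the given UTC timestamp (ISO 8601, e.g. 2024-01-01T20:30:00Z).
# Requires GNU date.
gc_eta() {
  completed_epoch="$(date -u -d "$1" +%s)" || return 1
  date -u -d "@$((completed_epoch + 6 * 3600))" +%Y-%m-%dT%H:%M:%SZ
}

gc_eta 2024-01-01T20:30:00Z   # prints 2024-01-02T02:30:00Z
```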

Deploying the Operator

This is a cluster-level operator that you can deploy in any namespace; must-gather-operator is recommended.

It is recommended to deploy this operator via OperatorHub, but you can also deploy it using Helm.

Multiarch Support

Arch Support
amd64
arm64
ppc64le
s390x

Deploying from OperatorHub

Note: This operator supports being installed in disconnected environments.

If you want to utilize the Operator Lifecycle Manager (OLM) to install this operator, you can do so in two ways: from the UI or the CLI.

Deploying from OperatorHub UI

oc new-project must-gather-operator

Deploying from OperatorHub using CLI

If you'd like to launch this operator from the command line, you can use the manifests contained in this repository by running the following:

oc new-project must-gather-operator
oc apply -f config/operatorhub -n must-gather-operator

This will create the appropriate OperatorGroup and Subscription and will trigger OLM to launch the operator in the specified namespace.
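
After applying the manifests, the OLM objects and the operator pod can be listed to confirm the installation progressed. This is a minimal sketch: check_olm_install is a hypothetical helper name, and the fully qualified resource names are used only to avoid ambiguity with other APIs that also define a "subscription" kind.

```shell
# Hypothetical helper: list the OLM objects and pods in the operator namespace.
#   $1 = namespace the operator was installed into
check_olm_install() {
  ns="$1"
  oc get operatorgroups.operators.coreos.com,subscriptions.operators.coreos.com,clusterserviceversions.operators.coreos.com -n "$ns" &&
    oc get pods -n "$ns"
}

# Example (requires a logged-in oc session):
#   check_olm_install must-gather-operator
```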

Deploying with Helm

Here are the instructions to install the latest release with Helm.

oc new-project must-gather-operator
helm repo add must-gather-operator https://redhat-cop.github.io/must-gather-operator
helm repo update
helm install must-gather-operator must-gather-operator/must-gather-operator

This can later be updated with the following commands:

helm repo update
helm upgrade must-gather-operator must-gather-operator/must-gather-operator

Metrics

Prometheus-compatible metrics are exposed by the operator and can be integrated into OpenShift's default cluster monitoring. To enable OpenShift cluster monitoring, label the namespace the operator is deployed in with openshift.io/cluster-monitoring="true".

oc label namespace <namespace> openshift.io/cluster-monitoring="true"

Testing metrics

export operatorNamespace=must-gather-operator-local # or must-gather-operator
oc label namespace ${operatorNamespace} openshift.io/cluster-monitoring="true"
oc rsh -n openshift-monitoring -c prometheus prometheus-k8s-0 /bin/bash
export operatorNamespace=must-gather-operator-local # or must-gather-operator; set again because this shell runs inside the Prometheus pod
curl -v -s -k -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://must-gather-operator-controller-manager-metrics.${operatorNamespace}.svc.cluster.local:8443/metrics
exit

Development

Running the operator locally

make install
export repo=raffaelespazzoli #replace with yours
docker login quay.io/$repo/must-gather-operator
make docker-build IMG=quay.io/$repo/must-gather-operator:latest
make docker-push IMG=quay.io/$repo/must-gather-operator:latest
oc new-project must-gather-operator-local
kustomize build ./config/local-development | oc apply -f - -n must-gather-operator-local
export DEFAULT_MUST_GATHER_IMAGE='quay.io/openshift/origin-must-gather:4.6'
export JOB_TEMPLATE_FILE_NAME=./config/templates/job.template.yaml
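# on newer oc clients, where "oc serviceaccounts get-token" is no longer available, use "oc create token must-gather-controller-manager -n must-gather-operator-local" instead
export token=$(oc serviceaccounts get-token 'must-gather-controller-manager' -n must-gather-operator-local)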
oc login --token ${token}
make run ENABLE_WEBHOOKS=false

Test helm chart locally

Define an image and tag. For example...

export imageRepository="quay.io/redhat-cop/must-gather-operator"
export imageTag="$(git -c 'versionsort.suffix=-' ls-remote --exit-code --refs --sort='version:refname' --tags https://github.com/redhat-cop/must-gather-operator.git '*.*.*' | tail --lines=1 | cut --delimiter='/' --fields=3)"
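
The imageTag pipeline above asks git for the remote tags sorted by version, keeps the last line, and cuts the tag name out of the refs/tags/... path. The parsing step can be tried in isolation on sample data. This is a minimal sketch: latest_tag is a hypothetical helper, the commit hashes and tag names below are made up, and the short options are equivalents of the long ones used above.

```shell
# Hypothetical helper: read "git ls-remote --tags" output on stdin (already
# sorted by version) and print the name of the last tag.
latest_tag() {
  tail -n 1 | cut -d '/' -f 3
}

# Sample input with made-up commit hashes; real input comes from git ls-remote.
printf '%s\t%s\n' \
  1111111 refs/tags/v1.0.0 \
  2222222 refs/tags/v1.2.0 \
  3333333 refs/tags/v1.10.0 | latest_tag   # prints v1.10.0
```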

Deploy chart...

make helmchart IMG=${imageRepository} VERSION=${imageTag}
helm upgrade -i must-gather-operator-local charts/must-gather-operator -n must-gather-operator-local --create-namespace

Delete...

helm delete must-gather-operator-local -n must-gather-operator-local
kubectl delete -f charts/must-gather-operator/crds/crds.yaml

Building/Pushing the operator image

export repo=raffaelespazzoli #replace with yours
docker login quay.io/$repo/must-gather-operator
make docker-build IMG=quay.io/$repo/must-gather-operator:latest
make docker-push IMG=quay.io/$repo/must-gather-operator:latest

Deploy to OLM via bundle

make manifests
make bundle IMG=quay.io/$repo/must-gather-operator:latest
operator-sdk bundle validate ./bundle --select-optional name=operatorhub
make bundle-build BUNDLE_IMG=quay.io/$repo/must-gather-operator-bundle:latest
docker login quay.io/$repo/must-gather-operator-bundle
docker push quay.io/$repo/must-gather-operator-bundle:latest
operator-sdk bundle validate quay.io/$repo/must-gather-operator-bundle:latest --select-optional name=operatorhub
oc new-project must-gather-operator
oc label namespace must-gather-operator openshift.io/cluster-monitoring="true"
operator-sdk cleanup must-gather-operator -n must-gather-operator
operator-sdk run bundle --install-mode AllNamespaces -n must-gather-operator quay.io/$repo/must-gather-operator-bundle:latest

Releasing

git tag -a "<tagname>" -m "<commit message>"
git push upstream <tagname>

If you need to remove a release:

git tag -d <tagname>
git push upstream --delete <tagname>

If you need to "move" a release to the current main:

git tag -f <tagname>
git push upstream -f <tagname>

Cleaning up

operator-sdk cleanup must-gather-operator -n must-gather-operator
oc delete operatorgroup operator-sdk-og
oc delete catalogsource must-gather-operator-catalog