Closed by juan-vg 1 year ago
Thank you, this is an important question. Let me answer it.
k8sgpt analyze
This connects to the Kubernetes API server and only looks at Conditions and Status messages on objects. No AI is used in this step at all.
k8sgpt analyze --explain
This will still forward an aggregated payload of information, which may contain personally identifiable information, for example: error back off container/alexsjones:latest is failing!
If you have any doubts or worries, I would not recommend using --explain if you do not trust OpenAI.
Although it is all sent over TLS, I cannot speak to their data retention policy beyond this document I found here.
As for k8sgpt itself, we store nothing aside from ~/.k8sgpt.yaml, which would only cache the results of --explain on your local machine if you choose to use it.
I hope this helps.
I see. Thank you for clarifying it!
Regarding the OpenAI data retention policy, it occurs to me that we could apply some kind of data anonymization before sending the data to them. It would be very cool to implement it, or to allow a plugin to do so. From my point of view, it's not about trusting OpenAI or not, but about preventing them from using that sensitive data for future training.
I think this could be a game changer. I know that Google has a "Data Loss Prevention API" in GCP. I wonder if we could start simply with regex and build it out into a module?
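As a rough illustration of the regex-first idea, here is a minimal Go sketch. The names maskRule, rules, and maskSensitive, the placeholder strings, and the three patterns are all hypothetical and are not part of k8sgpt. It only covers UIDs, quoted image references, and IPv4 addresses; object names and namespaces would need a different approach.

```go
package main

import (
	"fmt"
	"regexp"
)

// maskRule pairs a pattern with the placeholder that replaces each match.
// Hypothetical type, used only for this sketch.
type maskRule struct {
	pattern     *regexp.Regexp
	placeholder string
}

// rules is an assumed starter set; a real cluster would need more patterns.
var rules = []maskRule{
	// Object UIDs in UUID format.
	{regexp.MustCompile(`[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}`), "<uid>"},
	// Quoted image references containing at least one slash.
	{regexp.MustCompile(`"[^"\s]+/[^"\s]+"`), `"<image>"`},
	// IPv4 addresses (no range check on the octets).
	{regexp.MustCompile(`\b(?:\d{1,3}\.){3}\d{1,3}\b`), "<ip>"},
}

// maskSensitive applies every rule in order and returns the masked message.
func maskSensitive(msg string) string {
	for _, r := range rules {
		msg = r.pattern.ReplaceAllString(msg, r.placeholder)
	}
	return msg
}

func main() {
	fmt.Println(maskSensitive(`Back-off pulling image "docker.io/cilium/starwaraaes"`))
	fmt.Println(maskSensitive(`pod=argocd-dex-server-5fcdf867b7-mv496_argocd(32e8f49e-88f0-4e54-8408-ea3162292f7d)`))
}
```

Keeping the rules in a slice means new patterns can be added without touching the masking loop, which might be a reasonable starting point for the module mentioned above.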
I've been playing with ChatGPT, feeding it some k8sgpt output and asking for a Go script that automatically detects sensitive data and replaces it with random characters while preserving the word and sentence structure. I based the detection on word entropy, but I realize that is not the way to go (it works better for detecting random character strings, such as those typically used in passwords and tokens). Could you provide examples of each message k8sgpt can output, so we can feed ChatGPT better training data for the detection?
Just to round out this issue, we will add a task to clarify this in the documentation.
As for examples, here are some, but none of my workloads directly expose PII in their error strings:
0 argocd/argocd-application-controller-0(StatefulSet/argocd-application-controller)
- Error: back-off 5m0s restarting failed container=argocd-application-controller-instrumentation pod=argocd-application-controller-0_argocd(1748b1c2-28b2-40d9-b58e-9906220fbb0d)
1 default/deathstar-5b559d699b-smvrn(Deployment/deathstar)
- Error: Back-off pulling image "docker.io/cilium/starwaraaes"
2 foo/alpha-6c85f869-d7tbq(Deployment/alpha)
- Error: Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: container init was OOM-killed (memory limit too low?): unknown
3 observability/loki-read-0(StatefulSet/loki-read)
- Error: 0/5 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 1 node(s) had volume node affinity conflict, 3 node(s) were unschedulable. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling..
4 observability/loki-read-1(StatefulSet/loki-read)
- Error: 0/5 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 1 node(s) had volume node affinity conflict, 3 node(s) were unschedulable. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling..
5 observability/loki-write-0(StatefulSet/loki-write)
- Error: 0/5 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 1 node(s) had volume node affinity conflict, 3 node(s) were unschedulable. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling..
6 observability/loki-write-1(StatefulSet/loki-write)
- Error: 0/5 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 1 node(s) had volume node affinity conflict, 3 node(s) were unschedulable. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling..
7 observability/prometheus-observability-kube-prometh-prometheus-0(prometheus-observability-kube-prometh-prometheus)
- Error: 0/5 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 1 node(s) had volume node affinity conflict, 3 node(s) were unschedulable. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling..
8 argocd/argocd-applicationset-controller-5c5496c549-7ppk2(Deployment/argocd-applicationset-controller)
- Error: back-off 5m0s restarting failed container=argocd-applicationset-controller-instrumentation pod=argocd-applicationset-controller-5c5496c549-7ppk2_argocd(993f24eb-f345-4199-a0c1-8cced885716d)
9 argocd/argocd-dex-server-5fcdf867b7-mv496(Deployment/argocd-dex-server)
- Error: back-off 5m0s restarting failed container=dex-instrumentation pod=argocd-dex-server-5fcdf867b7-mv496_argocd(32e8f49e-88f0-4e54-8408-ea3162292f7d)
10 argocd/argocd-repo-server-5785865bd8-pgtc7(Deployment/argocd-repo-server)
- Error: back-off 5m0s restarting failed container=argocd-repo-server-instrumentation pod=argocd-repo-server-5785865bd8-pgtc7_argocd(66078850-2252-4230-97aa-c22fe98dff8d)
11 argocd/argocd-server-59677d6f74-n2p5m(Deployment/argocd-server)
- Error: back-off 5m0s restarting failed container=argocd-server-instrumentation pod=argocd-server-59677d6f74-n2p5m_argocd(060722da-5f30-41f9-9895-8c375abe2034)
12 default/deathstar(deathstar)
- Error: Service has not ready endpoints, pods: [Pod/deathstar-5b559d699b-smvrn], expected 1
13 argocd/argocd-applicationset-controller(argocd-applicationset-controller)
- Error: Service has not ready endpoints, pods: [Pod/argocd-applicationset-controller-5c5496c549-7ppk2], expected 1
14 argocd/argocd-server(argocd-server)
- Error: Service has not ready endpoints, pods: [Pod/argocd-server-59677d6f74-n2p5m], expected 1
15 cats/cats(cats)
- Error: Service has no endpoints, expected label app.kubernetes.io/instance=cats
- Error: Service has no endpoints, expected label app.kubernetes.io/name=cats
16 argocd/argocd-server-metrics(argocd-server-metrics)
- Error: Service has not ready endpoints, pods: [Pod/argocd-server-59677d6f74-n2p5m], expected 1
17 khole-system/khole-controller-manager-metrics-service(khole-controller-manager-metrics-service)
- Error: Service has no endpoints, expected label control-plane=controller-manager
18 metallb-system/metallb-webhook-service(metallb-webhook-service)
- Error: Service has no endpoints, expected label app.kubernetes.io/component=controller
- Error: Service has no endpoints, expected label app.kubernetes.io/instance=metallb
- Error: Service has no endpoints, expected label app.kubernetes.io/name=metallb
19 observability/observability-kube-prometh-prometheus(observability-kube-prometh-prometheus)
- Error: Service has no endpoints, expected label app.kubernetes.io/name=prometheus
- Error: Service has no endpoints, expected label prometheus=observability-kube-prometh-prometheus
20 observability/prometheus-operated(prometheus-operated)
- Error: Service has no endpoints, expected label app.kubernetes.io/name=prometheus
21 argocd/argocd-dex-server(argocd-dex-server)
- Error: Service has not ready endpoints, pods: [Pod/argocd-dex-server-5fcdf867b7-mv496], expected 1
22 argocd/argocd-metrics(argocd-metrics)
- Error: Service has not ready endpoints, pods: [Pod/argocd-application-controller-0], expected 1
23 argocd/argocd-repo-server(argocd-repo-server)
- Error: Service has not ready endpoints, pods: [Pod/argocd-repo-server-5785865bd8-pgtc7], expected 1
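In the listing above, each entry header carries a namespace and an object name, and the error text often repeats them. One possible way to use that, sketched below in Go, is to register the known names and swap them for placeholders before the text leaves the machine, then reverse the swap on the response. The Masker type, its methods, and the "<objN>" placeholder format are hypothetical and are not taken from k8sgpt.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Masker maps known cluster names to placeholders and back.
// Hypothetical type, used only for this sketch.
type Masker struct {
	toMask  map[string]string
	toPlain map[string]string
}

func NewMasker() *Masker {
	return &Masker{toMask: map[string]string{}, toPlain: map[string]string{}}
}

// Register assigns the next placeholder to name unless it is already known.
func (m *Masker) Register(name string) {
	if name == "" || m.toMask[name] != "" {
		return
	}
	p := fmt.Sprintf("<obj%d>", len(m.toMask)+1)
	m.toMask[name] = p
	m.toPlain[p] = name
}

// Mask replaces registered names in a single pass, trying longer names first
// so that "argocd" does not clobber part of "argocd-server-...".
func (m *Masker) Mask(text string) string {
	names := make([]string, 0, len(m.toMask))
	for n := range m.toMask {
		names = append(names, n)
	}
	sort.Slice(names, func(i, j int) bool {
		if len(names[i]) != len(names[j]) {
			return len(names[i]) > len(names[j])
		}
		return names[i] < names[j]
	})
	pairs := make([]string, 0, 2*len(names))
	for _, n := range names {
		pairs = append(pairs, n, m.toMask[n])
	}
	return strings.NewReplacer(pairs...).Replace(text)
}

// Unmask restores the original names in text returned by the AI backend.
func (m *Masker) Unmask(text string) string {
	pairs := make([]string, 0, 2*len(m.toPlain))
	for p, n := range m.toPlain {
		pairs = append(pairs, p, n)
	}
	return strings.NewReplacer(pairs...).Replace(text)
}

func main() {
	m := NewMasker()
	m.Register("argocd")
	m.Register("argocd-server-59677d6f74-n2p5m")
	masked := m.Mask("restarting failed container pod=argocd-server-59677d6f74-n2p5m_argocd")
	fmt.Println(masked)           // restarting failed container pod=<obj2>_<obj1>
	fmt.Println(m.Unmask(masked)) // original text again
}
```

The round trip only holds if the original text does not already contain something that looks like a placeholder, which is an assumption of this sketch.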
Subject of the issue
I would like to know how my private cluster data is treated. I would not like to disclose any personal info to GPT/OpenAI. I could not find information about this topic in the README.
Is the collected data anonymized before it is sent to GPT? Which data is collected? Are my private and sensitive clusters going to be kept safe? Why? Is this tool GDPR compliant? Etc.