k8sgpt-ai / k8sgpt

Giving Kubernetes Superpowers to everyone
http://k8sgpt.ai
Apache License 2.0

[bug]: run k8sgpt analyze will only complete successfully once , after trivy integration is active #1063

Open liyuerich opened 2 months ago

liyuerich commented 2 months ago

K8sGPT Version

0.3.29 (5db4bc2)

Kubernetes Version

v1.27.5

Host OS and its Version

Ubuntu Linux controller-node-1 5.4.0-174-generic

Steps to reproduce

  1. First, I activated the trivy integration and ran k8sgpt analyze successfully.
  2. Then I ran k8sgpt analyze again and got an error message.
  3. After deactivating trivy, I ran k8sgpt analyze again and it completed successfully.

error message:

    k8sgpt analyze --explain
    fatal error: concurrent map writes

    goroutine 16 [running]:
    k8s.io/apimachinery/pkg/runtime.(*Scheme).AddKnownTypeWithName(0xc00014b3b0, {{0x2be7d26, 0x16}, {0x2bc5c63, 0x8}, {0x241e398, 0x15}}, {0x345f568?, 0xc0009b4310})
        /home/runner/go/pkg/mod/k8s.io/apimachinery@v0.28.4/pkg/runtime/scheme.go:181 +0x345
    k8s.io/apimachinery/pkg/runtime.(*Scheme).AddKnownTypes(0xc00014b3b0, {{0x2be7d26?, 0x0?}, {0x2bc5c63?, 0x0?}}, {0xc000854620?, 0x16, 0x0?})
        /home/runner/go/pkg/mod/k8s.io/apimachinery@v0.28.4/pkg/runtime/scheme.go:148 +0x176
    github.com/aquasecurity/trivy-operator/pkg/apis/aquasecurity/v1alpha1.addKnownTypes(0xc0008547b8?)
        /home/runner/go/pkg/mod/github.com/aquasecurity/trivy-operator@v0.17.1/pkg/apis/aquasecurity/v1alpha1/register.go:22 +0x4b7
    k8s.io/apimachinery/pkg/runtime.(*SchemeBuilder).AddToScheme(...)
        /home/runner/go/pkg/mod/k8s.io/apimachinery@v0.28.4/pkg/runtime/scheme_builder.go:29
    github.com/k8sgpt-ai/k8sgpt/pkg/integration/trivy.TrivyAnalyzer.analyzeConfigAuditReports({0x0?, 0x0?}, {0xc000c0a1e0, {0x34736e0, 0x4d66ac0}, {0x0, 0x0}, {0x3473750, 0x4d21c00}, 0x0, ...})
        /home/runner/work/k8sgpt/k8sgpt/pkg/integration/trivy/analyzer.go:92 +0x6e
    github.com/k8sgpt-ai/k8sgpt/pkg/integration/trivy.TrivyAnalyzer.Analyze({0x0?, 0x0?}, {0xc000c0a1e0, {0x34736e0, 0x4d66ac0}, {0x0, 0x0}, {0x3473750, 0x4d21c00}, 0x0, ...})
        /home/runner/work/k8sgpt/k8sgpt/pkg/integration/trivy/analyzer.go:162 +0x58
    github.com/k8sgpt-ai/k8sgpt/pkg/analysis.(*Analysis).RunAnalysis.func3({0x3446300?, 0xc0005c463c?}, {0xc0005552c0, 0x11})
        /home/runner/work/k8sgpt/k8sgpt/pkg/analysis/analysis.go:268 +0xd9
    created by github.com/k8sgpt-ai/k8sgpt/pkg/analysis.(*Analysis).RunAnalysis in goroutine 1
        /home/runner/work/k8sgpt/k8sgpt/pkg/analysis/analysis.go:266 +0x685

Expected behaviour

Every run of k8sgpt analyze should complete successfully, including repeated runs while the trivy integration is active.

Actual behaviour

The second run fails with "fatal error: concurrent map writes".

Additional Information

No response

VaibhavMalik4187 commented 2 months ago

Concurrent map writes indicate that this is a synchronization problem. I'll take a look. Thanks for reporting @liyuerich
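
For context, the Go runtime aborts the process with this fatal error when it detects two goroutines writing to the same map without synchronization. The detection is best-effort, which fits a crash that shows up only on some runs. The following is a minimal, stdlib-only sketch of the failure class and of a mutex-guarded fix. The names `safeRegistry`, `newSafeRegistry` and `registerAll` are hypothetical and are not taken from k8sgpt or apimachinery.

```go
package main

import (
	"fmt"
	"sync"
)

// safeRegistry is a hypothetical stand-in for a shared type registry.
// A plain map written from several goroutines may crash with
// "fatal error: concurrent map writes"; the mutex serializes the writes.
type safeRegistry struct {
	mu    sync.Mutex
	kinds map[string]bool
}

func newSafeRegistry() *safeRegistry {
	return &safeRegistry{kinds: make(map[string]bool)}
}

// add records one kind while holding the lock.
func (r *safeRegistry) add(kind string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.kinds[kind] = true
}

// registerAll starts one goroutine per kind, mimicking analyzers that
// run in parallel, and returns the number of distinct kinds registered.
func registerAll(r *safeRegistry, kinds []string) int {
	var wg sync.WaitGroup
	for _, k := range kinds {
		wg.Add(1)
		go func(kind string) {
			defer wg.Done()
			r.add(kind)
		}(k)
	}
	wg.Wait()

	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.kinds)
}

func main() {
	r := newSafeRegistry()
	n := registerAll(r, []string{"GatewayClass", "ConfigAuditReport", "SecretStore"})
	fmt.Println("registered kinds:", n)
}
```

Removing the lock from `add` turns this into the racy pattern; running such a variant with `go run -race` is one way to surface the data race even when the crash itself does not trigger.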

VaibhavMalik4187 commented 2 months ago

Small update: I tried to reproduce the issue with the steps mentioned above. Unfortunately, I couldn't replicate it on Ubuntu 23.10 with K8sGPT built from master.

xiormeesh commented 2 months ago

I'm also getting intermittent "concurrent map writes" errors on 0.3.29, but I don't have the trivy integration enabled. It seems to happen when the system is under load, but even then I can't reproduce it reliably; simply rerunning the command usually produces the expected output.

It has happened twice. Both times I was running a cluster-wide analysis (not limited to specific namespaces, with all filters enabled, including Log, which slows down the analysis significantly), and kubeapi was quite busy with other queries (the first time I was installing several operators in parallel, the second time another scanning tool was querying kubeapi as well). Both times, rerunning exactly the same command right after the failure succeeded.

k8sgpt version: 0.3.29 (installed via brew)
k8s version: v1.28.7
running on: Ubuntu 22.04.4 LTS, kernel 5.14.0-1054-oem

CLI commands and output (from the first failure; I didn't save the log from the second one): k8sgpt_analyze_concurrent_map_writes.log

xiormeesh commented 2 months ago

It happened again today while working with another cluster (same k8sgpt CLI installation). Everything was working fine until I removed the Log filter and ran analyze again:

    k8sgpt filters remove Log
    Filter(s) Log removed

    k8sgpt analyze
    fatal error: concurrent map writes
    fatal error: concurrent map writes

    goroutine 32 [running]:
    k8s.io/apimachinery/pkg/runtime.(*Scheme).AddKnownTypeWithName(0xc000268d20, {{0x40979af, 0x19}, {0x40590bf, 0x2}, {0x3770c97, 0x7}}, {0x4a541d8, 0xc00094e680})
        k8s.io/apimachinery@v0.28.4/pkg/runtime/scheme.go:174 +0x270
    k8s.io/apimachinery/pkg/runtime.(*Scheme).AddKnownTypes(0xc000268d20, {{0x40979af?, 0x0?}, {0x40590bf?, 0x0?}}, {0xc000bfa7f8?, 0x6?, 0xc0002bd4d0?})
        k8s.io/apimachinery@v0.28.4/pkg/runtime/scheme.go:148 +0x165
    sigs.k8s.io/gateway-api/apis/v1.addKnownTypes(0xc000268d20)
        sigs.k8s.io/gateway-api@v1.0.0/apis/v1/zz_generated.register.go:60 +0x186
    k8s.io/apimachinery/pkg/runtime.(*SchemeBuilder).AddToScheme(...)
        k8s.io/apimachinery@v0.28.4/pkg/runtime/scheme_builder.go:29
    github.com/k8sgpt-ai/k8sgpt/pkg/analyzer.GatewayClassAnalyzer.Analyze({}, {0xc000938ea0, {0x4a6c5b8, 0x65a8dc0}, {0x0, 0x0}, {0x0, 0x0}, 0x0, {0x0, ...}, ...})
        github.com/k8sgpt-ai/k8sgpt/pkg/analyzer/gatewayclass.go:38 +0x11f
    github.com/k8sgpt-ai/k8sgpt/pkg/analysis.(*Analysis).RunAnalysis.func3({0x4a3a9c0?, 0x65a8dc0?}, {0xc000904380, 0xc})
        github.com/k8sgpt-ai/k8sgpt/pkg/analysis/analysis.go:268 +0xd9
    created by github.com/k8sgpt-ai/k8sgpt/pkg/analysis.(*Analysis).RunAnalysis in goroutine 1
        github.com/k8sgpt-ai/k8sgpt/pkg/analysis/analysis.go:266 +0x65e

Now k8sgpt analyze fails even after I re-enable the Log filter, so it is 100% reproducible in this state. I still have no idea how to trigger it on purpose, though, because I've enabled and disabled the Log filter before without any issue. I'm going to wait until tomorrow and see whether reinstalling k8sgpt fixes it (I'll need it for a demo tomorrow).

chaunceyt commented 2 months ago

Hi, I'm adding support for external-secrets via integrations and see this issue when running go run . analyze. However, if I run go run . analyze --filter SecretStore I get the expected output.

    go run . integrations list
    Active:
    > externalsecrets
    Unused:
    > trivy
    > prometheus
    > aws

    go run . filters list
    Active:
    > ClusterExternalSecret (integration)
    > ClusterSecretStore (integration)
    > Deployment
    > Ingress
    > SecretStore (integration)
    > MutatingWebhookConfiguration
    > ExternalSecrets
    > Node
    > Pod
    > StatefulSet
    > ValidatingWebhookConfiguration
    > PersistentVolumeClaim
    > ExternalSecret (integration)
    > ReplicaSet
    > PushSecret (integration)
    > HorizontalPodAutoScaler
    > Service
    > CronJob
    Unused:
    > GatewayClass
    > Gateway
    > HTTPRoute
    > PodDisruptionBudget
    > NetworkPolicy
    > Log

I attributed it to the number of analyzers I introduced. Each of them requires a call to AddToScheme:

    err := v1alpha1.AddToScheme(client.Scheme())
    if err != nil {
        return nil, err
    }

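Things seemed to get better when I switched to the following:

    // The mutex only helps if all callers share it, so it has to live
    // outside the function (for example at package level).
    var mutex = &sync.RWMutex{}

    mutex.Lock()
    err := v1alpha1.AddToScheme(client.Scheme())
    // Unlock before the error check so an early return cannot leak the lock.
    mutex.Unlock()
    if err != nil {
        return nil, err
    }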
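
One caveat with a lock like the one above: it only protects the scheme if every goroutine that writes to that scheme takes the same lock. The two traces earlier in this thread crash in the same AddKnownTypeWithName call from different analyzers, which suggests (but does not prove) that a scheme is shared between them. Under that assumption, one alternative is to perform all registrations exactly once behind a single shared `sync.Once`, so analyzers only read afterwards. This is a stdlib-only sketch, not k8sgpt's actual fix: `fakeScheme`, `ensureRegistered`, `analyze` and `runAnalyzers` are hypothetical stand-ins, with `fakeScheme` playing the role of the real scheme.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeScheme is a hypothetical stand-in for a shared scheme:
// a plain map that is not safe for concurrent writes.
type fakeScheme struct {
	types  map[string]bool
	writes int // counts registrations, for illustration only
}

func (s *fakeScheme) addKnownType(name string) {
	s.types[name] = true
	s.writes++
}

var (
	shared       = &fakeScheme{types: map[string]bool{}}
	registerOnce sync.Once // one guard shared by every analyzer
)

// ensureRegistered performs all scheme writes exactly once. Goroutines
// that arrive while registration is running block until it finishes.
func ensureRegistered() {
	registerOnce.Do(func() {
		for _, name := range []string{"ConfigAuditReport", "GatewayClass", "SecretStore"} {
			shared.addKnownType(name)
		}
	})
}

// analyze mimics an analyzer: it makes sure registration has happened
// and then only reads from the scheme.
func analyze(kind string) bool {
	ensureRegistered()
	return shared.types[kind]
}

// runAnalyzers runs one goroutine per kind, as a parallel analysis
// would, and reports whether each kind was found in the scheme.
func runAnalyzers(kinds []string) []bool {
	found := make([]bool, len(kinds))
	var wg sync.WaitGroup
	for i, k := range kinds {
		wg.Add(1)
		go func(i int, kind string) {
			defer wg.Done()
			found[i] = analyze(kind) // each goroutine writes its own slot
		}(i, k)
	}
	wg.Wait()
	return found
}

func main() {
	fmt.Println(runAnalyzers([]string{"GatewayClass", "ConfigAuditReport", "Unknown"}))
	fmt.Println("registrations performed:", shared.writes)
}
```

The design choice here is to move every write ahead of the parallel phase instead of locking around each write; concurrent reads of a map are safe once no goroutine writes to it anymore.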

Seeing the reference to Trivy made me wonder whether the issue is related to the way integrations load and execute an integration.

OS: Darwin 13.6.6
Branch: main
Kind cluster: v1.29.2

connecurrent-map-writes.txt