weaveworks / scope

Monitoring, visualisation & management for Docker & Kubernetes
https://www.weave.works/oss/scope/
Apache License 2.0

Crashloop - fatal error: concurrent map iteration and map write #3855

yiannistri closed this issue 3 years ago

yiannistri commented 3 years ago

What you expected to happen?

Scope UI to always be available.

What happened?

Scope UI did not show any data, which prompted me to look at its logs. The scope-app pod was crashlooping and recovered after I deleted that pod.

How to reproduce it?

No action or change was made; Scope UI simply stopped returning data.

Anything else we need to know?

Versions:


Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T21:15:16Z", GoVersion:"go1.16.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.17", GitCommit:"f3abc15296f3a3f54e4ee42e830c61047b13895f", GitTreeState:"clean", BuildDate:"2021-01-13T13:13:00Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

Logs:

https://gist.github.com/yiannistri/8376db2633ddb1588aa4a6fe70ebc3d7

bboreham commented 3 years ago

Suspicion falls on this "unsafe" code, which two goroutines are running at the same time (I suspect it got broken by #3850):

goroutine 2844 [running]:
runtime.throw(0x20faf56, 0x26)
    /usr/local/go/src/runtime/panic.go:1117 +0x72 fp=0xc01426ea00 sp=0xc01426e9d0 pc=0x43a4d2
runtime.mapiternext(0xc01426eac0)
    /usr/local/go/src/runtime/map.go:858 +0x54c fp=0xc01426ea80 sp=0xc01426ea00 pc=0x412eec
github.com/weaveworks/scope/report.(*Report).UnsafeRemovePartMergedNodes.func1(0x20b3179, 0x8, 0xc01328e018)
    /go/src/github.com/weaveworks/scope/report/report.go:376 +0xa5 fp=0xc01426ebb8 sp=0xc01426ea80 pc=0xcefb25
github.com/weaveworks/scope/report.(*Report).WalkNamedTopologies(0xc01328e000, 0xc01426ec88)
    /go/src/github.com/weaveworks/scope/report/report.go:404 +0xac fp=0xc01426ec10 sp=0xc01426ebb8 pc=0xce5ecc
github.com/weaveworks/scope/report.(*Report).UnsafeRemovePartMergedNodes(0xc01328e000, 0x26d58d0, 0xc012b403c0)
    /go/src/github.com/weaveworks/scope/report/report.go:375 +0xd1 fp=0xc01426ee28 sp=0xc01426ec10 pc=0xce5ab1
github.com/weaveworks/scope/app.(*websocketState).update(0xc0142735f8, 0x26d58d0, 0xc00cb06d50, 0x0, 0x0)
    /go/src/github.com/weaveworks/scope/app/api_topology.go:185 +0x57f fp=0xc0142734e8 sp=0xc01426ee28 pc=0x182d5df

goroutine 164 [runnable]:
github.com/weaveworks/scope/report.(*Report).UnsafeRemovePartMergedNodes.func1(0x20b3179, 0x8, 0xc011607518)
    /go/src/github.com/weaveworks/scope/report/report.go:376 +0xa5
github.com/weaveworks/scope/report.(*Report).WalkNamedTopologies(0xc011607500, 0xc004a04c88)
    /go/src/github.com/weaveworks/scope/report/report.go:404 +0xac
github.com/weaveworks/scope/report.(*Report).UnsafeRemovePartMergedNodes(0xc011607500, 0x26d58d0, 0xc0129d5b60)
    /go/src/github.com/weaveworks/scope/report/report.go:375 +0xd1
github.com/weaveworks/scope/app.(*websocketState).update(0xc004a095f8, 0x26d58d0, 0xc002048150, 0x0, 0x0)
    /go/src/github.com/weaveworks/scope/app/api_topology.go:185 +0x57f
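
For readers unfamiliar with this failure mode: the Go runtime aborts with the error in the issue title when it detects one goroutine ranging over a map while another goroutine writes to it, and the program cannot `recover` from it. Below is a minimal sketch of one common remedy, serialising access with a mutex. The `topology` type and the `pruneConcurrently` helper are hypothetical stand-ins invented for illustration, not Scope's code, and this is not necessarily how the bug was fixed upstream.

```go
package main

import "sync"

// topology is a hypothetical stand-in for a node map shared between
// websocket handlers. It is NOT Scope's actual report.Topology type.
type topology struct {
	mu    sync.Mutex
	nodes map[string]bool // node ID -> true if the node is only partially merged
}

// removePartMerged deletes partially merged nodes while holding the lock, so
// no other goroutine can iterate or write the map at the same time.
// Deleting entries during a range loop is safe within a single goroutine.
func (t *topology) removePartMerged() int {
	t.mu.Lock()
	defer t.mu.Unlock()

	removed := 0
	for id, partial := range t.nodes {
		if partial {
			delete(t.nodes, id)
			removed++
		}
	}
	return removed
}

// pruneConcurrently calls removePartMerged from several goroutines at once,
// mimicking multiple websocket updates, and returns the total removed.
func pruneConcurrently(t *topology, workers int) int {
	var (
		wg      sync.WaitGroup
		totalMu sync.Mutex
		total   int
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			n := t.removePartMerged()
			totalMu.Lock()
			total += n
			totalMu.Unlock()
		}()
	}
	wg.Wait()
	return total
}
```

Calling `pruneConcurrently` with eight workers on a topology holding two partially merged nodes removes exactly two, whichever goroutine gets there first. Without the lock in `removePartMerged`, the same calls can trigger the fatal error shown in the trace. Running tests with `go test -race` is a common way to catch this kind of unsynchronised access before it reaches production.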