ionos-cloud / cluster-api-provider-ionoscloud

cluster-api provider for IONOS Cloud
Apache License 2.0
12 stars 4 forks source link

:bug: Fix race condition bugs in locker package #191

Closed avorima closed 1 month ago

avorima commented 2 months ago

What is the purpose of this pull request/Why do we need it?

Prevents two locking-related race conditions:

Issue #, if available:

Description of changes:

Special notes for your reviewer:

I noticed a failed run in a different PR: https://github.com/ionos-cloud/cluster-api-provider-ionoscloud/actions/runs/10078133906/job/27862335344?pr=190

Output ``` WARNING: DATA RACE Write at 0x00c000372060 by goroutine 118: runtime.mapdelete_faststr() /opt/hostedtoolcache/go/1.22.5/x64/src/runtime/map_faststr.go:301 +0x0 github.com/ionos-cloud/cluster-api-provider-ionoscloud/scope.(*Cluster).DeleteCurrentRequestByDatacenter() /home/runner/work/cluster-api-provider-ionoscloud/cluster-api-provider-ionoscloud/scope/cluster.go:199 +0x291 github.com/ionos-cloud/cluster-api-provider-ionoscloud/scope.TestCurrentRequestByDatacenterAccessors.func1() /home/runner/work/cluster-api-provider-ionoscloud/cluster-api-provider-ionoscloud/scope/cluster_test.go:384 +0x352 github.com/ionos-cloud/cluster-api-provider-ionoscloud/scope.TestCurrentRequestByDatacenterAccessors.gowrap1() /home/runner/work/cluster-api-provider-ionoscloud/cluster-api-provider-ionoscloud/scope/cluster_test.go:389 +0x61 Previous read at 0x00c000372060 by goroutine 122: runtime.mapaccess2_faststr() /opt/hostedtoolcache/go/1.22.5/x64/src/runtime/map_faststr.go:108 +0x0 github.com/ionos-cloud/cluster-api-provider-ionoscloud/scope.(*Cluster).GetCurrentRequestByDatacenter() /home/runner/work/cluster-api-provider-ionoscloud/cluster-api-provider-ionoscloud/scope/cluster.go:174 +0x2ef github.com/ionos-cloud/cluster-api-provider-ionoscloud/scope.TestCurrentRequestByDatacenterAccessors.func1() /home/runner/work/cluster-api-provider-ionoscloud/cluster-api-provider-ionoscloud/scope/cluster_test.go:372 +0xd7 github.com/ionos-cloud/cluster-api-provider-ionoscloud/scope.TestCurrentRequestByDatacenterAccessors.gowrap1() /home/runner/work/cluster-api-provider-ionoscloud/cluster-api-provider-ionoscloud/scope/cluster_test.go:389 +0x61 Goroutine 118 (running) created at: github.com/ionos-cloud/cluster-api-provider-ionoscloud/scope.TestCurrentRequestByDatacenterAccessors() /home/runner/work/cluster-api-provider-ionoscloud/cluster-api-provider-ionoscloud/scope/cluster_test.go:369 +0x36a testing.tRunner() /opt/hostedtoolcache/go/1.22.5/x64/src/testing/testing.go:1689 +0x21e testing.(*T).Run.gowrap1() /opt/hostedtoolcache/go/1.22.5/x64/src/testing/testing.go:1742 +0x44 Goroutine 122 (running) created at: github.com/ionos-cloud/cluster-api-provider-ionoscloud/scope.TestCurrentRequestByDatacenterAccessors() /home/runner/work/cluster-api-provider-ionoscloud/cluster-api-provider-ionoscloud/scope/cluster_test.go:369 +0x36a testing.tRunner() /opt/hostedtoolcache/go/1.22.5/x64/src/testing/testing.go:1689 +0x21e testing.(*T).Run.gowrap1() /opt/hostedtoolcache/go/1.22.5/x64/src/testing/testing.go:1742 +0x44 ================== panic: no such lock: uid/currentRequestByDatacenter goroutine 146 [running]: github.com/ionos-cloud/cluster-api-provider-ionoscloud/internal/util/locker.(*Locker).Unlock(0xc00046f7a0, {0xc0003ae5a0, 0x1e}) /home/runner/work/cluster-api-provider-ionoscloud/cluster-api-provider-ionoscloud/internal/util/locker/locker.go:130 +0x1cc github.com/ionos-cloud/cluster-api-provider-ionoscloud/scope.(*Cluster).GetCurrentRequestByDatacenter(0xc00049b000, {0x244ccc2, 0x2}) /home/runner/work/cluster-api-provider-ionoscloud/cluster-api-provider-ionoscloud/scope/cluster.go:175 +0x36f github.com/ionos-cloud/cluster-api-provider-ionoscloud/scope.TestCurrentRequestByDatacenterAccessors.func1(0xc0003584e0, {0x244ccc2, 0x2}) /home/runner/work/cluster-api-provider-ionoscloud/cluster-api-provider-ionoscloud/scope/cluster_test.go:372 +0xd8 created by github.com/ionos-cloud/cluster-api-provider-ionoscloud/scope.TestCurrentRequestByDatacenterAccessors in goroutine 76 /home/runner/work/cluster-api-provider-ionoscloud/cluster-api-provider-ionoscloud/scope/cluster_test.go:369 +0x36b ```

I also did some fly-by changes that were highlighted by my IDE.

Checklist:

piepmatz commented 2 months ago

Some thoughts:

avorima commented 2 months ago

you're right, it's an issue with the locker. i get no such lock panics in both the scope and locker tests. one time the TestLockerContextDeadlineExceeded tests also failed when asserting that the locks map is empty (it wasn't), but that test actually looks wrong in general. doesn't seem to happen reliably below -count=100 on my machine.

sonarcloud[bot] commented 2 months ago

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

avorima commented 2 months ago

@piepmatz I adjusted the locker to fix the Lock behavior on failed context and the timing of releasing the lock.

I ran these tests for some time on my machine and never got any test failures (go test -race -count=1000). As a control I moved channel read in the unlock back to its original position and immediately got a no such lock panic.