uber-go / tally

A Go metrics interface with fast buffered metrics and third party reporters
MIT License
846 stars 115 forks source link

32bit systems panic on unaligned 64-bit atomic operation #223

Open gibmat opened 1 year ago

gibmat commented 1 year ago

The reporter struct defined in m3/reporter.go includes fields with atomic types that are not properly aligned on 32bit systems. This causes TestIntegrationProcessFlushOnExit to eventually timeout, since the test "recovers" from the panic:

panic: unaligned 64-bit atomic operation
    panic: unaligned 64-bit atomic operation [recovered]
    panic: unaligned 64-bit atomic operation

Rearranging the order so those fields come first to guarantee proper alignment fixes the issue:

diff --git a/m3/reporter.go b/m3/reporter.go
index e25e436..141f634 100644
--- a/m3/reporter.go
+++ b/m3/reporter.go
@@ -105,6 +105,12 @@ type Reporter interface {
 // remote M3 collector, metrics are batched together and emitted
 // via either thrift compact or binary protocol in batch UDP packets.
 type reporter struct {
+   now             atomic.Int64
+   pending         atomic.Uint64
+   numBatches      atomic.Int64
+   numMetrics      atomic.Int64
+   numWriteErrors  atomic.Int64
+   done            atomic.Bool
    bucketIDTagName string
    bucketTagName   string
    bucketValFmt    string
@@ -114,24 +120,18 @@ type reporter struct {
    calcProto       thrift.TProtocol
    client          *m3thrift.M3Client
    commonTags      []m3thrift.MetricTag
-   done            atomic.Bool
    donech          chan struct{}
    freeBytes       int32
    metCh           chan sizedMetric
-   now             atomic.Int64
    overheadBytes   int32
-   pending         atomic.Uint64
    resourcePool    *resourcePool
    stringInterner  *cache.StringInterner
    tagCache        *cache.TagCache
    wg              sync.WaitGroup

    batchSizeHistogram    tally.CachedHistogram
-   numBatches            atomic.Int64
    numBatchesCounter     tally.CachedCount
-   numMetrics            atomic.Int64
    numMetricsCounter     tally.CachedCount
-   numWriteErrors        atomic.Int64
    numWriteErrorsCounter tally.CachedCount
    numTagCacheCounter    tally.CachedCount
 }
prateek commented 1 year ago

@gibmat yeah... https://github.com/golang/go/issues/36606 is the upstream bug in the go compiler for this case.

fwiw - this library is primarily in use at Uber, where all our servers are 64bit. i don't know if we're ever going to have users on non-64bit platforms, this is why we don't have any CI/validation. i'm not sure what other issues you're going to run into by going down this road.

if you don't mind, may I ask what project/platform you're using this library for?

gibmat commented 1 year ago

I'm helping with the packaging of this library for Debian (https://tracker.debian.org/pkg/golang-github-uber-go-tally), as it's a dependency of other libraries that I want to package. Debian supports a handful of 32bit architectures, and I saw this issue when the tests failed on those systems.