rcrowley / go-metrics

Go port of Coda Hale's Metrics library
Other
3.43k stars 493 forks source link

Fix memory leak caused by registry ignoring metrics of external types #264

Open michaelschlies opened 4 years ago

michaelschlies commented 4 years ago

fixes #244

michaelschlies commented 4 years ago

reverted out the move to sync.Map to keep backward compatibility before go 1.9. still not increasing in memory, this is ready for review.

mihasya commented 4 years ago

Thanks for tracking this down and submitting a PR. I wonder why this check was there in the first place. Nothing obvious in blame, let me see if I can track down @rcrowley and understand what the reasoning was for doing the check in the first place.

michaelschlies commented 4 years ago

I eventually did see some slow leaks come from other places in the code but haven't opened a PR for those yet (One of my running variants of this is perfectly happy with this change, no increase in memory, the other is increasing coming out of meter.go (they consume this slightly differently so I am hoping to get some more pprof dumps from the one that is increasing (albeit very slowly) from go-metrics) when I build a passthrough route in my main application (Java Ratpack) into the sidecar container using this library!

michaelschlies commented 4 years ago

(I know our big leak from registry.go was the fact we have a wrapper metric that uses other metrics types so it was being discarded from registry and just kept allocating new objects for the same metrics)

mihasya commented 4 years ago

This probably calls for a broader discussion of what makes a "metric" and if we should be using an interface here rather than manual type checking 🤔 I think the main thing preventing the use of an interface is that we want to make it possible for metrics to generate a float64 OR an int64 and there's no type that captures both of those in Golang (at least I remember lamenting this, oh, 5 years ago when we last talked about this; could well be some new feature makes this possible).

@rcrowley and I exchanged some texts about this, and he confirmed my suspicion that the check is there to guard from crashes down the line, since both the argument to register and the value type for the r.metrics map is interface{}. He agrees that due to 1. all of this being buried in private space and 2. the fact that any consumer of the registry has to do type switching on the metrics later anyway because nothing can be assumed about interface{}, it's probably reasonably safe to remove the check.

On that note, in your use of wrapper metric types, how were you accessing them later? Do you have your own function that uses .Each and typechecks the metrics that come out? I would assume this function, too, would never see the metrics you registered, since register drops unrecognized types on the ground without returning an error.

michaelschlies commented 4 years ago

In our case (and I'm trying to infer why it was chosen to be built a particular way because I'm not on the team that built it) We have an HTTPMetric that contains a Meter, Timer, and Histogram that all trickle into an internal metric pipeline with zero configuration for a consuming application team. This way anything using net/http or gin get metrics basically for free (https://youtu.be/cnHfK4MZA2Y?t=933 if you're curious about the whole services for free thing to our internal platform).

mihasya commented 4 years ago

@michaelschlies right, but if register doesn't actually add the HTTPMetric to its registry because it fails the type check, how are you currently getting the data back out? Or do you basically have some parallel registry?

mihasya commented 4 years ago

@michaelschlies also, isn't a metrics.Timer already a Meter, Counter, and Histogram all in one? It's not explicitly called out as such, so we should probably think about some documentation polish; however check the actual interface. Is what you have different? https://github.com/rcrowley/go-metrics/blob/9180854cf5a015b52b88b8c4d3aebeef411fbe78/timer.go#L8-L28

FWIW, go-tigertonic, which is a simple HTTP lib @rcrowley wrote concurrently with this library, just uses metric.Timer: https://github.com/rcrowley/go-tigertonic/blob/87d5e7ed6b42eaa9daadd697d2e169bf0f8d8c7d/metrics.go#L210-L225

michaelschlies commented 4 years ago

I can pass that along to the team that actually built this library internally, I was just trying to figure out why my application that started out at 2mb RAM utilized grew like crazy shrug

mihasya commented 4 years ago

Much appreciated. We should def fix this leak, but some feedback from them would be helpful in figuring out a longer term plan for the registry.

On Sat, Aug 24, 2019 at 12:49 Michael Schlies notifications@github.com wrote:

I can pass that along to the team that actually built this library internally, I was just trying to figure out why my application that started out at 2mb RAM utilized grew like crazy shrug

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/rcrowley/go-metrics/pull/264?email_source=notifications&email_token=AAAFQREICDFBJ7DV6HBCHODQGGGGBA5CNFSM4IMTQN2KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5CGLFQ#issuecomment-524576150, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAFQRBNWAJQVWK2TGHI6JTQGGGGBANCNFSM4IMTQN2A .

-- Sent from Russia, with love. But also from my phone.

michaelschlies commented 4 years ago

Apparently the person that built the specific Http metric struct isn't actually here anymore, so that's a bit of a bummer, but the only kind of funkiness appears to be in the histogram:

func NewHTTPMetric() *HTTPMetric {
    return &HTTPMetric{
        Meter:                 metrics.NewMeter(),
        ResponseTimer:         metrics.NewTimer(),
        ResponseSizeHistogram: metrics.NewHistogram(metrics.NewUniformSample(100)),
    }
}

So I can't get the background on the why but we do have a fair amount of automation in several downstream applications wrapped in using our library (wrapping go-metrics in this case).