uber-go / goleak

Goroutine leak detector

Make `maxRetries` and `maxSleep` configurable in goleak #93

Open kerthcet opened 1 year ago

kerthcet commented 1 year ago

This is for flexibility, so users can adjust these to their own scenarios. Or do we have any other concerns? /assign

prashantv commented 1 year ago

Can you provide some more context on where you'd like to adjust the max retries/sleep behaviour? Is there a reason that the current defaults don't work well?

Rather than exposing the underlying values as-is, would it be better to expose a single value for total maximum timeout (or perhaps use a context)?

kerthcet commented 1 year ago

Can you provide some more context on where you'd like to adjust the max retries/sleep behaviour?

The problem is that the detection duration is not long enough. Also see https://github.com/kubernetes/kubernetes/pull/115456/files#r1097270472

Yes, I agree that leaking the implementation details doesn't smell good, and I'm OK with exposing a single value like Timeout. The problem is we have too many factors here: maxRetries + maxSleep already behave somewhat like a timeout. A corner case: assuming maxRetries * maxSleep = 2s, if we set the timeout to 5s it still doesn't work.

So can we just remove maxRetries and put no limit on the retry attempts? We would then have a default timeout instead, which would be exposed for configuration. cc @prashantv
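Roughly the shape I'm proposing, as a sketch (verifyWithTimeout, checkLeaks, and the fixed sleep are placeholder names for illustration, not goleak's actual API or internals):

import "time"

// Sketch of the proposal: no attempt cap, only a configurable overall
// timeout that bounds how long we keep re-checking for leaks.
func verifyWithTimeout(timeout time.Duration, checkLeaks func() error) error {
    deadline := time.Now().Add(timeout)
    for {
        err := checkLeaks()
        if err == nil {
            return nil // no leaked goroutines found
        }
        if time.Now().After(deadline) {
            return err // still leaking when the timeout expires
        }
        time.Sleep(10 * time.Millisecond)
    }
}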

kerthcet commented 1 year ago

Any suggestion @prashantv ?

abhinav commented 1 year ago

One possible option re: @prashantv's point here:

Instead of exposing two parameters "maxRetries" and "maxSleep", perhaps we can expose a single RetryPolicy type/option?

Strawman:

type RetryPolicy interface {
  Attempt(num int) (delay time.Duration, ok bool)
}

It takes the attempt number and reports the sleep duration for that attempt, or false if we shouldn't retry anymore.

The default retry policy would look something like this; it wouldn't be exported -- at least initially:

type defaultRetryPolicy struct {
    maxRetries int
    maxSleep   time.Duration
}

func (p *defaultRetryPolicy) Attempt(i int) (time.Duration, bool) {
    if i >= p.maxRetries {
        return 0, false // out of attempts, stop retrying
    }

    // Exponential backoff starting at 1µs, capped at maxSleep.
    d := time.Duration(int(time.Microsecond) << uint(i))
    if d > p.maxSleep {
        d = p.maxSleep
    }
    return d, true
}
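To get a feel for the backoff this produces, here's a small walk through the policy with illustrative values (the 20 retries and 100ms cap below are just example numbers, not a claim about goleak's defaults):

import (
    "fmt"
    "time"
)

func main() {
    // Illustrative values only; not asserting goleak's actual defaults.
    p := &defaultRetryPolicy{maxRetries: 20, maxSleep: 100 * time.Millisecond}
    for i := 0; ; i++ {
        d, ok := p.Attempt(i)
        if !ok {
            break // policy says stop retrying
        }
        fmt.Printf("attempt %d: sleep %v\n", i, d) // 1µs, 2µs, 4µs, ... capped at 100ms
    }
}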

@prashantv @sywhang WDYT?

meta: We should probably re-open this issue if we're still discussing this.

kerthcet commented 1 year ago

But this wouldn't solve my problem, would it? What I want is to increase the retry attempts or the retry timeout, so +1 for a timeout option. /reopen

sywhang commented 1 year ago

reopened the issue.

@abhinav I think that's probably a good strawman to begin with. Not sure how I feel about the exact interface/naming (yet) but it's a good start.

@kerthcet would it not? If we exposed a WithRetryPolicy(RetryPolicy) option that lets you specify the exact implementation of RetryPolicy to set your retry attempt count or timeout, I think that should be enough to give you what you need. Is there anything I'm missing here?
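For example, a user-supplied policy under this proposal could cap the total wait instead of the attempt count. timeoutRetryPolicy and WithRetryPolicy are hypothetical names here, not goleak's current API:

import "time"

// Hypothetical user-side policy: retry with a fixed sleep until a total
// time budget is spent. Purely a sketch against the strawman interface.
type timeoutRetryPolicy struct {
    perAttempt time.Duration // sleep between checks
    budget     time.Duration // total time allowed across all retries
}

func (p *timeoutRetryPolicy) Attempt(num int) (time.Duration, bool) {
    if time.Duration(num)*p.perAttempt >= p.budget {
        return 0, false // budget exhausted, stop retrying
    }
    return p.perAttempt, true
}

// If the proposed option existed, usage might look like:
//   goleak.VerifyNone(t, goleak.WithRetryPolicy(&timeoutRetryPolicy{
//       perAttempt: 50 * time.Millisecond,
//       budget:     5 * time.Second,
//   }))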

kerthcet commented 1 year ago

Ah, got the idea: users can implement their own policy to decide whether to retry or not, so it should work. One question: does it really make sense to set the retry attempts, and how would we pick an explicit value? And yes, I originally proposed exposing these two values because I didn't want to change a lot of the package; I should reconsider that. :(

prashantv commented 1 year ago

I like the general idea of a retry policy, though I think the most common use is going to be bumping the timeout, so having a default option to bump the timeout seems useful (and internally it determines how many attempts, sleep per-attempt, etc).
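A rough sketch of that direction, with purely illustrative names and numbers (not a proposed final API): a timeout-based option could derive the attempt count from a fixed per-attempt sleep internally.

import "time"

// Hypothetical convenience constructor: the caller only picks a total
// timeout; the attempt count and per-attempt sleep are derived inside.
func retryPolicyForTimeout(total time.Duration) RetryPolicy {
    const perAttempt = 100 * time.Millisecond // illustrative fixed sleep
    return &fixedSleepPolicy{
        maxAttempts: int(total / perAttempt),
        sleep:       perAttempt,
    }
}

type fixedSleepPolicy struct {
    maxAttempts int
    sleep       time.Duration
}

func (p *fixedSleepPolicy) Attempt(num int) (time.Duration, bool) {
    if num >= p.maxAttempts {
        return 0, false
    }
    return p.sleep, true
}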

Couple of thoughts on the RetryPolicy interface:

anitschke commented 6 months ago

I also came here looking for a way to control the maxRetries and maxSleep; however, my use case seems a little different from the others mentioned in this issue.

It seems like most other people on this thread would like to extend the timeout to give goroutines more time to join before considering it an issue.

In my case I was trying to fix a bug where a goroutine was leaking for a long time but eventually joined. Before fixing the bug I tried to write a failing unit test that verifies the goroutine has joined by using goleak.VerifyNone(t). However, I was surprised to find that the test passed even though I know the goroutine leaks! It turns out that in my unit test the goroutine was only leaking for a few hundred milliseconds. As a result, goleak.VerifyNone(t) would initially find the leak on the first try, but on a later retry the leak would be gone. I could artificially inflate the time that the goroutine leaks for in the unit test to a few seconds, but the downside is that this would also inflate testing time. I would prefer to have some option that disables retries altogether.

From my point of view, a goroutine leaking for even a few hundred milliseconds is still a logic error, as there should be some code that waits for the goroutine to properly join.
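Under the RetryPolicy strawman earlier in the thread, that could be as simple as a policy that never retries (a hypothetical sketch; goleak doesn't expose this today):

import "time"

// Hypothetical policy for the "fail on the first check" use case:
// never sleep, never retry, so even a short-lived leak is reported.
type noRetryPolicy struct{}

func (noRetryPolicy) Attempt(num int) (time.Duration, bool) {
    return 0, false
}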