mailgun / groupcache

Clone of golang/groupcache with TTL and Item Removal support
Apache License 2.0

context deadline on specific keys #30

Closed: taraspos closed 2 years ago

taraspos commented 3 years ago

Hello, I'm trying to integrate the library into my application. Everything works well, except that sometimes a key gets "broken" and keeps failing with:

"err":"context canceled",
"key":"52",
"level":"error",
"msg":"error retrieving key from peer 'http://1580b064e10c42e48e58193dffe3fe88.app.backend.testing.:80/_groupcache/'",

I'm not sure exactly how to reproduce it; however, it usually seems to happen upon creation of a new key. Once it has failed, I cannot fetch the key anymore; it keeps failing with context canceled.

My code is the following:

    s.cache = groupcache.NewGroup("settings", 3000000, groupcache.GetterFunc(
        func(ctx context.Context, key string, dest groupcache.Sink) error {
            ID, err := strconv.ParseUint(key, 10, 64)
            if err != nil {
                return err
            }

            res, err := s.CreateOrGet(ID)
            if err != nil {
                return err
            }
            b, err := json.Marshal(res)
            if err != nil {
                return err
            }

            return dest.SetBytes(b, time.Now().Add(time.Minute*5))
        }))

And later I fetch it with:

    ctx, cancel := context.WithTimeout(context.Background(), time.Second*2)
    defer cancel()

    b := []byte{}
    if err := s.cache.Get(ctx,
        strconv.FormatUint(ID, 10),
        groupcache.AllocatingByteSliceSink(&b),
    ); err != nil {
        return nil, err
    }

A 2-second timeout should be more than enough. Any ideas about possible causes?

Not sure if this is related, but we are using DNS discovery and calling pool.Set(...) every 30 seconds.

thrawn01 commented 3 years ago

context canceled comes from the context package in the standard library. It means you are trying to use a context.Context that has already been cancelled. I can't help you beyond that, as the groupcache library doesn't cancel contexts; that is up to you. If it were due to a timeout, the error would be context deadline exceeded.
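
For reference, context.Canceled and context.DeadlineExceeded are distinct sentinel errors from the standard library's context package; a minimal standalone sketch (nothing groupcache-specific) shows the difference you would see in logs:

    package main

    import (
        "context"
        "errors"
        "fmt"
        "time"
    )

    func main() {
        // Cancelling a context explicitly yields context.Canceled.
        ctx, cancel := context.WithCancel(context.Background())
        cancel()
        fmt.Println(errors.Is(ctx.Err(), context.Canceled)) // true

        // Letting a context run past its deadline yields context.DeadlineExceeded.
        tctx, tcancel := context.WithTimeout(context.Background(), time.Millisecond)
        defer tcancel()
        <-tctx.Done()
        fmt.Println(errors.Is(tctx.Err(), context.DeadlineExceeded)) // true
    }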

taraspos commented 3 years ago

Thanks for the reply.

I will look into what might be wrong there; however, I'm quite sure I don't cancel the context anywhere beyond what I shared in the snippet above:

    ctx, cancel := context.WithTimeout(context.Background(), time.Second*2)
    defer cancel()

taraspos commented 3 years ago

@thrawn01 Actually, I see context deadline exceeded in the error logs as well:

"category":"groupcache",
"err":"context deadline exceeded",
"key":"333",
"level":"error",
"msg":"error retrieving key from peer 'http://f77450301ee7476fb3991844355cc644.app.backend.testing.:80/_groupcache/'",

I double-checked the code; this context is not being canceled anywhere else:

    ctx, cancel := context.WithTimeout(context.Background(), time.Second*2)
    defer cancel()

    b := []byte{}
    if err := s.cache.Get(ctx,
        strconv.FormatUint(ID, 10),
        groupcache.AllocatingByteSliceSink(&b),
    ); err != nil {
        return nil, err
    }

However, I have a question about the pool setup, which may be related (it is the only "tricky" place in the setup). Is it safe to configure self as localhost?

So let's assume we have 3 peers: peer1:81, peer2:82, and peer3:83. Each pool is created with the following code:

    localhost := "http://localhost:" + config.Port
    pool = groupcache.NewHTTPPoolOpts(localhost, &groupcache.HTTPPoolOptions{
        BasePath: BasePath,
    })

and then a simplified discovery process keeps the peer list up to date.
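
Roughly, it is a goroutine that re-resolves the peers and calls pool.Set every 30 seconds; the sketch below is only illustrative, and lookupPeers stands in for our actual DNS lookup:

    // Illustrative sketch only; lookupPeers is a placeholder for the DNS-based
    // lookup, which returns full peer URLs such as "http://peer1:81".
    go func() {
        ticker := time.NewTicker(30 * time.Second)
        defer ticker.Stop()
        for range ticker.C {
            peers, err := lookupPeers()
            if err != nil {
                log.Printf("peer discovery failed: %v", err)
                continue
            }
            // Replace the pool's peer list with the freshly resolved set.
            pool.Set(peers...)
        }
    }()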

thrawn01 commented 3 years ago

You should not set the local node's address to localhost on each node; the hashing algorithm will not correctly choose the host that owns the key. I updated the README to point out that the local node should have a unique address, but apparently I forgot to reply to this issue: https://github.com/mailgun/groupcache/commit/2fc526bccc65ba98fd461dbdf9694d8efbb005ad

Sorry about that.
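
For example, a rough sketch (using the hostname is just one way to derive a unique, routable self address; config.Port and BasePath are taken from your snippet above):

    // Register this node under an address its peers can actually reach,
    // rather than localhost. The hostname is one option; a pod IP or a
    // per-node DNS name works just as well.
    hostname, err := os.Hostname()
    if err != nil {
        log.Fatal(err)
    }
    self := "http://" + hostname + ":" + config.Port

    pool := groupcache.NewHTTPPoolOpts(self, &groupcache.HTTPPoolOptions{
        BasePath: BasePath,
    })

    // Every node should pass the same complete peer list, including its own
    // self address, so the consistent hash ring is identical on all nodes.
    pool.Set("http://peer1:81", "http://peer2:82", "http://peer3:83")

With a unique self address that also appears in the pool.Set list, each node can recognize when it is the owner of a key and load it locally instead of proxying to a peer.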

thrawn01 commented 2 years ago

Closing due to age. Please re-open if there are further questions.