Possible race conditions?

piotrkot commented 8 months ago

I'm new to Golang but run into problem getting short-lived codes using go-expire-map.

We create the map:

    exMap := expiremap.New()
    defer exMap.Close()

and use it in two endpoints. In the first, we generate random code, put it into the map and pass the code to the user:

    code := c.codes.Generate()
    c.exMap.Set(code, true, time.Minute)

In the second, we check if the code is in the map. If it is, we delete it from the map and do our logic:

    _, found := j.ExMap.Get(code)
    if (!found) {
        log.Printf("Not found code: %s", query)
        w.WriteHeader(http.StatusBadRequest)
        return
    }
    j.ExMap.Delete(code)
   // our logic

Occasionally, I observe that when I check existence of the code returned to me earlier, the code is not found in the map. But the second check for the same code finds it.

Is this a bug or I use the library wrongly?

piotrkot commented 8 months ago

I think the problem is with using RLock() in Get() mthod which is allowed concurrently with Lock() in Set() method.

Alternatively, Set() method could return the stored element. I could pass this to the user, which could guarantee the lock to be removed. What do you think @nursik ?

nursik commented 8 months ago

@piotrkot May you provide a simple code that may show the bug/unexpected behaviour?

I'm new to Golang but run into problem getting short-lived codes using go-expire-map.

We create the map:
    exMap := expiremap.New()
    defer exMap.Close()
and use it in two endpoints. In the first, we generate random code, put it into the map and pass the code to the user:
    code := c.codes.Generate()
    c.exMap.Set(code, true, time.Minute)
In the second, we check if the code is in the map. If it is, we delete it from the map and do our logic:
    _, found := j.ExMap.Get(code)
    if (!found) {
        log.Printf("Not found code: %s", query)
        w.WriteHeader(http.StatusBadRequest)
        return
    }
    j.ExMap.Delete(code)
   // our logic
Occasionally, I observe that when I check existence of the code returned to me earlier, the code is not found in the map. But the second check for the same code finds it.

Is this a bug or I use the library wrongly?

piotrkot commented 8 months ago

@nursik Sorry, I don't have the code to reproduce the bug. Conceptually, it's quite simple. When you call Set method there is no guarantee when this method completes. And if right after that you call Get method, it may return no results yet. I'd like to have a way to ensure the Set method has completed, and the element is properly stored. Do you know how I can do that?

nursik commented 8 months ago

@piotrkot Set, Get and any other methods is a blocking operation. If you mean that you have two goroutines and the first one calls Set and another calls Get, then the order of these calls defines if Get will return a value or not.

Basically, any Get operation after Set returns a value in the same goroutine (Unless of course other goroutine deletes the key in the same time)

The next function will work as expected, unless there is another goroutine modifying the same key.

func Foo(value string) {
    expmap.Get("key") // -> returns nil, false
    expmap.Set("key", value, time.Minute) // Inserts a value
    expmap.Get("key") // -> returns value, true
}

nursik commented 8 months ago

@piotrkot Is it possible that your code generator can generate the same code twice?

piotrkot commented 8 months ago

@nursik As far as I can see in the library code Get method is using RLock() which is not a blocked operation when Set method is called at the same time. I was able to prepare a sample code

import (
    "time"
    "testing"
    "github.com/nursik/go-expire-map"
)

func TestMap(t *testing.T) {
    var done = make(chan bool)
    var msgs = make(chan int)
    em := expiremap.New()
    defer em.Close()

    go produce(em, msgs, done)
    go consume(em, msgs, t)
    <-done
}

func produce(em *expiremap.ExpireMap, m chan int, d chan bool) {
    for i := 0; i < 10; i++ {
        em.Set(i, true, time.Minute)
        m <- i
    }
    close(m)
    d <- true
}

func consume(em *expiremap.ExpireMap, m chan int, t *testing.T) {
    for {
        msg := <- m
        _, found := em.Get(msg)
        if (!found) {
            t.Errorf("Code %d not found", msg)
        }
        em.Delete(msg)
    }
}

After executing this test a few times, I was able to log:

    nursik_test.go:34: Code 0 not found
    nursik_test.go:34: Code 0 not found

As far as I understand, msgs channel is unbuffered. There is a synchronization between goroutines (producer-consumer pattern). Consumer will wait until the message exists in the channel. And yet the Set operation hasn't completed before the Get operation is called.

nursik commented 8 months ago

@piotrkot Thank you for sharing the code.

It seems there are multiple issues, which leads to failed test. First of all, you may notice that test fails only in values 0 or 9. More specifically, at 0 or n-1. The changed code below will produce 0 or 999

func produce(em *expiremap.ExpireMap, m chan int, d chan bool) {
    for i := 5; i < 1000; i++ {
//...

It is possible, because you are reading from closed channel.

A receive operation on a closed channel can always proceed immediately, yielding the element type's zero value after any previously sent values have been received.

In your consume code you are reading messages from channel (msg := <- m) in for loop indefinitely, without checking if channel is closed or not. When it is not closed, it blocks until produce sends a value to the channel and thus it works properly. However, after produce closes channel, the for loop does not block as it reads zero values (in case of int it is 0) without blocking from closed channel! Change t.Errorf to t.Logf and you will see that it never fails on values other than 0 or n-1.

So why it fails at n-1 then? Because produce sends to done channel and main function exits after receiving value from done. It leads to the call of em.Close() which makes all subsequent Get() calls to return nil, false. And it happens so blazingly fast that consume calls Get() method after em.Close()!

Now to the interleaving Get and Set calls. Yes, it is possible that Get is called, Set happened (from another goroutine) and Get is finished. If Set changes key other than in Get (like Get(1), Set(2, 2)), then it will return value from key 1 as expected. When keys are the same, then it will return value 2 as it is the newest value. Such interleaving Get and Set call happens if and only if key was expired

When you invoke Get and receive some value at specified key, you should not assume that the value will be the same after you invoke Get the second time. Other goroutines may write at that key between reads or key could expire. If your code generator can generate keys more than twice, then you have a possible bug (not a race condition) as two parallel endpoint calls may work with the same code.

nursik commented 8 months ago

@piotrkot

_, found := j.ExMap.Get(code)
    if (!found) {
        log.Printf("Not found code: %s", query)
        w.WriteHeader(http.StatusBadRequest)
        return
    }
    j.ExMap.Delete(code)
   // our logic

I also want to point out that two parallel HTTP calls in this code may access the same code (key) regardless of expire map implementation. First HTTP call finds the key in the map and parallelly the second HTTP call finds it too. Then both proceed to do some logic, even though only one of them supposed to proceed. If it is not an idempotent, then you may get unexpected results or bugs.

piotrkot commented 8 months ago

@nursik Thank you for the explanation. It is very reasonable.

Perhaps, neither the simple calling Set and Get after each other from a single goroutine nor calling two goroutines synchronized using a channel shows my issue. And yet, I'm almost sure there is a problem. Let me take the original code of mine, simplify it and write yet another example. It was easily reproducible with a single client so no heavy concurrency. Interestingly, the problem was resolved with another implementation for the expire-map.

nursik commented 8 months ago

@piotrkot Thank you too for investing time to investigate and report a problem👍

The library was written to handle the next issues:

Keys with different TTLs (better alternative for fixed TTL - linked list + map + lock)
Map with thousands keys (better alternative for small map - map + lock)
A lot of keys with the same expiration time. If thousands keys are expired at the same time, the deletion of them will block other goroutines, which may be not appropriate for highload applications

Basically, for most use cases, other libraries have better performance, API and at least do not have failing tests😂

nursik commented 8 months ago

https://github.com/nursik/go-expire-map?tab=readme-ov-file#disclaimer

nursik / go-expire-map

Possible race conditions? #6