karlseguin / ccache

A golang LRU Cache for high concurrency
MIT License

Bug Report: v3.0.0 gc may fail #76

Closed: stormluke closed this issue 1 year ago

stormluke commented 1 year ago

Thanks for this great software! In my usage, I observed that the cache size keeps going up and dropped stays at 0 for long stretches, which eventually results in an OOM. I think there may be a bug here.

Here is a test case that reproduces it:

// Standalone reproduction; the package clause and imports are added here for
// completeness (assuming the v3 module path github.com/karlseguin/ccache/v3).
package main

import (
    "fmt"
    "math/rand"
    "strconv"
    "testing"
    "time"

    "github.com/karlseguin/ccache/v3"
)

func TestPrune(t *testing.T) {
    maxSize := int64(5000)
    cache := ccache.New(ccache.Configure[string]().MaxSize(maxSize))
    epoch := 0
    for {
        epoch += 1
        expired := make([]string, 0)
        for i := 0; i < 50; i += 1 {
            key := strconv.FormatInt(rand.Int63n(maxSize*20), 10)
            item := cache.Get(key)
            if item == nil || item.TTL() > 1*time.Minute {
                expired = append(expired, key)
            }
        }
        for _, key := range expired {
            cache.Set(key, key, 5*time.Minute)
        }
        if epoch%5000 == 0 {
            size := cache.GetSize()
            dropped := cache.GetDropped()
            fmt.Printf("size=%d dropped=%d\n", size, dropped)
            time.Sleep(100 * time.Millisecond)
        }
    }
}

When running this code, the size grows far beyond 5000 (the configured MaxSize) while dropped eventually stays at 0.

=== RUN   TestPrune
size=30270 dropped=171439
size=48587 dropped=149851
size=7654 dropped=225531
size=42967 dropped=146521
size=28343 dropped=162295
size=93191 dropped=195
size=98497 dropped=0
size=93155 dropped=15731
size=98476 dropped=0
size=98913 dropped=0
size=98936 dropped=0
size=98965 dropped=0

This may be the reason: gc() gives up as soon as it reaches a nil node, and a second Remove() of the same node sets Tail to nil:

func (c *Cache[T]) gc() int {
   dropped := 0
   node := c.list.Tail

   itemsToPrune := int64(c.itemsToPrune)
   if min := c.size - c.maxSize; min > itemsToPrune {
      itemsToPrune = min
   }

   for i := int64(0); i < itemsToPrune; i++ {
      if node == nil { // gc is skipped if c.list.Tail = nil
         return dropped
      }
      prev := node.Prev
      item := node.Value
      if c.tracking == false || atomic.LoadInt32(&item.refCount) == 0 {
         c.bucket(item.key).delete(item.key)
         c.size -= item.size
         c.list.Remove(node)
         if c.onDelete != nil {
            c.onDelete(item)
         }
         dropped += 1
         item.promotions = -2
      }
      node = prev
   }
   return dropped
}

func (l *List[T]) Remove(node *Node[T]) {
   next := node.Next
   prev := node.Prev

   if next == nil {
      l.Tail = node.Prev // second Remove will make Tail = nil
   } else {
      next.Prev = prev
   }

   if prev == nil {
      l.Head = node.Next
   } else {
      prev.Next = next
   }
   node.Next = nil
   node.Prev = nil
}
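
For concreteness, here is a minimal standalone sketch (not the library code, just a simplified list with the same Remove() logic as above) showing what a second Remove() of an already removed node does to the list:

// Minimal standalone sketch (not the library code): a simplified list
// with the same Remove() logic as above.
package main

import "fmt"

type Node struct {
    Prev, Next *Node
    Value      string
}

type List struct {
    Head, Tail *Node
}

func (l *List) Remove(node *Node) {
    next := node.Next
    prev := node.Prev
    if next == nil {
        l.Tail = prev
    } else {
        next.Prev = prev
    }
    if prev == nil {
        l.Head = next
    } else {
        prev.Next = next
    }
    node.Next = nil
    node.Prev = nil
}

func main() {
    a := &Node{Value: "a"}
    b := &Node{Value: "b"}
    a.Next, b.Prev = b, a
    list := &List{Head: a, Tail: b}

    list.Remove(b)              // first removal: Tail correctly becomes a
    fmt.Println(list.Tail == a) // true

    // Second removal: b's links are already nil, so both branches take
    // the "== nil" path and overwrite Head and Tail with nil.
    list.Remove(b)
    fmt.Println(list.Head == nil, list.Tail == nil) // true true
}

With Head and Tail both nil, every subsequent gc() pass returns immediately, which matches the output above: size grows without bound while dropped stays at 0.
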
karlseguin commented 1 year ago

Thanks for the detailed report, it really helped narrow it down. I added a slightly modified version of your test :)

stormluke commented 1 year ago

I think there is still a problem here. If a node is in both the deletables and the promotables, and deletables is consumed before promotables, the node will be Remove()d twice, causing gc() to fail.

If you keep this test case running (an in-package test, since it calls New and Configure directly), it will eventually fail:

func Test_CachePrune(t *testing.T) {
    maxSize := int64(500)
    cache := New(Configure[string]().MaxSize(maxSize).ItemsToPrune(50))
    epoch := 0
    for {
        epoch += 1
        expired := make([]string, 0)
        for i := 0; i < 50; i += 1 {
            key := strconv.FormatInt(rand.Int63n(maxSize*20), 10)
            item := cache.Get(key)
            if item == nil || item.TTL() > 1*time.Minute {
                expired = append(expired, key)
            }
        }
        for _, key := range expired {
            cache.Set(key, key, 5*time.Minute)
        }
        if epoch%500 == 0 {
            fmt.Printf("size=%d, dropped=%d\n", cache.GetSize(), cache.GetDropped())
        }
        if cache.GetSize() > 5000 {
            fmt.Printf("size=%d, dropped=%d\n", cache.GetSize(), cache.GetDropped())
            t.Fail()
            return
        }
    }
}

I think the fix is to set item.node = nil (and maybe item.promotions = -2, as gc() does) after Remove() in doDelete(); that way a second pass over the same item hits the item.node == nil branch and becomes a no-op.

func (c *Cache[T]) doDelete(item *Item[T]) {
    if item.node == nil {
        item.promotions = -2
    } else {
        c.size -= item.size
        if c.onDelete != nil {
            c.onDelete(item)
        }
        c.list.Remove(item.node)
        item.node = nil      // fix
        item.promotions = -2 // fix
    }
}
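
To make the effect concrete, here is a minimal standalone sketch (simplified types, not the actual ccache internals) of why clearing node turns a second delete of the same item into a no-op:

// Minimal standalone sketch (simplified types, not the actual ccache
// internals) of the proposed fix: clearing item.node after removal.
package main

import "fmt"

type node struct{ prev, next *node }

type list struct{ head, tail *node }

// remove has the same structure as List.Remove above.
func (l *list) remove(n *node) {
    if n.next == nil {
        l.tail = n.prev
    } else {
        n.next.prev = n.prev
    }
    if n.prev == nil {
        l.head = n.next
    } else {
        n.prev.next = n.next
    }
    n.next, n.prev = nil, nil
}

type item struct{ node *node }

// doDelete mirrors the fixed version: it clears it.node after removal,
// so a repeated delete takes the nil branch instead of touching the list.
func doDelete(l *list, it *item) {
    if it.node == nil {
        return
    }
    l.remove(it.node)
    it.node = nil // the fix
}

func main() {
    a, b := &node{}, &node{}
    a.next, b.prev = b, a
    l := &list{head: a, tail: b}
    it := &item{node: b}

    doDelete(l, it)          // removes b; tail becomes a
    doDelete(l, it)          // no-op, because it.node is nil
    fmt.Println(l.tail == a) // true: the tail survives the double delete
}

Without the it.node = nil line, the second call would hit remove() again and wipe out the tail, exactly as in the earlier sketch.
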
karlseguin commented 1 year ago

FYI, not sure I'll be able to look into this until early next week, sorry.

stormluke commented 1 year ago

I think this commit solves the issue; could you release a new tag? Thanks.

karlseguin commented 1 year ago

sure, done