klauspost / reedsolomon

Reed-Solomon Erasure Coding in Go
MIT License

Memory bloat #284

Closed benjiro29 closed 3 months ago

benjiro29 commented 3 months ago

We run a small sharding setup in the cloud, with several edge VMs that each have 2GB of memory.

I noticed that as more requests are handled, memory keeps growing until the VMs crash and restart.

This was traced back to:

enc, err := reedsolomon.New(5, 2, reedsolomon.WithInversionCache(false))
s.encoder = enc

verified, _ := s.encoder.Verify(buffers)

err := s.encoder.ReconstructData(buffers)

We ended up testing with 55MB spread across 45 files (each split into 5 data / 2 parity shards) to simulate a specific load. So each call to ReconstructData only gets ~1.2 to 1.4MB of data, as 5 blocks that can be any mix of data and parity.

The Go GC needs to be triggered manually to clear out the memory; otherwise, on a second request with the same load, memory spikes past 2GB and crashes the VMs.
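For anyone trying to reproduce this, here is a minimal sketch of how the "manual GC" step can be measured using only the standard library. The helper name `heapStats` and the 7 x 1 MiB allocation are illustrative assumptions standing in for one request's shards, not our actual service code. `debug.FreeOSMemory` forces a collection and then asks the runtime to return as much memory to the OS as it can.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// heapStats (hypothetical helper) reports the bytes in in-use heap spans and
// the idle heap bytes the runtime still retains instead of returning to the OS.
func heapStats() (inUse, retained uint64) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapInuse, m.HeapIdle - m.HeapReleased
}

func main() {
	// Stand-in for one request: 7 shards of 1 MiB each (sizes are illustrative).
	shards := make([][]byte, 7)
	for i := range shards {
		shards[i] = make([]byte, 1<<20)
	}
	inUse, retained := heapStats()
	fmt.Printf("during request: in use %d KiB, retained %d KiB\n", inUse>>10, retained>>10)

	// Drop the only reference, then force a GC and a release to the OS.
	shards = nil
	debug.FreeOSMemory()

	inUse, retained = heapStats()
	fmt.Printf("after release:  in use %d KiB, retained %d KiB\n", inUse>>10, retained>>10)
}
```

Logging these two numbers before and after each request should show whether the memory is still referenced (in use stays high) or merely retained by the runtime (retained stays high).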

We disabled the InversionCache, but it had no effect beyond a small reduction in memory usage.

It takes about 5+ minutes of idling before the Go GC finally starts to release the memory. And yes, we are 100% sure the issue is Reconstruct / ReconstructData...
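One knob that may be relevant to the delayed release is the runtime's soft memory limit (Go 1.19+). This is only a sketch under assumptions: the 1.5 GiB value is a hypothetical headroom figure for a 2GB VM, `applyMemoryLimit` is an illustrative wrapper, and whether it helps in this case is untested. `debug.SetMemoryLimit` returns the previous limit, and a negative argument reads the current limit without changing it.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// applyMemoryLimit (hypothetical wrapper) sets the runtime's soft memory
// limit in bytes and returns the limit that was in effect before the call.
func applyMemoryLimit(limitBytes int64) int64 {
	return debug.SetMemoryLimit(limitBytes)
}

func main() {
	// Assumed value: leave roughly 0.5 GiB of headroom on a 2GB VM.
	const limit = 1536 << 20

	prev := applyMemoryLimit(limit)

	// A negative input does not adjust the limit; it only reports it.
	current := debug.SetMemoryLimit(-1)
	fmt.Printf("previous limit: %d bytes, current limit: %d bytes\n", prev, current)
}
```

The same limit can also be set without code changes through the `GOMEMLIMIT` environment variable, for example `GOMEMLIMIT=1536MiB`.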

We even tried setting the buffer values and the buffer itself to nil to give the GC the best chance to collect them, no dice. For some reason, it seems to hold on to the memory.
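An alternative to setting buffers to nil is to reuse them across requests so that each request does not allocate fresh shards. The sketch below uses only the standard library; `shardPool`, its methods, and the sizes are hypothetical names chosen for illustration. It relies on the assumption, based on my reading of the library documentation, that Reconstruct treats a zero-length shard as missing and reuses its memory when the capacity is sufficient.

```go
package main

import (
	"fmt"
	"sync"
)

// shardPool (hypothetical) hands out reusable shard sets so that each
// request does not have to allocate new buffers.
type shardPool struct {
	pool sync.Pool
}

// newShardPool creates a pool of shard sets with totalShards entries,
// each pre-allocated with shardCap bytes of capacity.
func newShardPool(totalShards, shardCap int) *shardPool {
	return &shardPool{pool: sync.Pool{New: func() any {
		set := make([][]byte, totalShards)
		for i := range set {
			set[i] = make([]byte, 0, shardCap)
		}
		return &set
	}}}
}

// get returns a shard set whose entries are zero-length but keep their capacity.
func (p *shardPool) get() *[][]byte {
	set := p.pool.Get().(*[][]byte)
	for i := range *set {
		(*set)[i] = (*set)[i][:0]
	}
	return set
}

// put hands the shard set back for reuse by a later request.
func (p *shardPool) put(set *[][]byte) {
	p.pool.Put(set)
}

func main() {
	// 5 data + 2 parity shards, 300 KiB capacity each (illustrative sizes).
	p := newShardPool(7, 300<<10)

	set := p.get()
	// Fill the shards that were actually received; leave the rest zero-length.
	(*set)[0] = append((*set)[0], []byte("received shard data")...)
	fmt.Printf("shards: %d, shard 0 len: %d, shard 1 len: %d\n",
		len(*set), len((*set)[0]), len((*set)[1]))
	p.put(set)
}
```

Note that `sync.Pool` may drop its contents on any GC cycle, so this reduces allocations under load rather than guaranteeing a fixed memory footprint.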

We also tried not using a shared encoder and creating it inside the download routine instead, same issue...

As I wrote half of this text, the memory finally dropped back down on this idle VM, i.e. after around 5+ minutes. Will do more testing tomorrow, but we are sure it's ReconstructData.