peterbourgon / diskv

A disk-backed key-value store.
http://godoc.org/github.com/peterbourgon/diskv
MIT License

ReadStream copies values into memory too soon #17

Open jonboulle opened 9 years ago

jonboulle commented 9 years ago

As currently implemented, Diskv.ReadStream is not a purely streaming reader, because it buffers an entire copy of the value (in the siphon) before attempting to put the value into the cache. Unfortunately with large values this is a recipe for memory exhaustion.
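To make the failure mode concrete, here is a minimal sketch of the buffering pattern described above (simplified and illustrative, not the actual diskv source): every byte the caller reads is mirrored into an in-memory buffer, and the cache insert only happens at EOF, so the whole value has to fit in memory even though the caller believes it is streaming.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// siphon mirrors everything read from r into an in-memory buffer and hands
// the complete buffer to cacheFn in one shot once EOF is reached.
type siphon struct {
	r       io.Reader
	buf     *bytes.Buffer
	cacheFn func([]byte)
}

func (s *siphon) Read(p []byte) (int, error) {
	n, err := s.r.Read(p)
	if n > 0 {
		s.buf.Write(p[:n]) // a full copy of the value accumulates here
	}
	if err == io.EOF {
		s.cacheFn(s.buf.Bytes()) // cache insert happens only at EOF
	}
	return n, err
}

func main() {
	src := strings.NewReader("imagine a multi-gigabyte value streamed from disk")
	s := &siphon{
		r:   src,
		buf: &bytes.Buffer{},
		cacheFn: func(b []byte) {
			fmt.Printf("cached %d bytes only after the full read\n", len(b))
		},
	}
	out, _ := io.ReadAll(s) // the caller thinks it is streaming
	fmt.Printf("caller consumed %d bytes\n", len(out))
}
```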

Ideally we would stream the value directly into the cache as the io.ReadCloser that ReadStream returns is consumed, checking the cache size as we go. I started to go down this path, but it creates another race condition because then writing into the cache is not atomic: we cannot know when the ReadCloser will be finished consuming the entry, and it's very possible for other readers to begin Reading the same key-value pair while we're still writing it. So the next step down that road is for readers to actually take a lock per cache entry (which would then get released once the caller Closes the ReadCloser). This quickly became a web of mutexes and synchronisation hacks which felt like very unidiomatic Go.
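For reference, a rough sketch of what that per-entry locking might look like (hypothetical, not a proposed API; this is the approach being rejected as too fiddly): each key gets its own mutex, held from the moment the read starts until the caller Closes the ReadCloser.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"sync"
)

// keyLocks hands out one mutex per key; the outer mutex only guards the map.
type keyLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func (k *keyLocks) lock(key string) *sync.Mutex {
	k.mu.Lock()
	m, ok := k.locks[key]
	if !ok {
		m = &sync.Mutex{}
		k.locks[key] = m
	}
	k.mu.Unlock()
	m.Lock()
	return m
}

// lockedReadCloser keeps the per-key lock held until Close, so correctness
// depends entirely on every caller remembering to Close promptly.
type lockedReadCloser struct {
	io.Reader
	unlock func()
}

func (l *lockedReadCloser) Close() error {
	l.unlock()
	return nil
}

func main() {
	kl := &keyLocks{locks: map[string]*sync.Mutex{}}
	m := kl.lock("some-key") // other readers of "some-key" now block
	rc := &lockedReadCloser{Reader: strings.NewReader("value"), unlock: m.Unlock}
	b, _ := io.ReadAll(rc)
	rc.Close() // lock released only here
	fmt.Println(string(b))
}
```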

Various simple "solutions" exist (e.g. prechecking the size of the file against the cache size before starting to siphon), but they are all inherently racy and could still lead to memory exhaustion under stressful conditions. (We could also just take a global write lock during reads, but that wouldn't be very nice to other readers, would it?)
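As a tiny illustration of why the precheck is inherently racy (hypothetical names, not diskv code): two readers can both pass the size check before either has inserted anything, so the cache can still overshoot its limit.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	const cacheSizeMax = 100
	var cacheSize int64 // bytes currently admitted to the cache

	var wg sync.WaitGroup
	admit := func(valueSize int64) {
		defer wg.Done()
		// Check...
		if atomic.LoadInt64(&cacheSize)+valueSize <= cacheSizeMax {
			// ...then act. Another reader may have passed the same check
			// in the meantime, so the total can overshoot the max.
			atomic.AddInt64(&cacheSize, valueSize)
		}
	}
	wg.Add(2)
	go admit(80)
	go admit(80)
	wg.Wait()
	fmt.Printf("cache size %d (max %d)\n", atomic.LoadInt64(&cacheSize), cacheSizeMax)
}
```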

@peterbourgon @philips thoughts?

peterbourgon commented 9 years ago

Definitely can't take a global write lock. Maybe there's a way to change the cache structure (significantly) to allow per-key synchronization. I bet a smarter play would be to siphon into a buffer off-cache, making regular checkpoints to ensure we don't exceed the max, and then atomically moving the siphoned data into the cache on completion. Something along these lines...

I think it's feasible. Give me a moment to think on it.
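A rough sketch of that shape (illustrative only, not a patch): mirror the stream into a private buffer, checkpoint its size against the max on every write, and promote the buffer into the shared cache atomically only on a clean EOF.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"sync"
)

type cache struct {
	mu   sync.Mutex
	data map[string][]byte
	size uint64
	max  uint64
}

// promote atomically moves a fully siphoned buffer into the cache, or drops
// it if it no longer fits.
func (c *cache) promote(key string, buf []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.size+uint64(len(buf)) <= c.max {
		c.data[key] = buf
		c.size += uint64(len(buf))
	}
}

// siphonOffCache buffers outside the cache and stops buffering (without
// disturbing the caller's read) once the buffer alone would exceed max.
type siphonOffCache struct {
	r        io.Reader
	buf      []byte
	max      uint64
	gaveUp   bool
	complete func(buf []byte, ok bool)
}

func (s *siphonOffCache) Read(p []byte) (int, error) {
	n, err := s.r.Read(p)
	if n > 0 && !s.gaveUp {
		s.buf = append(s.buf, p[:n]...)
		if uint64(len(s.buf)) > s.max { // checkpoint against the max
			s.gaveUp, s.buf = true, nil
		}
	}
	if err == io.EOF {
		s.complete(s.buf, !s.gaveUp)
	}
	return n, err
}

func main() {
	c := &cache{data: map[string][]byte{}, max: 1 << 20}
	s := &siphonOffCache{
		r:   strings.NewReader("streamed value"),
		max: c.max,
		complete: func(buf []byte, ok bool) {
			if ok {
				c.promote("key", buf)
			}
		},
	}
	out, _ := io.ReadAll(s)
	fmt.Printf("read %d bytes, cached %d bytes\n", len(out), c.size)
}
```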

jonboulle commented 9 years ago

Yeah, that's kind of the direction I was going. The main thing I don't like about that is that you could then have multiple readers generating redundant cache entries alongside each other. But maybe I'm overoptimising.

In the meantime, I verified that the existing behaviour is already kinda racy :-/ https://gist.github.com/jonboulle/ab3f77bfb8c85fb4c022

peterbourgon commented 9 years ago

> Previous write by goroutine 26:

Haha, whoops! :person_frowning:

peterbourgon commented 9 years ago

@jonboulle just to be clear, this is already fixed for your specific use-case with #21, correct?

jonboulle commented 9 years ago

@peterbourgon well technically I would say that for our specific use case we are just sidestepping the problem :-). I would still love to see this fixed and am kind of mulling on it in the back of my brain - if you wouldn't mind leaving it open a bit longer maybe I can come up with a patch...

peterbourgon commented 9 years ago

Understood. Yeah, I'm mulling it too, just not finding quite as much mull-time as I'd like :)

mkilpatrick commented 4 years ago

Is there any plan to address this issue? We've found that setting CacheSizeMax results in significantly higher heap usage, and it still doesn't limit the on-disk cache across restarts.
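For anyone landing here: as I understand it, CacheSizeMax only bounds the in-memory read cache, not what accumulates on disk. A minimal usage sketch (paths and sizes illustrative, API per the diskv README):

```go
package main

import (
	"fmt"

	"github.com/peterbourgon/diskv"
)

func main() {
	// Keys map to files directly under BasePath with this trivial transform.
	flatTransform := func(s string) []string { return []string{} }

	d := diskv.New(diskv.Options{
		BasePath:     "my-data-dir", // on-disk usage here is NOT capped
		Transform:    flatTransform,
		CacheSizeMax: 1024 * 1024, // caps only the in-memory read cache
	})

	if err := d.Write("alpha", []byte("abc")); err != nil {
		panic(err)
	}
	value, err := d.Read("alpha")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(value))
}
```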