floren opened this issue 3 years ago
Do you use diskv in situations where there are values with such disparate sizes, and you hope to cache the smaller ones but not cache the larger ones?
I do use diskv in situations where some values are multiple gigabytes, and some are multiple megabytes.
We noticed this behavior when trying to figure out why sending items from one node's diskv to another took up so much memory. Once we figured out what was going on, we disabled the cache, but thought we'd report the behavior and offer a fix. I believe that if you configure a 100MB cache and read a 500MB item via `ReadStream`, you'd be surprised to learn that diskv makes a complete in-memory copy of the item, then immediately and unconditionally throws it away.
If you're not particularly worried about that corner case, feel free to close this issue and the related PR.
If the cache is enabled, `readWithRLock` always reads the file using a `siphon`. The siphon code copies every byte it reads into a `bytes.Buffer`. When the full file has been read, that `bytes.Buffer` is used to update the cache.
However, if the underlying file is, say, a gigabyte in size, the siphon will end up with a `bytes.Buffer` containing that entire gigabyte. Unless you've set your cache size to over a gigabyte, this buffer gets thrown away as soon as the `ReadStream` is done.
The main reason we use ReadStream is so we can deal with very large items without having to stick the entire thing in memory at once. Having discovered this, we'll probably disable the cache, but there are cases where people may wish to have a cache enabled without blowing up their memory!