Shnatsel closed this issue 1 year ago
Okay, I see that they prevent files with very large declared block sizes from OOMing the system.
The memory wouldn't actually be provisioned unless it's written to, but very large allocation values can cause issues all on their own, even without any data actually being written.
As long as the allocation limit is much greater than the typical block size, this is fine.
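To illustrate the point above, here is a minimal sketch (not the exrs code; the function name and signature are hypothetical) of rejecting an absurd declared size before touching the allocator, and of using `Vec::try_reserve_exact` so that an allocation failure surfaces as an error instead of aborting the process:

```rust
// Sketch: cap a file-declared size with a generous sanity limit, then
// reserve the memory fallibly so a huge request cannot abort the process.
fn alloc_checked(declared_size: usize, soft_max: usize) -> Result<Vec<u8>, String> {
    if declared_size > soft_max {
        // An invalid file can declare any size it likes; reject it here.
        return Err(format!("declared size {declared_size} exceeds limit {soft_max}"));
    }
    let mut buf: Vec<u8> = Vec::new();
    // try_reserve_exact reports failure instead of aborting on OOM.
    buf.try_reserve_exact(declared_size).map_err(|e| e.to_string())?;
    Ok(buf)
}

fn main() {
    // A typical block size passes easily under a generous limit.
    assert!(alloc_checked(64 * 1024, 1 << 30).is_ok());
    // A bogus multi-exabyte declaration is rejected without allocating.
    assert!(alloc_checked(usize::MAX, 1 << 30).is_err());
}
```

As long as the limit is far above any realistic block size, the check only ever fires for invalid or hostile input.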
yes :) when calling the function, the soft max value should be very large, so large that it only triggers for an invalid file or for absurdly large files. The resizing algorithm can perhaps be improved nonetheless, though :)
did you find a file that triggers this loop? because in that case, we should definitely increase the soft limit in the appropriate calls
No, I have not seen such files, but then again I haven't really looked.
let's insert a panic where the soft limit would kick in, and then run the tests, just to be sure.
the images in the repository are small, which means it probably won't kick in. if it does kick in for those small images, it's a really badly chosen limit
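The suggested experiment could look roughly like this sketch, where `grow_with_soft_limit` is a hypothetical stand-in for the real resize loop: temporarily panic the moment the soft limit would take effect, then run the test suite over the repository's images and see if anything fires.

```rust
// Hypothetical stand-in for the resize loop: panic (temporarily) when
// the soft limit would actually kick in, so running the tests reveals
// whether any repository image ever reaches it.
fn grow_with_soft_limit(data: &mut Vec<u8>, target_len: usize, soft_max: usize) {
    if target_len > soft_max {
        panic!("soft limit {soft_max} kicked in for target length {target_len}");
    }
    data.resize(target_len, 0);
}

fn main() {
    // Small images stay well under the limit, so nothing fires.
    let mut small = Vec::new();
    grow_with_soft_limit(&mut small, 100, 1024);
    assert_eq!(small.len(), 100);

    // An oversized request does trip the temporary panic.
    let result = std::panic::catch_unwind(|| {
        let mut big = Vec::new();
        grow_with_soft_limit(&mut big, 10_000, 1024);
    });
    assert!(result.is_err());
}
```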
In the `io::read_into_vec` function it's possible to specify a "soft limit" on allocations, and the slice will only be resized in the specified increments: https://github.com/johannesvollmer/exrs/blob/0fd36cebc0f0cbe1f77ac23954cace26bf9f637a/src/io.rs#L309-L317
This does not do what it advertises.
When a Vec is reallocated, it needs to first allocate the bigger storage, then copy over the data from the old allocation to the new one, and finally drop the old allocation. So the peak memory usage is `old_capacity + new_capacity`.

If the soft limit kicks in, and the loop gets executed more than once, then what happens is: first the Vec is reallocated with capacity to hold `data.len() + soft_limit` (thus copying all the data), and then it gets reallocated again, with peak memory usage reaching `old_capacity + soft_limit + new_capacity + soft_limit * 2`. The memory usage also gets worse with every iteration of the loop, and it incurs an expensive memory copy every time.
Instead of looping with a limit, the Vec should be set to the appropriate capacity and length up front, saving memory and the extra copies.
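A hedged sketch of that approach (the function name and signature here are hypothetical, not the exrs API): validate the declared size once against a generous sanity limit, then size the buffer up front and fill it with a single `read_exact` call.

```rust
use std::io::{self, Read};

// Sketch: one upfront allocation instead of a capped growth loop.
fn read_declared(read: &mut impl Read, data_size: usize, sanity_limit: usize) -> io::Result<Vec<u8>> {
    if data_size > sanity_limit {
        // Reject absurd declared sizes before any allocation happens.
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "declared size exceeds sanity limit",
        ));
    }
    // One allocation, no growth loop, no intermediate copies.
    let mut data = vec![0u8; data_size];
    read.read_exact(&mut data)?;
    Ok(data)
}

fn main() -> io::Result<()> {
    let bytes = [1u8, 2, 3, 4];
    let v = read_declared(&mut &bytes[..], 4, 1 << 30)?;
    assert_eq!(v, vec![1, 2, 3, 4]);
    // A bogus declared size fails up front, before allocating anything.
    assert!(read_declared(&mut &bytes[..], usize::MAX, 1 << 30).is_err());
    Ok(())
}
```

The trade-off, as discussed above, is that the full declared size is committed to in one step, so the sanity limit is what keeps a hostile file from requesting an enormous buffer.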