johannesvollmer / exrs

100% Safe Rust OpenEXR file library

Soft allocation limits increase peak memory usage, are all-around harmful #187

Closed Shnatsel closed 1 year ago

Shnatsel commented 1 year ago

In the io::read_into_vec function it's possible to specify a "soft limit" on allocations, so that the vector is only grown in increments of that limit:

https://github.com/johannesvollmer/exrs/blob/0fd36cebc0f0cbe1f77ac23954cace26bf9f637a/src/io.rs#L309-L317

This does not do what it advertises.

When a Vec is reallocated, it needs to first allocate the bigger storage, then copy over the data from the old allocation to the new one, and finally drop the old allocation. So the peak memory usage is old_capacity + new_capacity.

If the soft limit kicks in and the loop gets executed more than once, the Vec is first reallocated with capacity for data.len() + soft_limit (copying all existing data), then reallocated again for data.len() + soft_limit * 2, and so on. During each reallocation the old and new buffers are live at the same time, so the peak is roughly the sum of the old and new capacities.

The peak memory usage therefore gets worse with every iteration of the loop, and each iteration incurs an expensive memory copy.
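The looping growth pattern can be sketched like this (a simplified stand-in for the behaviour described above, not the actual exrs code; `grow_in_increments`, `target_len`, and `soft_limit` are illustrative names):

```rust
// Simplified stand-in for the incremental-growth pattern described above
// (illustrative names; not the actual exrs code).
fn grow_in_increments(data: &mut Vec<u8>, target_len: usize, soft_limit: usize) -> usize {
    let mut moves = 0;
    while data.len() < target_len {
        let step = (target_len - data.len()).min(soft_limit);
        let old_ptr = data.as_ptr();
        data.resize(data.len() + step, 0);
        if data.as_ptr() != old_ptr {
            // The storage moved: for a moment the old and new buffers
            // were both allocated, so peak usage was old + new capacity.
            moves += 1;
        }
    }
    moves
}

fn main() {
    // Growing 4 MiB in 1 MiB steps can move (and copy) the buffer several times...
    let mut incremental = Vec::new();
    let moves = grow_in_increments(&mut incremental, 4 << 20, 1 << 20);
    println!("buffer moved {} time(s) while growing incrementally", moves);

    // ...while sizing the Vec once allocates exactly one buffer.
    let upfront = vec![0u8; 4 << 20];
    assert_eq!(upfront.len(), incremental.len());
}
```

Note that Vec's amortized growth strategy may over-allocate on some iterations, so not every step necessarily moves the buffer, but the copies that do happen are proportional to the data already read.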

Instead of looping with a limit, the Vec should be set to the appropriate capacity and length up front, saving memory and the extra copies.
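A sketch of that up-front approach (hypothetical names, not the exrs API; this assumes the declared length is first validated against some hard sanity limit, which the later comments in this thread touch on):

```rust
use std::io::{self, Read};

// Hypothetical sketch of the up-front approach: reject implausible
// declared sizes, then size the buffer exactly once.
// `read_declared_block` and `max_len` are illustrative names.
fn read_declared_block(
    read: &mut impl Read,
    declared_len: usize,
    max_len: usize,
) -> io::Result<Vec<u8>> {
    if declared_len > max_len {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "declared block size exceeds sanity limit",
        ));
    }
    let mut data = vec![0u8; declared_len]; // one allocation, no extra copies
    read.read_exact(&mut data)?;
    Ok(data)
}

fn main() -> io::Result<()> {
    use std::io::Cursor;
    let bytes = vec![7u8; 16];
    let block = read_declared_block(&mut Cursor::new(&bytes), 16, 1 << 20)?;
    assert_eq!(block, bytes);
    // A hostile header declaring a huge size is rejected before allocating.
    assert!(read_declared_block(&mut Cursor::new(&bytes), usize::MAX, 1 << 20).is_err());
    Ok(())
}
```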

Shnatsel commented 1 year ago

Okay, I see that they prevent files with very large declared block sizes from OOMing the system.

The memory wouldn't actually be provisioned unless it's written to, but very large allocation requests can cause issues all on their own, even before any data is written.

As long as the allocation limit is much greater than the typical block size, this is fine.

johannesvollmer commented 1 year ago

yes :) when calling the function, the soft max value should be very large, so large that it only triggers for invalid files or absurdly large ones. The resizing algorithm can perhaps still be improved, though :)

johannesvollmer commented 1 year ago

did you find a file that triggers this loop? because in that case, we should definitely increase the soft limit in the appropriate calls

Shnatsel commented 1 year ago

No, I have not seen such files, but then again I haven't really looked.

johannesvollmer commented 1 year ago

let's insert a panic where the soft limit would kick in, and then run the tests, just to be sure.

the images in the repository are small, which means it probably won't kick in. if it does kick in even for those small images, the limit is really badly chosen
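One way to do that instrumentation (a hypothetical sketch; the function and names are illustrative, not the exrs source). `debug_assert!` fires in test and debug builds but compiles away in release builds, so the shipped behaviour is unchanged:

```rust
// Hypothetical instrumentation sketch: make the test suite fail loudly
// the moment the soft limit would kick in. Names are illustrative.
fn checked_step(remaining: usize, soft_limit: usize) -> usize {
    if remaining > soft_limit {
        // Only fires in debug/test builds; release builds keep the old behaviour.
        debug_assert!(
            false,
            "soft limit {} reached for remaining length {} -- limit chosen too low?",
            soft_limit, remaining
        );
    }
    remaining.min(soft_limit)
}

fn main() {
    // With a generously chosen limit, ordinary block sizes never trigger the assertion.
    assert_eq!(checked_step(512, 1 << 30), 512);
    println!("ok");
}
```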