tokio-rs / bytes

Utilities for working with bytes
MIT License

docs: unclear how to free memory from a bytesmut #634

Open Firstyear opened 1 year ago

Firstyear commented 1 year ago

Hi there,

tokio_util uses BytesMut in its codecs. After one request has been decoded, some excess bytes belonging to the next request may remain in the BytesMut, and these are preserved.

The codec needs to advance past and free the bytes that have now been consumed, but from the BytesMut docs it's hard to know when this actually occurs or what the right way to do it is.

For example, if we call split_off and we drop the handle, did the capacity associated to that bytesmut get freed? Or does it remain somehow accessible to the other part of the bytes?

I think it would be an improvement to these docs to document when bytes are freed by operations and under what circumstances so that authors can know the correct way to handle a long-lived bytesmut struct.

Thanks!

Darksonn commented 1 year ago

A Bytes is reference counted, so an allocation is freed once all slices into the allocation are dropped. Even a small slice will keep the entire allocation alive.
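To make the point above concrete, here is a minimal sketch that models a shared allocation with `std::rc::Rc`. It is not the bytes crate's real implementation: the `View` type and the `small_view_pins_allocation` function are hypothetical names invented for illustration, and the only thing the sketch is meant to show is that a 4-byte view keeps a 1 MiB buffer alive until the view itself is dropped.

```rust
use std::rc::{Rc, Weak};

/// A read-only view into a shared, reference-counted allocation.
/// Hypothetical stand-in for a `Bytes` slice; not the crate's real layout.
struct View {
    alloc: Rc<Vec<u8>>,
    start: usize,
    end: usize,
}

impl View {
    fn as_slice(&self) -> &[u8] {
        &self.alloc[self.start..self.end]
    }
}

/// Returns (length of the view, size of the allocation it keeps alive,
/// whether the allocation is still alive after the view is dropped).
fn small_view_pins_allocation() -> (usize, usize, bool) {
    let alloc = Rc::new(vec![0u8; 1024 * 1024]);
    // A weak pointer lets us observe the allocation without keeping it alive.
    let observer: Weak<Vec<u8>> = Rc::downgrade(&alloc);

    // A 4-byte view that shares ownership of the whole buffer.
    let tiny = View { alloc: Rc::clone(&alloc), start: 0, end: 4 };
    drop(alloc); // the original handle goes away

    let view_len = tiny.as_slice().len();
    // The full buffer is still there, because `tiny` holds a reference to it.
    let pinned = observer.upgrade().map(|a| a.len()).unwrap_or(0);

    drop(tiny); // the last owner is gone, so the buffer is freed
    let alive_after = observer.upgrade().is_some();
    (view_len, pinned, alive_after)
}
```

In this model the view is 4 bytes long, but the full 1 MiB stays allocated for as long as the view exists.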

When a BytesMut is resized to make more space for writing, then it can reclaim parts given out with split_off if the reference count is 1. Otherwise, if there are still other handles left, it will make a new allocation, releasing its reference count on the previous allocation.
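The "reclaim if the reference count is 1, otherwise allocate" rule can be sketched with the same kind of `Rc` model. This is only an illustration of the decision described above, under the assumption that uniqueness is all that matters; `make_room` and `reuse_depends_on_other_handles` are hypothetical names, and the real `BytesMut::reserve` does more than this.

```rust
use std::rc::Rc;

/// Hypothetical model of "make more space for writing".
/// Returns true if the existing allocation was reused in place.
fn make_room(handle: &mut Rc<Vec<u8>>, capacity: usize) -> bool {
    match Rc::get_mut(handle) {
        Some(buf) => {
            // We are the only owner, so the storage can be reused.
            buf.clear();
            buf.reserve(capacity);
            true
        }
        None => {
            // Other handles still point at the old allocation. Start a fresh
            // one; the assignment drops our reference to the old allocation.
            *handle = Rc::new(Vec::with_capacity(capacity));
            false
        }
    }
}

/// Returns (reused while another handle existed, reused once unique,
/// number of references left on the old allocation after the writer moved on).
fn reuse_depends_on_other_handles() -> (bool, bool, usize) {
    let mut writer = Rc::new(vec![1u8, 2, 3, 4]);
    let outstanding = Rc::clone(&writer); // e.g. a piece handed to a caller

    let reused_while_shared = make_room(&mut writer, 16);
    // Only `outstanding` keeps the old allocation alive now.
    let old_refs = Rc::strong_count(&outstanding);

    drop(outstanding);
    let reused_when_unique = make_room(&mut writer, 16);
    (reused_while_shared, reused_when_unique, old_refs)
}
```

In this model, the writer only reuses its storage once every other handle has been dropped. Until then, each call that needs room allocates again while the old buffer lingers with its remaining owner.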

Firstyear commented 1 year ago

So what is the correct way to handle this to prevent leaking / over-allocating in tokio with a framed codec? The codec docs and the BytesMut docs show the BytesMut being advanced (https://docs.rs/tokio-util/latest/tokio_util/codec/index.html), but it's not clear if this is the correct way to ensure that memory is freed, or at least reused.

This is why I think it would be good for the docs to describe how this works since freeing/reusing the memory is just as important here as the allocations :)

Darksonn commented 1 year ago

I'm always happy to see more documentation added.

Firstyear commented 1 year ago

I can't add docs because I don't know how it works ... else I'd be happy to contribute docs. Can you see why I'm a bit stuck here and raised an issue?

Darksonn commented 1 year ago

That's fine. I agree that adding this documentation would be nice, and I can put it on my list, but it will take me a while to get around to actually doing it.

As for some tips for your own project, the most important guideline is this:

When you use split_off to get a Bytes view into your slice and return the Bytes slice, make sure that these Bytes pieces are not kept around for a long time.

Violating this guideline is the main way you can end up using a lot more memory than you actually need, because even a single small Bytes will keep the full allocation it came from alive, even if that allocation is much larger than the Bytes itself.

As long as you follow the above rule, using advance and reserve in your codec will generally do the right thing and will not result in unbounded memory growth.

By comparison, the MyStringDecoder example in the docs has no risk of running into these problems. That is because the example uses .to_vec() to copy the data out of the BytesMut into a separate Vec<u8> (which is then converted to a String). Because of this copy, it never gives out Bytes slices into the codec's BytesMut, so there is no risk of violating the rule no matter how the returned strings are used.
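The copy-out pattern can be sketched without any crates. In the sketch below a plain `Vec<u8>` stands in for the codec's BytesMut, `drain` stands in for `advance`, and the frame format (a 4-byte little-endian length followed by a UTF-8 payload) is an assumption made for the example. `decode_frame` is a hypothetical helper, not the code from the tokio-util docs.

```rust
/// Hypothetical decode step over a `Vec<u8>` standing in for the codec buffer.
/// Assumed frame format: 4-byte little-endian length, then a UTF-8 payload.
/// For brevity, an invalid UTF-8 payload is consumed and reported as `None`.
fn decode_frame(buf: &mut Vec<u8>) -> Option<String> {
    if buf.len() < 4 {
        return None; // length prefix not complete yet
    }
    let mut len_bytes = [0u8; 4];
    len_bytes.copy_from_slice(&buf[..4]);
    let len = u32::from_le_bytes(len_bytes) as usize;
    if buf.len() < 4 + len {
        return None; // payload not complete yet
    }

    // Copy the payload out, so the returned value owns its own memory
    // and holds no reference into the codec's buffer.
    let payload = buf[4..4 + len].to_vec();

    // Discard the consumed bytes; leftover bytes stay for the next frame.
    buf.drain(..4 + len);

    String::from_utf8(payload).ok()
}
```

Because the returned `String` is an independent copy, the caller can keep it for as long as it likes without affecting when the buffer's memory can be reused. The trade-off is one extra copy per frame.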