tokio-rs / bytes

Utilities for working with bytes
MIT License
1.91k stars 288 forks source link

`BytesMut::freeze` is not always zero-cost as documented #587

Closed bk2204 closed 1 year ago

bk2204 commented 1 year ago

When converting a BytesMut into Bytes via freeze, the operation can cause a re-allocation to occur because we call into_boxed_slice. This is unacceptable in some hot codepaths.

The documentation says that “[t]he conversion is zero cost.” I don't believe that an allocation can be considered zero cost.

Example code:

use bytes::BytesMut;

fn main() {
    let mut b = BytesMut::with_capacity(10);
    b.extend(b"abcdefg");
    let b = b.freeze();
    println!("{:?}", b);
}

If you compile this program and set a breakpoint on realloc, you'll see this (after the startup code):

(gdb) bt
#0  __GI___libc_realloc (oldmem=0x5555555a5ba0, bytes=7) at ./malloc/malloc.c:3394
#1  0x000055555555da2b in alloc::vec::Vec<T,A>::into_boxed_slice ()
#2  0x000055555555cc00 in <bytes::bytes::Bytes as core::convert::From<alloc::vec::Vec<u8>>>::from ()
#3  0x000055555555c762 in bytes_test::main ()
#4  0x000055555555cb03 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#5  0x000055555555cad9 in _ZN3std2rt10lang_start28_$u7b$$u7b$closure$u7d$$u7d$17hcadd4fe5b7b3f353E.llvm.6218250527406152195 ()
#6  0x00005555555703fb in core::ops::function::impls::{impl#2}::call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> ()
    at library/core/src/ops/function.rs:286
#7  std::panicking::try::do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> () at library/std/src/panicking.rs:483
#8  std::panicking::try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> () at library/std/src/panicking.rs:447
#9  std::panic::catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> () at library/std/src/panic.rs:137
#10 std::rt::lang_start_internal::{closure#2} () at library/std/src/rt.rs:148
#11 std::panicking::try::do_call<std::rt::lang_start_internal::{closure_env#2}, isize> () at library/std/src/panicking.rs:483
#12 std::panicking::try<isize, std::rt::lang_start_internal::{closure_env#2}> () at library/std/src/panicking.rs:447
#13 std::panic::catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> () at library/std/src/panic.rs:137
#14 std::rt::lang_start_internal () at library/std/src/rt.rs:148
#15 0x000055555555c915 in main ()

I believe in this case that's because the variant of the BytesMut is KIND_VEC and we're then recreating the Vec and doing an into, which calls into_boxed_slice. This is also the case if one substitutes BytesMut in the code for Vec, but in that case, there's no guarantee of that being zero-cost.

I'm not sure what the best way forward here is, but it would be great if someone could propose an idea where this really is zero cost. I have a server where I need to read some data into a buffer and then transfer it using a Bytes via Hyper, and since this is a very hot codepath, I really want to avoid a reallocation, since the reallocation shows up in my performance graphs.

Darksonn commented 1 year ago

I don't think the realloc is necessary in the Vec<u8> -> Bytes conversion, since a Bytes can be subsliced.

seanmonstar commented 1 year ago

The details aren't fresh in my mind, but I think the reason it does into_boxed_slice is because there was no room to store the length and capacity.

bk2204 commented 1 year ago

I think this can be pretty easily implemented by turning the Vec<u8> into a shared variant. If folks want a patch, I'm happy to send one, and I've even added a few tests.

seanmonstar commented 1 year ago

Making it shared does require a new allocation too. Would it be worth checking if the length and capacity are the same, and so into_boxed_slice won't need to shrink?

bk2204 commented 1 year ago

That's true, but it requires a much smaller allocation of a handful of bytes, which I think will be acceptable here. Reallocating my buffer of 64 KiB requires substantially more space and a much larger memcpy as well.

I can certainly see if the length and capacity are the same and use into_boxed_slice. However, in my case, it's definitely not.