tokio-rs / bytes

Utilities for working with bytes
MIT License
1.86k stars 278 forks source link

`impl FromIterator for BytesMut` exposes implementation details #723

Open aatifsyed opened 1 month ago

aatifsyed commented 1 month ago

BytesMut cares about keeping allocation implementation details private, which means no From<Vec<u8>> and no Into<Vec<u8>>: https://github.com/tokio-rs/bytes/blob/4e2c9c065a06bf9cb5d7dd46e3b29f62a1c20057/src/bytes_mut.rs#L825-L830

However, this information is leaked in the FromIterator impl: https://github.com/tokio-rs/bytes/blob/f8c7b574c0ef0c3cb097d29a08e53b15b4e4a522/src/bytes_mut.rs#L1409-L1413 which go through through this specialization: https://github.com/rust-lang/rust/blob/6c6b3027ef62e911142cfc55589baef4e9f38ec8/library/alloc/src/vec/spec_from_iter.rs#L37-L64

So collection is basically a From<Vec<u8>> with no overhead, which may not be intended by the authors.

use bytes::BytesMut; // 1.6.1

#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc; // 0.3.3

fn main() {
    let _guard = dhat::Profiler::new_heap();

    let src = vec![1; 1024];
    let mut_from_vec = measure(|| BytesMut::from_iter(src));
    assert_eq!(mut_from_vec, 0); // doesn't allocate!

    let mut_from_array = measure(|| BytesMut::from_iter([1; 1024]));
    assert_eq!(mut_from_array, 1024);
}

fn measure<T>(f: impl FnOnce() -> T) -> u64 {
    let before = dhat::HeapStats::get();
    f();
    let after = dhat::HeapStats::get();
    after.total_bytes - before.total_bytes
}

OTOH, this probably doesn't count as "an easy way", so maybe it's fine to leave it exposed

aatifsyed commented 1 month ago

The fix would be to turn off specialization like this:

let src = vec![1; 1024];
let from_vec = measure(|| BytesMut::from_iter(NoSpecialize(src)));
assert_eq!(from_vec, 1024); // this used to be 0

struct NoSpecialize<T>(T);

impl Iterator for NoSpecialize<vec::IntoIter<u8>> {
    type Item = u8;
    fn next(&mut self) -> Option<Self::Item> {
        self.0.next()
    }
    fn size_hint(&self) -> (usize, Option<usize>) {
        self.0.size_hint()
    }
}

impl IntoIterator for NoSpecialize<Vec<u8>> {
    type Item = u8;
    type IntoIter = NoSpecialize<vec::IntoIter<u8>>;
    fn into_iter(self) -> Self::IntoIter {
        NoSpecialize(self.0.into_iter())
    }
}
Darksonn commented 1 month ago

I believe the section you're quoting is only talking about BytesMut. We already have impl From<Vec<u8>> for Bytes.

aatifsyed commented 1 month ago

I believe the section you're quoting is only talking about BytesMut. We already have impl From<Vec> for Bytes.

Ah, my mistake! I've updated the issue and comments as appropriate :)

aatifsyed commented 1 month ago

If you agree that this is a bug, I'll happily open a PR

seanmonstar commented 1 month ago

What's the bug? That the compiler is able to make the copying faster? That seems like a thing we want to keep.

It doesn't actually expose in the type system the internals.

aatifsyed commented 1 month ago

Perhaps "bug" is a bit strong. BytesMut goes to moderate lengths to not allow cheap conversion from a Vec<u8> so that users cannot rely on it. But because of the interaction of the above traits, a wily user can get a "free" conversion to BytesMut.

If the allocation strategy changes in future (which the comment in from_vec seems to want to leave open), the "free" conversion becomes not free, which violates the comment.

jorgehermo9 commented 1 month ago

I don't think people expect a from_iter to be free.

They could think it from something like a Vec<u8>, as it happens with Bytes::from(vec)

You see it like its free in the benchmark most likely because the compiler optimizes that a lot, since it is a hardcoded Vec.

Try with something like this (note the std::hint::black_box):

use bytes::BytesMut; // 1.6.1

#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc; // 0.3.3

fn main() {
    let _guard = dhat::Profiler::new_heap();

    let src = vec![1; 1024];
    let mut_from_vec = std::hint::black_box(measure(|| BytesMut::from_iter(src)));
    assert_eq!(mut_from_vec, 0); // doesn't allocate!

    let mut_from_array = std::hint::black_box(measure(|| BytesMut::from_iter([1; 1024])));
    assert_eq!(mut_from_array, 1024);
}

fn measure<T>(f: impl FnOnce() -> T) -> u64 {
    let before = dhat::HeapStats::get();
    f();
    let after = dhat::HeapStats::get();
    after.total_bytes - before.total_bytes
}