rust-lang / wg-allocators

Home of the Allocators working group: Paving a path for a standard set of allocator traits to be used in collections!
http://bit.ly/hello-wg-allocators
203 stars 9 forks source link

Question: Can omit the <A: Allocator> generic param for the immutable references? #110

Open wada314 opened 1 year ago

wada314 commented 1 year ago

Hello, šŸ™‡ā€ā™‚

I was creating my own data structure using the generic param, say it's something like this:

pub struct MyCollection<A: Allocator> {
  some_data: Box<i32, A>,
  another_data: Vec<u32, A>,
  alloc: A,
}

and want to pass around this data type ref in my code:

pub fn use_my_collection_and_do_something<A: Allocator>(collection: &MyCollection<A>) {
  // do something...
}

Apparently, in this function the <A: Allocator> param is essentially redundant because we can assume the immutable methods does not use the allocator. This problem is nicely avoided in the basic std data types like Box<T, A>, Vec<T, A> and String<A> because those types have corresponding immutable types which omits <A> param: &T, &[T] and str. But for the other types planned in issue #7 , like BTreeMap<K, V, A>, we have the same problem. The all users of the BTreeMap need to know about the allocator even if it's just using the BTreeMap instance in immutable way.

Of course, one of the solution for this is to define a custom trait. But then I need to re-define the all immutable methods in that trait. It's great if I can have something "official" or de facto standard crate for that, or maybe can use an implementation technique to extract out the immutable-reference type like str from the owned type. WDYT?

thanks,

Amanieu commented 1 year ago

You can define a MyCollection::data method that returns a view into MyCollection with a lifetime an no allocator parameter:

struct CollectionData<'a> {
    some_data: &'a i32,
    another_data: &'a [u32],
}

This is similar to passing &[u32] around instead of &Vec<u32, A>.

wada314 commented 1 year ago

Thanks for the suggestion! šŸ™

Yes that's one of the very possible solutions, which has some positive properties compared to the solution defining a new trait:

though it has some downsides:

If there's any other solutions, please let me know, thanks!

sollyucko commented 1 year ago

To avoid writing duplicate code, you could create a sealed trait with functions that return references to the field values and default functions that implement the rest of the behavior.

burdges commented 1 year ago

You could only define &self methods on CollectionData but then give the dereferencing method a short name.

Is an allocator that always panics sensible?

pub struct IncompatibleWithInteriorMutability;

unsafe impl Allocator for IncompatibleWithInteriorMutability {
    fn allocate(&self, layout: Layout) -> Result<NonNull<[u8]>, AllocError> { panic!() }
    unsafe fn deallocate(&self, ptr: NonNull<u8>, layout: Layout) { panic!() }
}

impl<A: Allocator> Deref for MyCollection<A> {
    type Target = MyCollection<IncompatibleWithInteriorMutability>;
    fn deref(&self) -> &MyCollection<IncompatibleWithInteriorMutability> {
        unsafe { core::mem::transmute(self) }
    }
}
sollyucko commented 1 year ago

@burdges

pub struct IncompatibleWithInteriorMutability;

impl<A: Allocator> Deref for MyCollection<A> {
    type Target = MyCollection<IncompatibleWithInteriorMutability>;
    fn deref(&self) -> &MyCollection<IncompatibleWithInteriorMutability> {
        unsafe { core::mem::transmute(self) }
    }
}

I'm not sure what the safety implications of transmuting between references of types are; wrapping the original allocator should at least make the layouts the same:

#[repr(transparent)]
pub struct IncompatibleWithInteriorMutability<A>(A);
yanchith commented 1 year ago

@wada314

It means I need to implement the immutable methods twice for the both MyCollection and CollectionData structs.

I had to deal with this too. The way I usually write these is have a plain, module-private function that receives all the params from either the collection or the view, and does the stuff. The associated functions on the collection and view types then call into this plain function.

It doesn't look very pretty inside the module with your collection, but the outside looks rather clean.

yanchith commented 1 year ago

The original question deals with shared references to a collection, but I tend to run into this with unique references a lot, too.

I wish there was a way to write underscore as a type parameter in the function parameters, when I still want the function to be generic over the allocator, but don't want to interact with it directly:

fn do_stuff(data: &mut MyCollection<u32, _>) {}

(Yes, I know this would be a huge language change, and is mostly a pipedream)

How are other people dealing with the additional type parameters?

burdges commented 1 year ago

Yeah, we've no idea if rustc places A first of course, so what I wrote above is clearly wrong, and adding a type parameter does nothing.

Allocator appears object safe, so if you allocate rarely then this should work as the cost of 32 bytes:

pub struct MyCollection {
  some_data: Box<i32, &'static dyn Allocator>,
  another_data: Vec<u32, &'static dyn Allocator>,
}

An _ tells rustc to infer a type, not to be polymorphic. I guess &mut MyCollection<u32,impl Allocator> works.

zakarumych commented 1 year ago

What's stopping you from defining a type that borrows from your collection type and doesn't use an allocator?

yanchith commented 1 year ago

@burdges Yeah, I used _ as a strawman syntax. I meant it more like "rustc, I don't care about what this type is precisely, just please generate the monorphized function" than "rustc, please infer this type for me".

Otherwise, I agree impl Allocator would be closest to the semantics I'd want: a concrete type, constrained by a trait, that I can't name or use, but the function would still be polymorphic over. Wait... does that work today? I admit I've been ignoring the existence of impl Trait in argument position, but this would change my mind.

Your other suggestion with &'static dyn Allocator is more or less what I'm doing at the moment to work around the issue. While the semantics and tradeoffs are different, this difference is subtle enough that it doesn't visible manifest itself.

burdges commented 1 year ago

Yes I'd think impl Allocator works fine. In principle impl _ could mean infer the trait from the A: Allocator in pub struct MyCollection<A: Allocator> { .. } but doing that sounds like a huge can of worms.

It's maybe best if proc macro crates explore some view types tooling. Also https://internals.rust-lang.org/t/blog-post-view-types-for-rust/15556

wada314 commented 1 year ago

thanks everyone for the active responses!

To avoid writing duplicate code, you could create a sealed trait with functions that return references to the field values and default functions that implement the rest of the behavior.

True šŸ‘, but I'm still not satisfying with this solution because I cannot stop comparing this solution with the simple Vec<T> ā‡” &[T] pairs and finding some downsides:

  • The total amount of the duplicated code will be far less, but I still need to write the trait methods returning the reference to the each field twice.
  • And because we are using trait, the generated code for the methods using &impl Trait will be generated twice.
  • The user needs to use not only the types but the trait too. Some methods of the MyCollection are impled for the struct itself but some are impled via trait, which sounds little weird to me...
wada314 commented 1 year ago

What's stopping you from defining a type that borrows from your collection type and doesn't use an allocator?

Assuming you're asking to me about why I'm not using the @Amanieu 's solution:

For my case, in the example I described has only 2 fields, but in my real case the number of the fields can vary from 1 to possibly over 100. So if I use that method, then the overhead coming from the ref struct size and the ref struct construct time may be unignorable.

wada314 commented 1 year ago

One idea I'm testing now is introducing a collection type using the allocator but not storing the allocator. For example, for the replacement for the Vec,

// Note `Vec::into_raw_parts_with_alloc()` returns `(*mut T, usize, usize, Alloc)`
pub struct NoAllocVec<T>(*mut T, usize, usize);

impl<T> NoAllocVec<T> {
  // This method is needed to impl `Deref<Target=[T]>`
  pub fn as_slice(&self) -> &[T] {
    unsafe { std::slice::from_raw_parts(self.0, self.1) }
  }

  // Generate something like `std::cell::RefMut<'_, Vec<T, A>>`.
  // This method is unsafe because a different allocator with the one used when constructing
  // this struct might be passed here, and we cannot tell that because we erased the allocator type
  // information intentionally.
  pub unsafe fn in_allocator<A>(&self, alloc: A) -> RefMutVec<T> {
    //...
  }
}
impl<T> Deref for NoAllocVec<T> {
  type Target = [T];
  // ...
}

Then, I can write my larger MyCollection like this:

pub struct MyCollection<A: ?Sized + Allocator> {
  another_data1: NoAllocVec<u32>,
  another_data2: NoAllocVec<u32>,
  another_data3: NoAllocVec<u32>,
  another_data4: NoAllocVec<u32>,
  alloc: A,
}

One of a benefits of writing like this is that I can remove the allocator instances from the field types. This can reduce the MyCollection struct total size. Another benefit is that I can use unsized coercions in this case. MyCollection<A: Allocator> can be safely casted into MyCollection<dyn Allocator>, where it is not depending to the individual allocator types. If I don't want to use the dynamic dispatching for the Allocator, then I can just implement immutable methods which are not using the allocator for MyCollection<dyn Allocator> type and implement the mutable methods into MyCollection<A: Allocator>, and then implement Deref that the latter returns the former, then it works as very similar to the Vec<T> and &[u8] pair.

burdges commented 1 year ago

Instead of impl _ inferring traits, maybe use std::alloc::Allocator as A; and then impl A.

If your doing crates for general purpose data structures, then you should maybe emulate std by passing the allocators. At higher levels, there are many crates which'll always want the same allocator throughout, so you could work out some allocator registration solution. Rust favors thread local tooling, but you'd want really global.

#[cfg(not(dyn_alloc))]
pub type MyAlloc = std::alloc::Global;

#[cfg(dyn_alloc)]
pub type MyAlloc;

#[cfg(dyn_alloc)]
mod alloc {

static mut ALLOC: &'static dyn Allocator = &std::alloc::Global as &dyn Allocator.;
static mut ALLOC_USED: AtomicBool = AtomicBool::new(false);
static mut ALLOC_DEAD: AtomicBool = AtomicBool::new(false);

/// Sets ALLOC.  If ALLOC_USED==true then sets ALLOC_DEAD=true and panics.
pub fn init_allocator(&'static dyn Allocator) { .. }

/// Sets ALLOC_USED=true and returns ALLOC, but panics if ALLOC_DEAD==true
fn use_allocator() -> &'static dyn Allocator { ... }

unsafe impl Allocator for MyAlloc {
    fn allocate(&self, layout: Layout) -> Result<NonNull<[u8]>, AllocError> {
        use_allocator().allocate(layout)
    }
    unsafe fn deallocate(&self, ptr: NonNull<u8>, layout: Layout) {
        use_allocator().deallocate(ptr)
    }
}

} // mod details