maciejhirsz / beef

Faster, more compact implementation of std::borrow::Cow
https://crates.io/crates/beef
Apache License 2.0
338 stars 17 forks source link

Add const accessors for data #56

Open Caellian opened 1 week ago

Caellian commented 1 week ago

I didn't want to use as_ptr because that exists on Deref with str Target. I don't like this name, but it's very explicit and long so it won't be confused with std which doesn't use get_.

Alternatively, maybe as_raw_ptr would be good. But IMO it's way to similar to as_ptr, so I opted for get_raw_ptr because it's clear what it does and just enough imperfect/annoying to make collisions with Cowed types unlikely.

Once const traits get into stable, this could maybe be deprecated and point to str::as_ptr or addr_of!(**deref). But I feel like that won't be the case for additional 4-5 years.

maciejhirsz commented 6 days ago

@Caellian I reckon if we don't want to confuse people by overloading methods on the underlying deref types, the idiomatic way of doing it is to not make it a method at all and instead of &self take this: &Self as parameter. This is pretty common on smart pointer types in std:

The first one of these is very analogous to what's going on here, so if you don't mind if it's "incredibly convoluted" :upside_down_face: then Cow::as_ptr(&my_cow) should work well.

maciejhirsz commented 6 days ago

This looks like a solution to XY problem and I'm not really a big fan of leaking internals. If you just need a way to get a &str from Cow<str> then we should just have Cow::as_str[_const](my_cow) implemented for Cow<str, Lean> and Cow<str, Wide>.

Caellian commented 6 days ago

I don't see it as leaking internals as they don't expose something that will ever change, nor use internal types in signatures:

std Cow probably doesn't expose these methods because a large part of it was designed without const in mind, aannd once const traits land they would be just noise because those methods are always defined on cowed types as well. In case of a library they can be removed with a simple semver change.

we should just have Cow::as_str_const implemented for Cow<str, Lean> and Cow<str, Wide>

The exposed functions are building blocks for functions like as_str - this only looks like more code now that my use case is just str.
Exposed functions only need to handle additional representations which are known to beef (if they ever get added (unlikely)). Whereas as_[T][_const] can't ever be exhaustive because some FooBar defined in one of dependents isn't known to beef and whoever deals with Cow<'a, FooBarRef> types would then need to fork this crate to have const access to len and capacity.

maciejhirsz commented 6 days ago

Accessing data ptr is very useful for unsafe code.

If you have &str in const context you can use its as_ptr method to get the pointer.

Accessing len/cap makes sense to me given that Cow actually stores those, so as a container (like Vec and String) it makes sense for it to expose them.

capacity is internal, the fact that it can be 0 for non-empty strings is an implementation detail, if we ever want to change how beef differentiates between owned and borrowed values this might no longer be the case and could break usercode down the road. I also don't really know what you need actual capacity for since you can't create heap allocated (non-empty) Strings in const expressions.

Caellian commented 6 days ago

If you have &str in const context you can use its as_ptr method to get the pointer.

I know that, but I don't. I forked this crate and made a PR because I'm using Cow which doesn't have that in const context. I'm using this crate because I need a no_std compatible one.

I also don't really know what you need actual capacity for

I don't. I added it without thinking for completeness.

maciejhirsz commented 6 days ago

I know that, but I don't.

My point is that if you have a Cow<str> in const context and you can get &str out of it with a const function, you can also get a pointer out of it this way:

const PTR: *mut u8 = COW_STR.as_str().as_ptr();

Likewise you don't need the len method if you can just get it from the &str. It makes the API surface smaller and more versatile at the same time.

Don't think there are any issues if we override as_str if it actually returns a &str, there will need to be two implementations for Cow<str, Lean> and Cow<str, Wide> to make them const but I reckon that's acceptable.

Caellian commented 6 days ago

It makes the API surface smaller and more versatile at the same time.

It's not more versatile if it only works for &str type and not for other Ts:

#[repr(transparent)]
struct TagBorrowed {
    name: [u8]
}
impl beef::generic::Beef for TagBorrowed {
    // Assuming TagBorrowed isn't sealed
}

struct TagOwned {
    name: alloc::string::String
}

impl core::ops::Deref for TagOwned {
    type Target = TagBorrowed;

    fn deref(&self) -> &Self::Target {
        unsafe {
            &*(self.name.as_bytes() as *const [u8] as *const TagBorrowed)
        }
    }
}

let example: beef::Cow<TagBorrowed> = beef::Cow::owned(TagOwned {
    name: alloc::string::String::from("hui")
});

But that assumes beef::Cow can be used with types other than str and [u8].

Something like #20 would be fine if it worked in const, but I see it's using a trait. Hmmm....