thomcc / arcstr

Better reference counted strings for Rust
Apache License 2.0
114 stars 18 forks source link

Feature Request: construction in-place #36

Closed gzz2000 closed 5 months ago

gzz2000 commented 2 years ago

It would be very helpful if a method to construct an ArcStr in-place can be provided, instead of forcing a wasteful copy from String. This is especially useful if we want to read data from a File. There might be some difficulty regarding utf-8 checks. But I think we can mark it unsafe. The method might looks like:

impl ArcStr {
    unsafe fn from_len(len: usize, construct: impl FnOnce(&mut [u8])) -> ArcStr { ... }
}

Usage:

let mut f = File::open("foo.txt")?;
let len = f.stream_len()?;
let s = unsafe { ArcStr::from_len(len, |buf| { f.read(&mut buf); }) }
thomcc commented 2 years ago

Hm, that seems reasonable, and not too difficult to add

I'd accept a patch for it, or I can try and get to it next weekend probably.

thomcc commented 2 years ago

Actually, I think I'd like to spend a little time thinking about the interface specifically, so don't submit a patch unless you're willing to work through that. I'll try and get to this next weekend though (this weekend I'm busy).

This is a good idea though, thanks.

gzz2000 commented 2 years ago

Actually, I think I'd like to spend a little time thinking about the interface specifically, so don't submit a patch unless you're willing to work through that. I'll try and get to this next weekend though (this weekend I'm busy).

This is a good idea though, thanks.

Thank you! Unfortunately I don't know if I understand the internal data structures correctly so I may make mistakes. I am using the old copying method at this moment and can switch to a new implementation when ready.

aminya commented 10 months ago

I am also interested in this feature. I tried to use a Vector to implement this, but it turned out to be harder than I thought. Here is my attempt:

impl ArcStr {

    /// Attempt to create a string using the `construct` function that will be called
    /// with a `&mut Vec<u8>` that will write the string data into the vector.
    /// This is useful to avoid copying the string data.
    ///
    /// # Examples
    /// ```
    /// # use arcstr::ArcStr;
    /// let s = ArcStr::try_build(|v| v.extend_from_slice(b"abc")).unwrap();
    /// assert_eq!(s, "abc");
    /// ```
    pub fn try_build<F>(construct: F) -> Option<Self>
    where
        F: FnOnce(&mut Vec<u8>),
    {
        if let Ok(inner) = ThinInner::try_build(construct) {
            Some(Self(inner))
        } else {
            None
        }
    }
}

impl ThinInner {

    /// Construct with a custom function that can write to a `Vec<u8>` buffer
    fn try_build<F>(construct: F) -> Result<NonNull<Self>, Option<Layout>>
    where
        F: FnOnce(&mut Vec<u8>),
    {
        // Create a Vec<u8> with an initial capacity
        let mut buffer = Vec::new();

        // Pass the mutable Vec<u8> to the provided function `construct` which will write the data into the Vec<u8>
        construct(&mut buffer);

        // Convert the Vec<u8> into a raw pointer and length
        let ptr = buffer.as_mut_ptr();
        let cap = buffer.capacity();

        // Forget the Vec<u8> so it doesn't get deallocated when it goes out of scope
        forget(buffer);

        let out = Self::try_allocate_uninit(cap)?;

        unsafe {
            // Move the data from the Vec<u8> into the allocation without copying
            let data_ptr = out.as_ptr().add(OFFSET_DATA) as *mut u8;
            data_ptr.swap(ptr);
        }

        Ok(out)
    }
}
thomcc commented 5 months ago

I thought about this some. It would have to take &mut [MaybeUninit<u8>] (not &mut [u8]) in order to actually avoid initializing things twice. I'll add that as a low-level primitive.

I am also interested in this feature. I tried to use a Vector to implement this, but it turned out to be harder than I thought

Yes, I don't think this is the right approach to implementing this. I'm not surprised you found it difficult.