rusticstuff / simdutf8

SIMD-accelerated UTF-8 validation for Rust.

Replacement for `String::from_utf8` #73

Open Nugine opened 1 year ago

Nugine commented 1 year ago

Currently there is no safe replacement for String::from_utf8 in simdutf8. I think it would be easy to add a function for this.

dralley commented 1 year ago

That would be effectively the same as simdutf8::compat::from_utf8(value).map(|s| s.to_owned()), yes?
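The copying approach described above can be sketched as follows. To keep the example self-contained, `std::str::from_utf8` stands in for `simdutf8::compat::from_utf8` (both take `&[u8]` and return a `Result` with a `&str` on success), and `validate_str` and `to_string_copying` are hypothetical helper names, not part of either library.

```rust
use std::str::Utf8Error;

// Stand-in validator (assumption): with simdutf8 as a dependency this would
// call simdutf8::compat::from_utf8 and use its error type instead.
fn validate_str(bytes: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(bytes)
}

// Hypothetical helper: validates the borrowed bytes, then copies them into a
// freshly allocated String.
fn to_string_copying(bytes: &[u8]) -> Result<String, Utf8Error> {
    validate_str(bytes).map(|s| s.to_owned())
}

fn main() {
    let bytes = b"hello".to_vec();
    let s = to_string_copying(&bytes).unwrap();
    assert_eq!(s, "hello");
    // The String owns a separate allocation: the input was copied.
    assert_ne!(s.as_ptr(), bytes.as_ptr());
    // Invalid UTF-8 is rejected.
    assert!(to_string_copying(&[0xff, 0xfe]).is_err());
}
```

This is entirely safe code, but it always pays for one allocation and one copy, even when the caller already owns the bytes in a Vec.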

Note that there was some discussion in the past about putting it in the standard library directly: https://www.reddit.com/r/rust/comments/mvc6o5/incredibly_fast_utf8_validation/

Nugine commented 1 year ago

Thanks for the answer!

Nugine commented 1 year ago

Ah, I forgot the original problem. String::from_utf8 converts a Vec<u8> into a String with validation. simdutf8, however, can only check a slice, not take ownership of a Vec. To avoid an extra copy, you have to validate the slice and then call the unsafe String::from_utf8_unchecked yourself. So there is still no safe replacement for that.
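A minimal sketch of the workaround described here, assuming a slice validator is available. `std::str::from_utf8` is used as a stand-in for the simdutf8 validator so the example runs without extra dependencies, and `validate_bytes` and `string_from_vec` are hypothetical names.

```rust
use std::str::Utf8Error;

// Stand-in validator (assumption): with simdutf8 this would be a call such as
// simdutf8::basic::from_utf8(bytes), which has its own error type.
fn validate_bytes(bytes: &[u8]) -> Result<(), Utf8Error> {
    std::str::from_utf8(bytes).map(|_| ())
}

// Hypothetical helper: validates the Vec in place, then reuses its allocation.
fn string_from_vec(bytes: Vec<u8>) -> Result<String, Utf8Error> {
    validate_bytes(&bytes)?;
    // SAFETY: the bytes were just validated as UTF-8 and are not modified
    // between validation and conversion.
    Ok(unsafe { String::from_utf8_unchecked(bytes) })
}

fn main() {
    let bytes = "héllo".as_bytes().to_vec();
    let ptr = bytes.as_ptr();
    let s = string_from_vec(bytes).unwrap();
    assert_eq!(s, "héllo");
    // Same buffer as the input Vec: no copy was made.
    assert_eq!(s.as_ptr(), ptr);
    assert!(string_from_vec(vec![0xff]).is_err());
}
```

The `unsafe` block is what the issue is about: every caller who wants the zero-copy path currently has to write it, and justify it, themselves.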

Vrtgs commented 1 year ago

Looking at the implementation of from_utf8, this should be quite easy to add:

#[inline]
pub fn from_utf8(input: &[u8]) -> Result<&str, Utf8Error> {
    unsafe {
        validate_utf8_basic(input)?;
        Ok(from_utf8_unchecked(input))
    }
}

and we would just add:

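pub mod string {
    pub use super::*;

    /// Validates `input` and converts it into a `String` without copying.
    /// This local definition shadows the glob-imported `super::from_utf8`.
    #[inline]
    pub fn from_utf8(input: Vec<u8>) -> Result<String, Utf8Error> {
        // SAFETY: `input` is validated as UTF-8 before the unchecked conversion.
        unsafe {
            validate_utf8_basic(&input)?;
            Ok(String::from_utf8_unchecked(input))
        }
    }
}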
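One difference from the standard library may be worth weighing. `String::from_utf8` returns a `FromUtf8Error`, which hands the original bytes back through `into_bytes()`, whereas the signature proposed above drops the Vec on failure. The sketch below shows one way a drop-in replacement could preserve that behavior. `FromVecError` and `from_utf8_keep_bytes` are hypothetical names, not simdutf8 API, and `std::str::from_utf8` again stands in for the SIMD validator.

```rust
use std::str::Utf8Error;

/// Hypothetical error type mirroring std's FromUtf8Error: it keeps the input.
#[derive(Debug)]
struct FromVecError {
    bytes: Vec<u8>,
    error: Utf8Error,
}

impl FromVecError {
    /// Returns the bytes that failed validation, so the caller can reuse them.
    fn into_bytes(self) -> Vec<u8> {
        self.bytes
    }

    /// Returns the underlying validation error.
    fn utf8_error(&self) -> &Utf8Error {
        &self.error
    }
}

// Hypothetical function: zero-copy on success, returns the Vec on failure.
fn from_utf8_keep_bytes(input: Vec<u8>) -> Result<String, FromVecError> {
    // Stand-in validator (assumption); the SIMD validator would go here.
    if let Err(error) = std::str::from_utf8(&input).map(|_| ()) {
        return Err(FromVecError { bytes: input, error });
    }
    // SAFETY: `input` was validated as UTF-8 just above.
    Ok(unsafe { String::from_utf8_unchecked(input) })
}

fn main() {
    assert_eq!(from_utf8_keep_bytes(b"ok".to_vec()).unwrap(), "ok");

    let err = from_utf8_keep_bytes(vec![b'a', 0xff]).unwrap_err();
    assert_eq!(err.utf8_error().valid_up_to(), 1);
    // The original bytes are recoverable after a failed conversion.
    assert_eq!(err.into_bytes(), vec![b'a', 0xff]);
}
```

Whether this is worth the extra type is a design choice for the maintainers. The simpler signature is enough when callers never need the bytes back after a failure.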