marshallpierce / rust-base64

base64, in rust
Apache License 2.0

FR: Provide decode_inplace function #190

Open mina86 opened 2 years ago

mina86 commented 2 years ago

It would be nice to have an in-place decode function, for example:

fn decode_inplace(data: &mut [u8]) -> Result<usize, DecodeError> { … }

Since base64 output is never longer than its input, the decoded bytes can be written directly into the input buffer, which avoids allocating a separate output buffer, e.g.:

fn b64decode(value: String) -> Result<Vec<u8>, base64::DecodeError> {
    let mut vec = value.into_bytes();
    let new_length = base64::decode_inplace(&mut vec[..])?;
    vec.truncate(new_length);
    Ok(vec)
}

This would of course come with the caveat that if an error is encountered, the data in the buffer is left in an unspecified state (a portion of it could have been overwritten).
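To make the proposal concrete, here is a minimal, std-only sketch of what such a function could look like. It is not the `base64` crate's implementation: `InplaceError` and `sextet` are hypothetical names invented for illustration, only the standard alphabet is handled, padding is optional, and trailing bits are not checked for canonical form. The point it shows is that the write position never passes the read position, so each input byte is consumed before its slot can be overwritten.

```rust
/// Hypothetical error type for this sketch; NOT the `base64` crate's `DecodeError`.
#[derive(Debug, PartialEq)]
enum InplaceError {
    /// A byte outside the standard alphabet, with its offset.
    InvalidByte(usize),
    /// A symbol count that cannot encode a whole number of bytes.
    InvalidLength,
}

/// Map one standard-alphabet symbol to its 6-bit value (hypothetical helper).
fn sextet(byte: u8) -> Option<u8> {
    match byte {
        b'A'..=b'Z' => Some(byte - b'A'),
        b'a'..=b'z' => Some(byte - b'a' + 26),
        b'0'..=b'9' => Some(byte - b'0' + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Decode `data` in place and return the number of decoded bytes,
/// which end up in `data[..n]`. On error the buffer may be partly overwritten.
fn decode_inplace(data: &mut [u8]) -> Result<usize, InplaceError> {
    // Ignore up to two trailing '=' padding bytes.
    let mut len = data.len();
    let mut pad = 0;
    while pad < 2 && len > 0 && data[len - 1] == b'=' {
        len -= 1;
        pad += 1;
    }
    // A single leftover symbol carries only 6 bits, less than one byte.
    if len % 4 == 1 {
        return Err(InplaceError::InvalidLength);
    }

    let mut write = 0; // next output slot; never greater than `read`
    let mut acc: u32 = 0; // bit accumulator; old bits simply shift out
    let mut bits = 0; // number of not-yet-emitted bits in `acc`
    for read in 0..len {
        let value = sextet(data[read]).ok_or(InplaceError::InvalidByte(read))?;
        acc = (acc << 6) | u32::from(value);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            data[write] = (acc >> bits) as u8;
            write += 1;
        }
    }
    Ok(write)
}
```

Calling it on a buffer holding `SGVsbG8=` returns `Ok(5)` and leaves `Hello` in the first five bytes. Calling it on `SG*s` returns `Err(InplaceError::InvalidByte(2))` after the first byte of the buffer has already been overwritten, which is the caveat described above.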

Nugine commented 2 years ago

https://docs.rs/base64-simd/latest/base64_simd/struct.Base64.html#method.decode_inplace

fn b64decode(value: String) -> Result<Vec<u8>, base64_simd::Error> {
    let base64 = base64_simd::Base64::STANDARD;
    let mut vec = value.into_bytes();
    let new_length = base64.decode_inplace(&mut vec)?.len();
    vec.truncate(new_length);
    Ok(vec)
}

marshallpierce commented 1 year ago

I can see the appeal in general of not requiring more memory footprint than necessary. Do you have a specific use case in mind? Does this functionality need to extend to anything beyond a function that decodes in place in order to be useful?

mina86 commented 1 year ago

There are two cases:

  1. I have a String which I want to decode into a Vec. With decode_inplace I can do it with zero allocations, reusing the String’s internal vector.
  2. I have encoded data in a String which (after decoding) is further deserialised into the actual object I want. Again, with decode_inplace I can decode it reusing the String’s Vec and then deserialise the bytes from there. Though to be honest, I cannot find an example of that now and I’m not 100% sure this is still an issue for me.

Function decoding in place would be sufficient.

By the way, I realised that there is one complication. If I understand base58 correctly, decoding happens from the end, so the first byte that is known is the last byte in the buffer. This means that decode_inplace would store the data at the end of the buffer. This would complicate the first use case and require the use of copy_within (a slice method, callable on a Vec) to move the data to the front. Still, that’s probably better than having to allocate a new Vec.
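For the scenario described here, where a decoder leaves its output at the end of the buffer, the fix-up step could be sketched as follows. `move_tail_to_front` is a hypothetical helper name; the only library calls are the standard `copy_within` and `Vec::truncate`.

```rust
/// Hypothetical helper: move the last `decoded_len` bytes of `buf` to the
/// front and drop everything after them, without allocating.
fn move_tail_to_front(buf: &mut Vec<u8>, decoded_len: usize) {
    assert!(decoded_len <= buf.len(), "decoded_len exceeds buffer length");
    let start = buf.len() - decoded_len;
    // `copy_within` behaves like memmove, so overlapping ranges are fine.
    buf.copy_within(start.., 0);
    buf.truncate(decoded_len);
}
```

The copy is linear in the decoded length, which is the extra cost compared with a decoder that writes from the front.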

marshallpierce commented 1 year ago

Not sure about base58, but base64 decodes front to back in groups of 4 tokens, so that problem at least does not apply.
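The arithmetic behind this can be sketched in a few lines: each token carries 6 bits, so a full group of 4 tokens yields 3 bytes, and a trailing partial group of 2 or 3 tokens yields 1 or 2 bytes. `decoded_len` is a hypothetical helper, not part of any crate's API, and it counts tokens with padding already excluded.

```rust
/// Hypothetical helper: number of whole bytes carried by `symbols` base64
/// tokens (padding excluded). Each token contributes 6 bits.
fn decoded_len(symbols: usize) -> usize {
    // Full groups of 4 tokens give 3 bytes each; the remainder of
    // 0, 1, 2 or 3 tokens gives 0, 0, 1 or 2 whole bytes.
    symbols / 4 * 3 + symbols % 4 * 6 / 8
}
```

Because the result never exceeds `symbols`, a front-to-back decoder that has consumed some number of tokens has written no more bytes than that, so the output always fits in the prefix of the input that has already been read.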

mina86 commented 1 year ago

Ah, sorry, I was reading this issue thinking this was the bs58 repository. Indeed, with base64 the last paragraph is not relevant.