stellar / rs-soroban-sdk

Rust SDK for Soroban contracts.
Apache License 2.0
118 stars 66 forks source link

String append / concat #1280

Open leighmcculloch opened 3 months ago

leighmcculloch commented 3 months ago

@willemneal mentioned today that the String type has no concatenation functionality and replicating it with the existing SDK API is pretty complex because the developer has to use an allocator.

We could explore how to add this to the SDK without a protocol change, although very likely a protocol change is required here.

This issue probably requires:

willemneal commented 3 months ago

Currently String has a copy_into_slice. So you can create a Vec as the same length as the combined strings. Then copy each into it and convert into String.

jayz22 commented 3 months ago

String can also convert to and from bytes, and there is bytes_append host function, maybe that's easier?

leighmcculloch commented 3 months ago

Ah yes, that'd be a much more preferred approach. Thanks @jayz22 !

leighmcculloch commented 1 month ago

@jayz22 How can String convert to and from Bytes? I don't see any existing SDK functionality for converting between them, and I don't see any host functions for doing so either.

jayz22 commented 1 month ago

@jayz22 How can String convert to and from Bytes? I don't see any existing SDK functionality for converting between them, and I don't see any host functions for doing so either.

Actually it would be easier since String can directly converted to/from raw bytes via the linear memory, so one way could be

str1->string_copy_to_slice()
str2->string_copy_to_slice()
string_new_from_slice(concatenated_slice) -> str3

no need to go through the Bytes.

A unit test on the host side would be:

#[test]
fn string_concat() -> Result<(), HostError> {
    let host = Host::default();
    let str1 = "abc";
    let str2 = "def";
    let str1_obj = host.string_new_from_slice(str1.as_bytes())?;
    let str2_obj = host.string_new_from_slice(str2.as_bytes())?;

    let mut concat_slice = vec![0; str1.len() + str2.len()];
    host.string_copy_to_slice(str1_obj, Val::from_u32(0), &mut concat_slice[0..str1.len()])?;
    host.string_copy_to_slice(str2_obj, Val::from_u32(0), &mut concat_slice[str1.len()..])?;
    let str3 = host.string_new_from_slice(&concat_slice)?;

    let str_ref = host.string_new_from_slice(b"abcdef")?;
    assert_eq!(host.obj_cmp(str3.to_val(), str_ref.to_val())?, 0);

    Ok(())
}
leighmcculloch commented 1 month ago

Writing the bytes into local memory requires alloc, or requires using a buffer repeatedly. The main goal of this issue is to not use alloc. Using a buffer is an option, but an expensive one I think.

jayz22 commented 1 month ago

Maybe I'm missing something, but why would copying via linear memory expensive? The linear memory routines are designed for these type of repeated/large number of bytes manipulation. The bytes_append host function also needs to copy both bytes objects into an internal slice and create a new object from it. So I think performance wise it should be equivalent (minus a couple of host function invocation costs).

leighmcculloch commented 1 month ago

@jayz22 To add an append function to the SDK it needs to pick some arbitrary buffer size, then for loop over copying the bytes to the slice on buffer at a time, then back to the host. It seems like a lot of repetitive bytes pushing across the boundary for something that could occur entirely host side. Depending on what we pick as a suitable buffer size, and depending on the input size, it could be multiple host fn calls, and it's 2 host fn calls per buffer filled.

Maybe we don't need to optimise this now, but this seems like the sort of thing host fns should do entirely host side.

jayz22 commented 1 month ago

Yeah you are right, that would require picking a buffer size or using an allocator. In that case we should add some helper host functions for this, and possibly alone with other ones being proposed https://github.com/stellar/rs-soroban-env/issues/1337 and https://github.com/stellar/rs-soroban-env/issues/1101