volo-rs / faststr

`faststr` is a string library that tries to avoid the cost of cloning.
https://crates.io/crates/faststr
Apache License 2.0

feat: inline variants by buffer lengths #17

Open CPunisher opened 1 week ago

CPunisher commented 1 week ago
  1. Make six versions of `Repr::Inline`, and increase `INLINE_CAP` to 32. This is inspired by https://github.com/rust-lang/rust/issues/119247#issuecomment-1963021456. However, I chose only six variants, with fixed buffer lengths of 1, 2, 4, 8, 16, and 32 bytes (there may be a better partition). The purpose is:
    1. Reduce the number of variants, because I found that too many variants lead to extra instructions in the generated assembly (though I don't know the root cause).
    2. Copying a buffer of one of these lengths takes at most two register moves. For lengths in [17, 32], the buffer is copied in two round trips from memory through the 16-byte %xmm0 register and back to memory.
  2. Inline the sizes of the variants. This is inspired by https://github.com/rust-analyzer/smol_str/pull/53, which lets the compiler know the range of lengths each variant can hold.
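A minimal sketch of the bucketing idea in item 1, assuming a standalone enum rather than faststr's actual `Repr` (the names `Inline`, `B1` through `B32`, `fill`, `new`, `as_bytes`, and `as_str` are hypothetical). Each variant holds a fixed-size buffer, and a string goes into the smallest bucket that fits, so cloning any variant is a plain copy of at most 32 bytes plus the length and discriminant.

```rust
/// Hypothetical sketch (not faststr's actual `Repr`): inline storage split
/// into six variants whose buffer capacities are 1, 2, 4, 8, 16, and 32 bytes.
#[derive(Clone, Copy, Debug)]
pub enum Inline {
    B1 { len: u8, buf: [u8; 1] },
    B2 { len: u8, buf: [u8; 2] },
    B4 { len: u8, buf: [u8; 4] },
    B8 { len: u8, buf: [u8; 8] },
    B16 { len: u8, buf: [u8; 16] },
    B32 { len: u8, buf: [u8; 32] },
}

/// Copy `src` into a zero-padded array of capacity `N`.
/// The caller must ensure `src.len() <= N`.
fn fill<const N: usize>(src: &[u8]) -> [u8; N] {
    let mut buf = [0u8; N];
    buf[..src.len()].copy_from_slice(src);
    buf
}

impl Inline {
    /// Maximum inline length, matching the proposed `INLINE_CAP` of 32.
    pub const CAP: usize = 32;

    /// Store `s` in the smallest bucket that fits, or return `None` if it is too long.
    pub fn new(s: &str) -> Option<Self> {
        let b = s.as_bytes();
        if b.len() > Self::CAP {
            return None;
        }
        let len = b.len() as u8; // cannot truncate: len <= 32 here
        // `fill` infers its capacity `N` from the variant's field type.
        Some(match b.len() {
            0..=1 => Inline::B1 { len, buf: fill(b) },
            2 => Inline::B2 { len, buf: fill(b) },
            3..=4 => Inline::B4 { len, buf: fill(b) },
            5..=8 => Inline::B8 { len, buf: fill(b) },
            9..=16 => Inline::B16 { len, buf: fill(b) },
            _ => Inline::B32 { len, buf: fill(b) },
        })
    }

    /// The stored bytes, without the zero padding.
    pub fn as_bytes(&self) -> &[u8] {
        match self {
            Inline::B1 { len, buf } => &buf[..*len as usize],
            Inline::B2 { len, buf } => &buf[..*len as usize],
            Inline::B4 { len, buf } => &buf[..*len as usize],
            Inline::B8 { len, buf } => &buf[..*len as usize],
            Inline::B16 { len, buf } => &buf[..*len as usize],
            Inline::B32 { len, buf } => &buf[..*len as usize],
        }
    }

    pub fn as_str(&self) -> &str {
        // The bytes were copied whole from a `&str`, so they are valid UTF-8.
        std::str::from_utf8(self.as_bytes()).expect("bytes come from a valid &str")
    }
}
```

Here the length is a plain `u8` in every variant; item 2 would instead give each variant's length a type restricted to that bucket's range, so the compiler can know the bound without a runtime check. The `match` in `as_bytes` is where the extra discriminant-checking instructions mentioned below come from.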

From my benchmarking, performance improves on both x86-64 and aarch64. ~I need to figure out why cloning an empty string is also faster.~ The benchmark results look odd. I wrote a simple for-loop to test performance, and it shows the expected result: cloning an inline string is much faster, at the cost of a few extra instructions for checking the discriminant.
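A plain for-loop timing of the kind described above might look like the following sketch. It is not the benchmark used in this PR; it assumes a hypothetical helper `time_clones` and relies only on the standard library (`std::time::Instant` and `std::hint::black_box`, the latter stable since Rust 1.66).

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: clone `value` `iters` times in a plain loop and
/// return the elapsed time together with the last clone.
/// Each clone is dropped when the next one replaces it, so the
/// measurement includes the drop cost as well.
pub fn time_clones<T: Clone>(value: &T, iters: u32) -> (Duration, Option<T>) {
    let mut last = None;
    let start = Instant::now();
    for _ in 0..iters {
        // `black_box` is a best-effort hint that discourages the optimizer
        // from hoisting the clone out of the loop or removing it entirely.
        last = Some(black_box(black_box(value).clone()));
    }
    (start.elapsed(), last)
}
```

For example, `time_clones(&String::from("hello"), 1_000_000)` could be compared with the same call on an inline string type. A loop like this is only a quick check; a dedicated harness such as the `criterion` crate gives more reliable numbers.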

There are also cons:

  1. This makes the code somewhat cumbersome (I don't think using a macro would be better).
  2. This increases the binary size on aarch64 from 141 KB to 205 KB.
  3. I'm not sure whether there are any bugs, since test coverage is not very high.