roc-lang / roc

A fast, friendly, functional language.
https://roc-lang.org
Universal Permissive License v1.0
4.4k stars 309 forks source link

Small string optimization #323

Closed rtfeldman closed 2 years ago

rtfeldman commented 4 years ago

This is an optimization which Swift does, as does the C++ stdlib. Of note, Rust does not do it, and neither does Go.

The general idea: (for this rest of this, we'll assume a 64-bit target; it works on 32-bit too, but the numbers are different)

Tradeoffs

This optimization has several tradeoffs:

In Practice

Opponents of this optimization seem to say it introduces complexity and generated code bloat, and isn't worth it. Proponents (e.g. here, here, and here) suggest it is indeed worth it in practice.

This optimization is potentially a huge deal for applications that deal with lots of small strings. One example of such an application would be Roc's editor. Think of how many strings for tag names (e.g. True, False, Ok) and record field names would be small strings! Editor plugins will be written in Roc, so this optimization would directly impact cache locality and total memory usage of the editor.

Overall, it seems like most applications won't notice, and then for some applications it'll be incredibly noticeably positive. It's hard to imagine an application where it would be very noticeably negative in terms of runtime performance; the most likely negative impact would be final binary size, which Roc already intentionally sacrifices in order to improve runtime performance (see for example aggressive function inlining and monomorphization).

Implementation Notes

To work best with https://github.com/rtfeldman/roc/issues/322 we'd want to store the "is this string small?" flag in the lsbit of the pointer. On little-endian systems, this would be in the first byte, and then the following 15 bytes would be the small string's contents. (On big-endian systems, we'd want to reverse the tuple such that it would be (length, pointer) instead of (pointer, length) and then the first 15 bytes would be the string's data instead, followed by the metadata byte at the end.)

If the "this is a small string" flag is set to 1, then the entire byte surrounding that flag becomes the (bit-shifted) length of the small string. 7 bits can represent lengths of up to 128, but we already know the lengths only go up to 15 (or 31 in a hypothetical future 128-bit system), so 7 bits is plenty of room to store the length.

Since the max length of a string is the highest isize rather than usize (because pointers need to do signed arithmetic), it's also possible to store the flag as the msbit of length, instead of using the pointer. This does not interact well with https://github.com/rtfeldman/roc/issues/322 though.

I think this is what Facebook does in their C++ string library, because they do a neat trick where they store the length in the final byte of the string, and they store it as (1-length) so that it's a 0 when you're using all 15 bytes in the small string. This means you conveniently have a 15-byte string followed by a 0, which serves as a nul-terminator, and can pass a pointer to the whole 16 bytes directly to a function that wants a C string.

I don't think that design would pay off in Roc, given how infrequently Roc will be converting to C strings compared to a C++ library.

Swift apparently uses some of the spare bits to store additional flags for string modes other than "small string" and "large string." We could do this too, but the cost would be further code bloat, so we shouldn't do it without good reason.

rtfeldman commented 2 years ago

We do this now!