oracle / graal

GraalVM compiles Java applications into native executables that start instantly, scale fast, and use fewer compute resources 🚀
https://www.graalvm.org

Introduce TruffleString #2505

Closed fniephaus closed 2 years ago

fniephaus commented 4 years ago

The goal of this issue is to publicly collect requirements and document the design and implementation of TruffleString, a language-agnostic string representation for string-like objects within Truffle languages.

Quoting @chumer:

The current design is immutable. Buffers are a different story we won’t cover in the first design. [...] lots of Ruby requirements coming in from the Ruby team. Now TruffleString (is) in the hand of the regex team, which is naturally a cross language team.

| Language Implementation | Type | Comments |
| --- | --- | --- |
| Espresso | immutable | Implementation not (yet) public. |
| FastR | mutable | Uses byte[] and String within its CharSXPWrapper, which is used in RStringVecNativeData |
| GraalPython | immutable | Uses Java CharSequence as part of its PString. |
| Graal.js | immutable | Uses Java CharSequence within a DynamicObject as part of its JSString. |
| SOMns | immutable | Uses Java String as internal string representation. Has support for immutable symbols. |
| TruffleRuby | mutable | Uses ropes to provide mutability, has multiple encodings, not all encodings fully compatible with Unicode, zero-copy concatenation critical for performance of production code. |
| TruffleSqueak | mutable | Supports ByteString (byte[]) and WideString (int[]). A ByteString becomes a WideString when a value outside the byte-range (e.g. a unicode char) is put into it. |
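
The TruffleSqueak row describes a compact representation that widens on demand. A minimal Java sketch of that idea follows, assuming a fixed-length string indexed by code point. The class name `WideningString` and its methods are hypothetical and are taken neither from TruffleSqueak nor from TruffleString.

```java
/** Hypothetical sketch: a mutable string that starts as byte[] and widens to int[] on demand. */
final class WideningString {
    private byte[] bytes; // compact storage while every value fits in 0..255
    private int[] wide;   // null until the first out-of-range write

    WideningString(int length) {
        this.bytes = new byte[length];
    }

    boolean isWide() {
        return wide != null;
    }

    int get(int index) {
        // Mask so that stored bytes are read back as unsigned values.
        return isWide() ? wide[index] : (bytes[index] & 0xFF);
    }

    void set(int index, int codePoint) {
        if (codePoint < 0) {
            throw new IllegalArgumentException("negative code point: " + codePoint);
        }
        if (!isWide() && codePoint > 0xFF) {
            widen(); // first value outside the byte range triggers the switch
        }
        if (isWide()) {
            wide[index] = codePoint;
        } else {
            bytes[index] = (byte) codePoint;
        }
    }

    private void widen() {
        int[] copy = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            copy[i] = bytes[i] & 0xFF;
        }
        wide = copy;
        bytes = null; // the compact array is no longer needed
    }
}
```

In this sketch the switch is one-way: once widened, the string never returns to the compact form, even if the wide value is later overwritten.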

Please put this issue on the Truffle project and assign it to @chumer and @djoooooe.

Edit 1: Add Espresso. Edit 2: Incorporate @chrisseaton's Ruby comment extension (see https://github.com/oracle/graal/issues/2505#issuecomment-633982977).

smarr commented 4 years ago

For TruffleSOM and derivatives (SOMns, Moth (a Grace)) we indeed expect Strings and Symbols (interned Strings) to be immutable like in Java. Rope-like append and sharing of substrings might be nice to have, though I am not too sure about the tradeoffs.

fniephaus commented 4 years ago

I wonder if the ropes optimization could somehow be implemented in an optional manner, so that language implementations could decide if they want to use it or not. Implementing ropes again and again in some languages seems redundant.
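
For readers unfamiliar with ropes, here is a minimal Java sketch of the technique, assuming immutable leaves backed by `java.lang.String`. The names `RopeSketch`, `leaf`, `concat`, and `flatten` are hypothetical and do not reflect the TruffleRuby or TruffleString implementations.

```java
/** Hypothetical rope sketch: concatenation builds a tree node instead of copying characters. */
abstract class RopeSketch {
    abstract int length();

    /** Appends this rope's characters to the given builder, left to right. */
    abstract void writeTo(StringBuilder out);

    static RopeSketch leaf(String value) {
        return new Leaf(value);
    }

    /** O(1): no character data is copied here. */
    RopeSketch concat(RopeSketch other) {
        return new Concat(this, other);
    }

    /** Copies all characters once, only when a flat string is actually needed. */
    String flatten() {
        StringBuilder sb = new StringBuilder(length());
        writeTo(sb);
        return sb.toString();
    }

    private static final class Leaf extends RopeSketch {
        private final String value;

        Leaf(String value) {
            this.value = value;
        }

        @Override
        int length() {
            return value.length();
        }

        @Override
        void writeTo(StringBuilder out) {
            out.append(value);
        }
    }

    private static final class Concat extends RopeSketch {
        private final RopeSketch left;
        private final RopeSketch right;
        private final int length; // cached so length() stays O(1)

        Concat(RopeSketch left, RopeSketch right) {
            this.left = left;
            this.right = right;
            this.length = left.length() + right.length();
        }

        @Override
        int length() {
            return length;
        }

        @Override
        void writeTo(StringBuilder out) {
            left.writeTo(out);
            right.writeTo(out);
        }
    }
}
```

The recursive `writeTo` keeps the sketch short, but it could overflow the stack on very deep ropes; a shared implementation would likely flatten iteratively or rebalance.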

Also, I'm of course interested in what TruffleString means for interop: will it replace String as the exchange representation between languages? Does my language have to create TruffleStrings on the fly for interop if it does not use it internally?

djoooooe commented 4 years ago

The current plan would be to replace String with TruffleString in interop, but offer native support for String, i.e. the actual parameter type in Java will be Object, and passing a String would still be valid.
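
A rough Java sketch of what "the parameter type is Object, and a String is still valid" could look like on the receiving side. `SketchString` is a hypothetical stand-in used only to keep the example self-contained; it is not the TruffleString API, and `lengthOf` is an invented helper.

```java
final class InteropStringSketch {
    /** Hypothetical stand-in for a language-agnostic string type. */
    static final class SketchString {
        private final String value;

        SketchString(String value) {
            this.value = value;
        }

        String toJavaString() {
            return value;
        }
    }

    /** The parameter is Object, so callers may pass either representation. */
    static int lengthOf(Object string) {
        if (string instanceof SketchString) {
            return ((SketchString) string).toJavaString().length();
        } else if (string instanceof String) {
            // Plain Java strings remain valid without conversion by the caller.
            return ((String) string).length();
        }
        throw new IllegalArgumentException("not a string: " + string);
    }
}
```

Under this assumption, a language that keeps using `String` internally would not need to allocate a wrapper just to cross a language boundary.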

chrisseaton commented 4 years ago

| Language Implementation | Type | Comments |
| --- | --- | --- |
| TruffleRuby | mutable | Uses ropes to provide mutability, has multiple encodings, not all encodings fully compatible with Unicode, zero-copy concatenation critical for performance of production code |

fcurts commented 4 years ago

You might want to consider support for viewing strings as sequences of extended grapheme clusters, which is increasingly supported by modern languages (e.g., Dart, Swift).

https://medium.com/flutter-community/working-with-unicode-and-grapheme-clusters-in-dart-b054faab5705
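
To illustrate the difference between code units and user-perceived characters, here is a small Java sketch using the standard `java.text.BreakIterator`. The class and method names are hypothetical, and how closely the iterator follows the Unicode definition of extended grapheme clusters depends on the JDK version.

```java
import java.text.BreakIterator;

final class GraphemeSketch {
    /** Counts user-perceived characters by walking character boundaries. */
    static int countGraphemes(String text) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);
        int count = 0;
        // Each call to next() advances to the following boundary, or returns DONE at the end.
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }
}
```

For example, `"e\u0301"` (the letter e followed by a combining acute accent) has a `length()` of 2 but is counted as a single character by `countGraphemes`.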

boris-spas commented 4 years ago

Tracking internally as GR-17176.

chrisseaton commented 4 years ago

Thanks Boris. Please engage with us early and often on your designs and prototypes. My use-case is very sensitive to string performance and I can experiment on real workloads.

SchrodingerZhu commented 2 years ago

Any update?

fniephaus commented 2 years ago

TruffleString was merged last night via https://github.com/oracle/graal/commit/845231e651d611ecbe5cffc0535fda0d0e83bad1.