fniephaus closed this issue 2 years ago.
For TruffleSOM and derivatives (SOMns, Moth (a Grace implementation)) we indeed expect Strings and Symbols (interned Strings) to be immutable, like in Java. Rope-like append and sharing of substrings might be nice to have, though I am not too sure about the tradeoffs.
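To make the rope tradeoff concrete, here is a minimal sketch of an immutable rope in Java. It is not taken from any Truffle language; `RopeSketch` and its nested classes are hypothetical names chosen for illustration. Append is O(1) because both operands are shared rather than copied, while indexed access and flattening get slower as the tree grows deeper.

```java
// Minimal immutable rope sketch (hypothetical, not a Truffle API).
abstract class RopeSketch {
    abstract int length();
    abstract char charAt(int index);
    abstract void appendTo(StringBuilder out);

    static RopeSketch leaf(String s) {
        return new Leaf(s);
    }

    // O(1) append: no characters are copied, both operands are shared.
    RopeSketch concat(RopeSketch other) {
        return new Concat(this, other);
    }

    // Flattening copies every character once, only when a Java String is needed.
    String flatten() {
        StringBuilder out = new StringBuilder(length());
        appendTo(out);
        return out.toString();
    }

    private static final class Leaf extends RopeSketch {
        private final String value;

        Leaf(String value) {
            this.value = value;
        }

        int length() {
            return value.length();
        }

        char charAt(int index) {
            return value.charAt(index);
        }

        void appendTo(StringBuilder out) {
            out.append(value);
        }
    }

    private static final class Concat extends RopeSketch {
        private final RopeSketch left;
        private final RopeSketch right;
        private final int length; // cached so length() stays O(1)

        Concat(RopeSketch left, RopeSketch right) {
            this.left = left;
            this.right = right;
            this.length = left.length() + right.length();
        }

        int length() {
            return length;
        }

        // Walks down the tree, so the cost grows with the rope's depth.
        char charAt(int index) {
            int leftLength = left.length();
            return index < leftLength ? left.charAt(index) : right.charAt(index - leftLength);
        }

        void appendTo(StringBuilder out) {
            left.appendTo(out);
            right.appendTo(out);
        }
    }

    public static void main(String[] args) {
        RopeSketch greeting = leaf("Hello, ").concat(leaf("Truffle"));
        System.out.println(greeting.flatten());
    }
}
```

Substring sharing could be added in the same style with a node that stores a parent rope plus an offset and length. The sketch does no rebalancing, so a long chain of appends produces a deep tree and recursive flattening may become a problem.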
I wonder if the ropes optimization could somehow be implemented in an optional manner, so that language implementations could decide if they want to use it or not. Implementing ropes again and again in some languages seems redundant.
Also, I'm of course interested in what `TruffleString` means for interop: will it replace `String` as the exchange representation between languages? Does my language have to create `TruffleString`s on the fly for interop if it does not use it internally?
The current plan would be to replace `String` with `TruffleString` in interop, but offer native support for `String`, i.e. the actual parameter type in Java will be `Object`, and passing a `String` would still be valid.
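A rough sketch of what an `Object`-typed string parameter could look like on the receiving side, in plain Java. `AgnosticString` is a hypothetical stand-in invented for this example and is not the `TruffleString` API; the point is only that a plain `String` stays a valid argument next to the new representation.

```java
import java.nio.charset.StandardCharsets;

final class InteropStringSketch {
    // Hypothetical stand-in for a language-agnostic string; NOT the TruffleString API.
    static final class AgnosticString {
        private final byte[] utf8;

        AgnosticString(String s) {
            this.utf8 = s.getBytes(StandardCharsets.UTF_8);
        }

        String toJavaString() {
            return new String(utf8, StandardCharsets.UTF_8);
        }
    }

    // The parameter is typed Object, so callers may pass either representation.
    static String asJavaString(Object value) {
        if (value instanceof String) {
            return (String) value; // plain Java strings are accepted unchanged
        }
        if (value instanceof AgnosticString) {
            return ((AgnosticString) value).toJavaString();
        }
        throw new IllegalArgumentException("not a string: " + value);
    }

    public static void main(String[] args) {
        System.out.println(asJavaString("plain"));
        System.out.println(asJavaString(new AgnosticString("agnostic")));
    }
}
```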
Language Implementation | Type | Comments |
---|---|---|
TruffleRuby | mutable | Uses ropes to provide mutability, has multiple encodings, not all encodings fully compatible with Unicode, zero-copy concatenation critical for performance of production code |
You might want to consider support for viewing strings as sequences of extended grapheme clusters, which is increasingly supported by modern languages (e.g., Dart, Swift).
https://medium.com/flutter-community/working-with-unicode-and-grapheme-clusters-in-dart-b054faab5705
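As a small illustration of why a grapheme view differs from a code-unit or code-point view, the following Java sketch counts user-perceived characters with the standard `java.text.BreakIterator`. The class and method names are made up for this example, and how closely `BreakIterator` follows the Unicode extended grapheme cluster rules varies between JDK versions, so treat it as an approximation.

```java
import java.text.BreakIterator;

final class GraphemeSketch {
    // Counts user-perceived characters by stepping over character boundaries.
    static int countUserCharacters(String text) {
        BreakIterator boundaries = BreakIterator.getCharacterInstance();
        boundaries.setText(text);
        int count = 0;
        while (boundaries.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "e\u0301"; // 'e' followed by COMBINING ACUTE ACCENT
        System.out.println("UTF-16 code units: " + text.length());
        System.out.println("code points:       " + text.codePointCount(0, text.length()));
        System.out.println("user characters:   " + countUserCharacters(text));
    }
}
```

For this input the code-unit and code-point counts are both 2, while the boundary iterator reports a single user-perceived character.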
Tracking internally as GR-17176.
Thanks Boris. Please engage with us early and often on your designs and prototypes. My use-case is very sensitive to string performance and I can experiment on real workloads.
Any update?
TruffleString was merged last night via https://github.com/oracle/graal/commit/845231e651d611ecbe5cffc0535fda0d0e83bad1.
The goal of this issue is to publicly collect requirements and document the design and implementation of `TruffleString`, a language-agnostic string representation for string-like objects within Truffle languages. Quoting @chumer:
- `byte[]` and `String` within its `CharSXPWrapper`, which is used in `RStringVecNativeData`
- `CharSequence` as part of its `PString`.
- `CharSequence` within a `DynamicObject` as part of its `JSString`.
- `String` as internal string representation. Has support for immutable symbols.
- `ByteString` (`byte[]`) and `WideString` (`int[]`). A `ByteString` becomes a `WideString` when a value outside the byte-range (e.g. a unicode char) is put into it.

Please put this issue on the Truffle project and assign it to @chumer and @djoooooe.
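The `ByteString` to `WideString` widening mentioned above can be sketched roughly as follows. This is an assumption-laden illustration, not that implementation's code: `WideningStringSketch` is a hypothetical class that swaps its backing array from `byte[]` to `int[]` in place, whereas the description above has one object becoming another.

```java
final class WideningStringSketch {
    private byte[] bytes; // compact storage while every value fits in 0..255
    private int[] wide;   // used after the first out-of-range value is stored

    WideningStringSketch(int length) {
        this.bytes = new byte[length];
    }

    boolean isWide() {
        return wide != null;
    }

    int length() {
        return isWide() ? wide.length : bytes.length;
    }

    int get(int index) {
        // Mask so that byte values 128..255 are read back as unsigned.
        return isWide() ? wide[index] : bytes[index] & 0xFF;
    }

    void put(int index, int value) {
        if (!isWide() && (value < 0 || value > 0xFF)) {
            widen(); // first value outside the byte range triggers the switch
        }
        if (isWide()) {
            wide[index] = value;
        } else {
            bytes[index] = (byte) value;
        }
    }

    // Copies the existing bytes into an int[] once; later writes stay wide.
    private void widen() {
        int[] widened = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            widened[i] = bytes[i] & 0xFF;
        }
        wide = widened;
        bytes = null;
    }

    public static void main(String[] args) {
        WideningStringSketch s = new WideningStringSketch(2);
        s.put(0, 'a');
        System.out.println("wide after 'a': " + s.isWide());
        s.put(1, 0x20AC); // EURO SIGN does not fit into a byte
        System.out.println("wide after U+20AC: " + s.isWide());
    }
}
```

The design keeps the common single-byte case compact and pays a one-time copy when a wider value first appears; unlike an immutable string, this only works for mutable storage.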
Edit 1: Add Espresso. Edit 2: Incorporate @chrisseaton's Ruby comment extension (see https://github.com/oracle/graal/issues/2505#issuecomment-633982977).