Open gabrieldemarmiesse opened 6 months ago
It seems to me, that representing String via pointer and size could be much more reliable and performant
We should clearly document the rationale for our choice, regardless of what the final decision may be.
For references to prior work, we can look at Rust and C++.
Rust treats c-style strings as a separate type: CString
C++ also recommends this in its core guidelines:
F.25: Use a zstring
or a not_null<zstring>
to designate a C-style string
SL.str.3: Use zstring
or czstring
to refer to a C-style, zero-terminated, sequence of characters
See zstring in the Guidelines Support Library for more details.
What can we do to make working with String less painful concerning the null terminator?
Whether or not String
null-terminates its character buffer should be an implementation detail. If the implementation is correct, then people who are merely "working with String" (i.e. using it in their programs) should never encounter any issues.
I think it's fine for people who are implementing String
to deal with whatever pain it brings. The data type is fundamentally unsafe, because you're working with a raw memory allocation. There's no way to fix that. (The same is true for the people who are implementing List
.)
That said, I have no opinions on whether or not String
is null-terminated.
Review Mojo's priorities
What is your request?
I would like a discussion to answer the following questions:
1) Is having a null terminator in
String
aligned with the goals of Mojo? 2) If yes, what can we do to make working withString
less painful concerning the null terminator? And makeString
more safe?What is your motivation for this change?
Once again, I have hit the issue of the null terminator when working with String (I was creating the String from an empty list instead of a list with one zero element). I cannot count the number of bugs we have had so far because of the null terminator.
The ones I can gather so far:
and I'm sure there are more.
Some might say "skill issue", then it seems we're all "unskilled" 😆
I don't have enough knowledge to say if it's absolutely necessary or not. But I'm sure that IF it is necessary to have the null terminator in string, we can make it a bit more fool-proof. I believe it also closely relates with the goals of Mojo about memory safety as the null terminator caused many memory bugs so far. Can we add guardrails here?
The null terminator has also an impact on permformance. As we may need to resort to clever tricks when implementing SSO to squeeze one more byte out of the inline buffer
This adds some instructions to every access to a String size, and String creation, and it also takes one more byte for every String created (when we have SSO, that's one more byte to set in the inline buffer per String).