modularml / mojo

The Mojo Programming Language
https://docs.modular.com/mojo
Other
21.79k stars 2.53k forks source link

[Feature Request] Better handling of null terminator in strings #2678

Open gabrieldemarmiesse opened 2 weeks ago

gabrieldemarmiesse commented 2 weeks ago

Review Mojo's priorities

What is your request?

I would like a discussion to answer the following questions:

1) Is having a null terminator in String aligned with the goals of Mojo? 2) If yes, what can we do to make working with String less painful concerning the null terminator? And make String more safe?

What is your motivation for this change?

Once again, I have hit the issue of the null terminator when working with String (I was creating the String from an empty list instead of a list with one zero element). I cannot count the number of bugs we have had so far because of the null terminator.

The ones I can gather so far:

and I'm sure there are more.

Some might say "skill issue", then it seems we're all "unskilled" 😆

I don't have enough knowledge to say if it's absolutely necessary or not. But I'm sure that IF it is necessary to have the null terminator in string, we can make it a bit more fool-proof. I believe it also closely relates with the goals of Mojo about memory safety as the null terminator caused many memory bugs so far. Can we add guardrails here?

The null terminator has also an impact on permformance. As we may need to resort to clever tricks when implementing SSO to squeeze one more byte out of the inline buffer

This adds some instructions to every access to a String size, and String creation, and it also takes one more byte for every String created (when we have SSO, that's one more byte to set in the inline buffer per String).

gryznar commented 2 weeks ago

It seems to me, that representing String via pointer and size could be much more reliable and performant

soraros commented 2 weeks ago

We should clearly document the rationale for our choice, regardless of what the final decision may be.

Brian-M-J commented 1 week ago

For references to prior work, we can look at Rust and C++.

Rust treats c-style strings as a separate type: CString

C++ also recommends this in its core guidelines:

F.25: Use a zstring or a not_null<zstring> to designate a C-style string

SL.str.3: Use zstring or czstring to refer to a C-style, zero-terminated, sequence of characters

See zstring in the Guidelines Support Library for more details.

nmsmith commented 1 week ago

What can we do to make working with String less painful concerning the null terminator?

Whether or not String null-terminates its character buffer should be an implementation detail. If the implementation is correct, then people who are merely "working with String" (i.e. using it in their programs) should never encounter any issues.

I think it's fine for people who are implementing String to deal with whatever pain it brings. The data type is fundamentally unsafe, because you're working with a raw memory allocation. There's no way to fix that. (The same is true for the people who are implementing List.)

That said, I have no opinions on whether or not String is null-terminated.