Open gmlewis opened 6 months ago
Anyone who is designing a new programming language should watch this:
https://www.youtube.com/watch?v=Ri2NMnSQo4o
and ideally avoid UTF-16 entirely.
Rust and Go both use UTF-8 for a good reason.
If UTF-16 can't be avoided, then maybe a new StringUtf8
type would be nice to have so that users can avoid String
as much as possible and only use the UTF-8 variant.
It would be beneficial to include an API that encodes strings as UTF-8 and writes them to a `Buffer`, essentially adding `Buffer::write_string_utf8`.
This enhancement should be straightforward to implement using the `@string.String::as_iter` function, which returns an `Iter[Char]`. However, it appears that `Buffer` is defined within the `builtin` package, which currently limits its use of `@string.String::as_iter`.
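For illustration only, here is a minimal sketch of what such a method would do, written in Python rather than MoonBit because the exact MoonBit signatures are not confirmed in this thread. The function name `write_string_utf8` mirrors the proposed API, and a `bytearray` stands in for `Buffer`; both are assumptions. Iterating over a Python `str` yields code points, which plays the role of `Iter[Char]` here.

```python
def write_string_utf8(buf: bytearray, s: str) -> None:
    """Append the UTF-8 encoding of s to buf, one code point at a time.

    Hypothetical stand-in for the proposed Buffer::write_string_utf8.
    """
    for ch in s:  # one code point per iteration, like an Iter[Char]
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            # A lone surrogate is not a valid scalar value in UTF-8.
            raise ValueError("cannot encode surrogate U+%04X" % cp)
        if cp < 0x80:  # 1 byte: 0xxxxxxx
            buf.append(cp)
        elif cp < 0x800:  # 2 bytes: 110xxxxx 10xxxxxx
            buf.append(0xC0 | (cp >> 6))
            buf.append(0x80 | (cp & 0x3F))
        elif cp < 0x10000:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            buf.append(0xE0 | (cp >> 12))
            buf.append(0x80 | ((cp >> 6) & 0x3F))
            buf.append(0x80 | (cp & 0x3F))
        else:  # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            buf.append(0xF0 | (cp >> 18))
            buf.append(0x80 | ((cp >> 12) & 0x3F))
            buf.append(0x80 | ((cp >> 6) & 0x3F))
            buf.append(0x80 | (cp & 0x3F))
```

Because the loop only needs a sequence of code points, the same logic should carry over to any string type that can expose a character iterator.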
In the meantime, @peter-jerry-ye pointed me to `encoder` and `decoder` here: https://github.com/peter-jerry-ye/jstream
Thank you, @peter-jerry-ye!
> Anyone who is designing a new programming language should watch this: https://www.youtube.com/watch?v=Ri2NMnSQo4o and ideally avoid UTF-16 entirely. Rust and Go both use UTF-8 for a good reason. If UTF-16 can't be avoided, then maybe a new `StringUtf8` type would be nice to have so that users can avoid `String` as much as possible and only use the UTF-8 variant.
I think supporting UTF-16 is necessary. In fact, MoonBit originally used UTF-8 but later switched to UTF-16. We have two important backends (Wasm and JavaScript), and both Wasm's String proposal (including its integration with JavaScript) and JavaScript's String are based on UTF-16, which is why we use UTF-16.
On the other hand, I fully support adding conversion methods between UTF-8 and UTF-16.
Currently there is no easy way (that I can find) to convert back and forth between UTF-8-encoded and UTF-16-encoded strings.
I'm doing this as a workaround: https://github.com/gmlewis/moonbit-pdk/blob/master/pdk/string.mbt and this: https://github.com/gmlewis/moonbit-pdk/blob/a9777b8b71ff1cf2e77a3cdf95244197d24343fd/pdk/host.mbt#L20-L30 but would like to replace these with standard library calls.
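As a rough sketch of what such standard library calls would need to do, the round trip can be written in a few lines. This is Python, not MoonBit, and the function names `utf16_units_to_utf8` and `utf8_to_utf16_units` are hypothetical; it is not the code from the linked workaround. The part specific to UTF-16 is pairing and splitting surrogates.

```python
from typing import List


def utf16_units_to_utf8(units: List[int]) -> bytes:
    """Convert a sequence of UTF-16 code units to UTF-8 bytes."""
    code_points = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # High + low surrogate pair -> one supplementary code point.
            low = units[i + 1]
            code_points.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            raise ValueError("unpaired surrogate at index %d" % i)
        else:
            code_points.append(u)  # BMP code point, used as-is
            i += 1
    return "".join(chr(cp) for cp in code_points).encode("utf-8")


def utf8_to_utf16_units(data: bytes) -> List[int]:
    """Convert UTF-8 bytes to a list of UTF-16 code units."""
    units = []
    for ch in data.decode("utf-8"):  # raises on malformed UTF-8
        cp = ord(ch)
        if cp >= 0x10000:
            # Split a supplementary code point into a surrogate pair.
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))
            units.append(0xDC00 + (cp & 0x3FF))
        else:
            units.append(cp)
    return units
```

For example, U+1F600 corresponds to the code units `0xD83D, 0xDE00` in UTF-16 and to four bytes in UTF-8, so a correct converter has to treat that pair as a single character.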