Open lukewilliamboswell opened 3 months ago
(Maybe we don't need Num.fromStr. I didn't realize we already had Str.toU32, Str.toF32, etc.)
To elaborate on how Str.dropFirstBytes and Str.dropLastBytes work: it's safe to slice a valid UTF-8 string as long as you don't cut any character in the middle. You can tell when a cut would do that because bytes in the middle of a character look different from bytes at the beginning of one. Bytes that look like 0xxxxxxx or 11xxxxxx always mark the beginning of a new character, and bytes that look like 10xxxxxx always continue a character.
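As a sanity check on the bit patterns above, here is a minimal Rust sketch (Rust because the signed-comparison trick comes from its stdlib). The function names `is_boundary_by_pattern` and `is_boundary_by_cast` are made up for illustration and are not part of any library; `bytes_agree` just compares the two over every possible byte value.

```rust
/// True if `b` can start a UTF-8 character:
/// 0xxxxxxx (ASCII) or 11xxxxxx (lead byte of a multi-byte character).
fn is_boundary_by_pattern(b: u8) -> bool {
    b < 0b1000_0000 || b >= 0b1100_0000
}

/// The same check written as a signed comparison.
/// Continuation bytes 0x80..=0xBF (10xxxxxx) become -128..=-65 when
/// reinterpreted as i8, so everything >= -64 is a boundary byte.
fn is_boundary_by_cast(b: u8) -> bool {
    (b as i8) >= -64
}

/// Hypothetical helper: checks that both forms agree for all 256 byte values.
fn bytes_agree() -> bool {
    (0..=u8::MAX).all(|b| is_boundary_by_pattern(b) == is_boundary_by_cast(b))
}
```

For example, "é" is encoded as the two bytes 0xC3 (11000011) and 0xA9 (10101001): the first is reported as a boundary and the second is not.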
Here is some Roc pseudocode for Str.dropFirstBytes and Str.dropLastBytes:
isSafeSplitPoint : Str, U64 -> Bool
isSafeSplitPoint = \s, index ->
    when Str.toUtf8 s |> List.get index is
        Ok b -> Num.toI8 b >= -64 # This is bit magic equivalent to: b < 128 || b >= 192 (Copied from Rust stdlib `is_utf8_char_boundary`)
        Err OutOfBounds -> Bool.true # Splitting a string at a point past the end can't break apart two characters

dropFirstBytes : Str, U64 -> Result Str [BadUtf8]
dropFirstBytes = \s, n ->
    if isSafeSplitPoint s n then
        s
        |> Str.toUtf8
        |> List.dropFirst n
        |> Str.fromUtf8Unchecked # (We don't actually have Str.fromUtf8Unchecked in Roc, but this is Roc pseudocode...)
        |> Ok
    else
        Err BadUtf8

dropLastBytes : Str, U64 -> Result Str [BadUtf8]
dropLastBytes = \s, n ->
    # Saturating subtraction, so n larger than the string's length can't underflow the U64
    if isSafeSplitPoint s (Num.subSaturated (Str.countUtf8Bytes s) n) then
        s
        |> Str.toUtf8
        |> List.dropLast n
        |> Str.fromUtf8Unchecked # (We don't actually have Str.fromUtf8Unchecked in Roc, but this is Roc pseudocode...)
        |> Ok
    else
        Err BadUtf8
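For comparison, here is a rough Rust sketch of the same two operations. It is only an illustration under some assumptions: the names `drop_first_bytes` and `drop_last_bytes` are hypothetical, `Option<&str>` stands in for `Result Str [BadUtf8]`, and the boundary check is delegated to the standard `str::get`, which returns `None` when a range end falls inside a character.

```rust
/// Drop the first `n` bytes of `s`.
/// Returns None if that would cut a character in half.
/// An `n` at or past the end yields the empty string.
fn drop_first_bytes(s: &str, n: usize) -> Option<&str> {
    // Clamp so that a split point past the end counts as safe.
    let start = n.min(s.len());
    // `str::get` checks that `start` lies on a character boundary.
    s.get(start..)
}

/// Drop the last `n` bytes of `s`.
/// Returns None if that would cut a character in half.
fn drop_last_bytes(s: &str, n: usize) -> Option<&str> {
    // Saturating subtraction: an `n` larger than the length drops everything
    // instead of underflowing.
    let end = s.len().saturating_sub(n);
    s.get(..end)
}
```

With "héllo" (6 bytes, where "é" occupies bytes 1 and 2), dropping 1 or 3 bytes from the front succeeds, while dropping 2 fails because it would split "é". Because the result borrows from the input, no re-validation or copying is needed, which is the same property the unchecked conversion in the pseudocode relies on.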
See the Zulip discussion for background.