In Crystal, slicing a string by bytes can cut through a multi-byte UTF-8 sequence when the slice is not aligned with character boundaries, which leaves an invalid byte sequence in the result.
To avoid this, strings should be sliced based on characters rather than bytes.
Explanation:
str.chars: Converts the string to an array of characters, so indices count characters rather than bytes.
chars[start_index, length]: Slices the array of characters. Each element is a whole character, so no multi-byte sequence can be split.
sliced_chars.join: Joins the sliced characters back into a string.
This approach ensures that the slicing respects the boundaries of UTF-8 characters and avoids invalid byte sequences.
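The steps above can be sketched as follows. This is a minimal illustration, not the exact code from this change: the sample string and the indices are made up, and `byte_slice` is used only as an assumed stand-in for the byte-based slicing that caused the problem.

```crystal
# Hypothetical sample input; "é" and "ö" each take two bytes in UTF-8.
str = "héllo wörld"

# Byte-based slicing: taking 2 bytes keeps "h" plus only the first byte of "é".
broken = str.byte_slice(0, 2)
puts broken.valid_encoding? # false

# Character-based slicing, following the three steps described above.
start_index = 0 # illustrative values
length = 2
chars = str.chars                         # Array(Char)
sliced_chars = chars[start_index, length] # slice by character position
result = sliced_chars.join                # back to a String

puts result                 # hé
puts result.valid_encoding? # true
```

Note that `chars[start_index, length]` raises an `IndexError` when `start_index` is past the end of the array, so callers that cannot guarantee the range may want to check it first.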
Fixes https://github.com/iv-org/invidious/issues/4886