Feature request: Modify `text.regex_split_with_offsets()` behavior to be in line with `tf.strings.length()`

`text.regex_split_with_offsets()` currently returns `begin` and `end` as `tf.int64` tensors whose values are byte offsets.

`tf.strings.length()`, on the other hand, returns a `tf.int32` tensor that counts length in either bytes or UTF-8 characters, depending on the value of its `unit` parameter.

So this is really two separate requests:

1. Change the return types of `text.regex_split_with_offsets()` to `tf.int32`, removing the need for a cast when comparing with `tf.strings.length()`. I doubt there will be a use case for strings longer than INT32_MAX bytes in the foreseeable future.
2. Add a parameter `unit: Literal["BYTE", "UTF8_CHAR"] = "BYTE"`, matching the behavior of `tf.strings.length()` and `tf.strings.substr()`. Since the regular expressions are already interpreted as UTF-8, I think it would make sense to add a layer of abstraction that makes slicing by UTF-8 character index easier.
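
To illustrate what the second request would change, here is a minimal pure-Python sketch of the two offset units. It does not use TensorFlow, and `split_with_offsets` is a hypothetical helper written only for this example; it is not the `tensorflow_text` implementation or its signature.

```python
import re


def split_with_offsets(text, delim_pattern, unit="BYTE"):
    """Hypothetical helper: split `text` on `delim_pattern`, returning
    (tokens, begins, ends) with offsets in bytes or UTF-8 characters."""
    if unit not in ("BYTE", "UTF8_CHAR"):
        raise ValueError(f"unsupported unit: {unit!r}")

    tokens, begins, ends = [], [], []

    def to_offset(char_index):
        # Python str indices are code points; convert to bytes on request.
        if unit == "UTF8_CHAR":
            return char_index
        return len(text[:char_index].encode("utf-8"))

    def emit(start, stop):
        # Skip empty tokens, e.g. between adjacent delimiters.
        if start < stop:
            tokens.append(text[start:stop])
            begins.append(to_offset(start))
            ends.append(to_offset(stop))

    pos = 0
    for match in re.finditer(delim_pattern, text):
        emit(pos, match.start())
        pos = match.end()
    emit(pos, len(text))
    return tokens, begins, ends


# "héllo wörld": each accented letter is 1 character but 2 bytes in UTF-8.
sample = "h\u00e9llo w\u00f6rld"
byte_result = split_with_offsets(sample, r"\s+", unit="BYTE")
char_result = split_with_offsets(sample, r"\s+", unit="UTF8_CHAR")
print(byte_result[1], byte_result[2])  # [0, 7] [6, 13]
print(char_result[1], char_result[2])  # [0, 6] [5, 11]
```

With `unit="UTF8_CHAR"` the offsets can be used directly as character indices, which is the kind of slicing `tf.strings.substr()` already supports through its own `unit` parameter.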

tensorflow/text issue #1245