`LEVELS()` is used to detect non-UTF-8 strings in `r_obj_encode_utf8()` and to optimise the default case where no encoding conversion is needed:
- Avoid cloning a `STRSXP` if no translation is needed.
- Avoid allocating a `CHARSXP` if no translation is needed (in the case of mixed encodings in the vector; less important, I guess?).
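To make the optimisation concrete, here is a minimal, self-contained sketch of the scan-first, copy-only-if-needed pattern. It is a toy model and not the rlang implementation: `toy_str`, `toy_enc`, `toy_str_needs_translation()`, `toy_latin1_to_utf8()` and `toy_vec_encode_utf8()` are all hypothetical stand-ins for `CHARSXP`, `STRSXP` and the real encoding check, and the only non-UTF-8 encoding it models is Latin-1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for R's CHARSXP / STRSXP. NOT the real R API. */
typedef enum { ENC_UTF8, ENC_LATIN1 } toy_enc;

typedef struct {
  const char* bytes; /* NUL-terminated string data */
  toy_enc enc;       /* declared encoding, playing the role of the encoding bits */
} toy_str;

/* Hypothetical predicate: does this element need translation? */
static bool toy_str_needs_translation(const toy_str* s) {
  return s->enc != ENC_UTF8;
}

/* Translate Latin-1 bytes to UTF-8. The caller owns the returned buffer. */
static char* toy_latin1_to_utf8(const char* in) {
  size_t len = strlen(in);
  char* out = malloc(2 * len + 1); /* each byte expands to at most 2 bytes */
  if (!out) return NULL;

  char* p = out;
  for (size_t i = 0; i < len; ++i) {
    unsigned char c = (unsigned char) in[i];
    if (c < 0x80) {
      *p++ = (char) c; /* ASCII is unchanged */
    } else {
      *p++ = (char) (0xC0 | (c >> 6));
      *p++ = (char) (0x80 | (c & 0x3F));
    }
  }
  *p = '\0';
  return out;
}

/* Returns `x` itself when every element is already UTF-8 (no allocation at
 * all). Otherwise returns a newly allocated copy in which only the elements
 * that need it are translated. Returns NULL on allocation failure.
 * Ownership of the translated strings is left to the caller in this sketch. */
static toy_str* toy_vec_encode_utf8(toy_str* x, size_t n) {
  assert(x != NULL || n == 0);

  /* Cheap scan: find the first element that needs translation */
  size_t i = 0;
  while (i < n && !toy_str_needs_translation(&x[i])) ++i;
  if (i == n) return x; /* common case: nothing to do, no clone */

  /* Slow path: clone the vector once, then translate from `i` onwards */
  toy_str* out = malloc(n * sizeof *out);
  if (!out) return NULL;
  memcpy(out, x, n * sizeof *out);

  for (; i < n; ++i) {
    if (!toy_str_needs_translation(&out[i])) continue; /* no per-element alloc */
    char* utf8 = toy_latin1_to_utf8(out[i].bytes);
    if (!utf8) { free(out); return NULL; } /* earlier translations leak here */
    out[i].bytes = utf8;
    out[i].enc = ENC_UTF8;
  }
  return out;
}
```

The point of the sketch is that the all-UTF-8 case costs one pass over the elements and returns the input pointer unchanged, which is the behaviour an unconditional clone would lose.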
We could just call `Encoding(x) <- "UTF-8"` from R. However, this unconditionally clones the input vector unless it has no references. To preserve our optimisation, either of these would work:
- Add a predicate to the C API to determine whether a `CHARSXP` is encoded in UTF-8.
- Improve `do_setencoding()` to duplicate only if needed, so that it's a no-op in the common case. Bonus points if it's exported on the C side, e.g. as `Rf_EnsureUtf8()`?
My sense is that these new unconditional allocs would be bad for performance in dplyr. @DavisVaughan you added these optimisations for vctrs, could you confirm please?
Part of https://github.com/r-lib/rlang/issues/1706
`LEVELS()` was added in https://github.com/r-lib/rlang/pull/1187.