StefanBRas opened this issue 1 year ago
I strongly feel that we should build an extension package for these exotic cases. With https://github.com/pola-rs/pyo3-polars in place, we could easily make specialized libraries that users install opt-in.
I have no strong feelings either way. I do, however, feel that the most common use of `str.to_lower` and `str.to_upper` is actually to make (incorrect) case-insensitive comparisons, so `str.to_casefolded` is no more exotic than those.
I don't think this is an exotic use case, since Python's standard library recommends `str.casefold` for correct case-insensitive (caseless) string comparison.
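For reference, a minimal sketch of caseless matching with only the Python standard library, modeled on the `compare_caseless` recipe in Python's Unicode HOWTO. The helper names `nfd` and `caseless_equal` are hypothetical. The extra NFD normalization steps make precomposed and decomposed forms of the same character compare equal, which `casefold` alone does not guarantee.

```python
import unicodedata


def nfd(s: str) -> str:
    # Canonical decomposition, e.g. "ê" (U+00EA) -> "e" + U+0302
    return unicodedata.normalize("NFD", s)


def caseless_equal(a: str, b: str) -> bool:
    # Hypothetical helper: normalize, case-fold, then normalize again,
    # because case folding can produce sequences that need re-normalizing.
    return nfd(nfd(a).casefold()) == nfd(nfd(b).casefold())


print(caseless_equal("Straße", "STRASSE"))     # True
print(caseless_equal("\u00ea", "E\u0302"))     # True (precomposed vs decomposed)
print(caseless_equal("Straße", "Strasze"))     # False
```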
I think this would be a useful addition.
Problem description
With the current functions it is not possible to do a correct case-insensitive string comparison. You can get a wrong result if you use `.str.lower`, for example.
Python has this built in as `str.casefold`, which returns a casefolded copy of the string, and pandas has `Series.str.casefold`. The algorithm is specified in section 3.13 of the Unicode Standard.
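To illustrate the kind of wrong result described above, here is a small sketch using Python's built-in `str.lower` and `str.casefold`. The word list is an illustrative assumption, not the example from the original report. German "ß" lowercases to itself but case-folds to "ss", so a `lower`-based comparison misses a match that a casefold-based comparison finds.

```python
words = ["Straße", "STRASSE", "strasse"]
target = "strasse"

# Lowercasing leaves "ß" unchanged, so "Straße" does not match.
lower_matches = [w for w in words if w.lower() == target.lower()]

# Case folding maps "ß" to "ss", so all three spellings match.
fold_matches = [w for w in words if w.casefold() == target.casefold()]

print(lower_matches)  # ['STRASSE', 'strasse']
print(fold_matches)   # ['Straße', 'STRASSE', 'strasse']
```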
It seems that Rust does not have this built in. The best options would be either the [focaccia crate](https://crates.io/crates/focaccia), which is the most recently updated, or `caseless`, which is more widely used but less documented and less recently updated.