Open Chaoses-Ib opened 12 months ago
I was not aware that `ö` and `ö` are encoded differently. They look the same.
However, even with the built-in equality check, they compare as not equal.
assert_eq!("ö", "ö"); // assertion failed
Why should they be considered equal when performing a case-insensitive comparison?
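To make the difference visible, here is a minimal std-only sketch that spells out both encodings with escapes. It assumes the two strings above are the precomposed form (U+00F6) and the decomposed form (`o` followed by U+0308); `code_points` is a hypothetical helper written just for this illustration.

```rust
/// Hypothetical helper: lists the Unicode code points of a string.
fn code_points(s: &str) -> Vec<u32> {
    s.chars().map(|c| c as u32).collect()
}

fn main() {
    // Precomposed form: a single code point, U+00F6.
    let composed = "\u{00F6}";
    // Decomposed form: 'o' (U+006F) followed by combining diaeresis (U+0308).
    let decomposed = "o\u{0308}";

    // Both render as "ö", but the underlying code points differ.
    assert_eq!(code_points(composed), vec![0x00F6]);
    assert_eq!(code_points(decomposed), vec![0x006F, 0x0308]);

    // `str` equality compares bytes, so the two strings are not equal.
    assert_eq!(composed.len(), 2); // UTF-8 bytes: C3 B6
    assert_eq!(decomposed.len(), 3); // UTF-8 bytes: 6F CC 88
    assert_ne!(composed, decomposed);
}
```

Since `==` on `str` is a byte comparison, any equality built on top of it inherits this behavior unless it normalizes first.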
Yeah, that makes sense. Whether to normalize should depend on the use case. Chromium normalizes when searching text on a page. According to this article, Windows and Linux on ext4 don't normalize file names, but macOS does, and Linux on ZFS does so depending on user configuration.
Adding separate versions of the functions that do normalization, like `eq_norm` and `eq_norm_ignore_case`, may be the most workable approach.
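A rough sketch of what such functions could look like, not a proposal for the actual unicase API. Everything here is hypothetical: the `_with` names, the pluggable `normalize` parameter, and `toy_compose`, which is a stand-in that only handles `ö`/`Ö` so the example runs with std alone. A real implementation would pass a proper normalizer instead, e.g. `|s| s.nfc().collect()` from the unicode-normalization crate. `to_lowercase` is also only an approximation of Unicode case folding.

```rust
/// Hypothetical: compares two strings after applying `normalize` to both.
fn eq_norm_with<F>(a: &str, b: &str, normalize: F) -> bool
where
    F: Fn(&str) -> String,
{
    normalize(a) == normalize(b)
}

/// Hypothetical: like `eq_norm_with`, but also ignores case.
/// `to_lowercase` stands in for real Unicode case folding (an assumption).
fn eq_norm_ignore_case_with<F>(a: &str, b: &str, normalize: F) -> bool
where
    F: Fn(&str) -> String,
{
    let fold = |s: &str| {
        let lowered = normalize(s).to_lowercase();
        // Normalize again, since case mapping may not preserve normalization.
        normalize(lowered.as_str())
    };
    fold(a) == fold(b)
}

/// Toy normalizer for this demo only: composes o/O + U+0308 into U+00F6/U+00D6.
/// It is NOT a real NFC implementation.
fn toy_compose(s: &str) -> String {
    s.replace("o\u{0308}", "\u{00F6}")
        .replace("O\u{0308}", "\u{00D6}")
}

fn main() {
    // Same letter, different encodings: equal after normalization.
    assert!(eq_norm_with("\u{00F6}", "o\u{0308}", toy_compose));
    // Different case: still unequal without case folding.
    assert!(!eq_norm_with("\u{00D6}", "o\u{0308}", toy_compose));
    // Different case and different encoding: equal with both applied.
    assert!(eq_norm_ignore_case_with("\u{00D6}", "o\u{0308}", toy_compose));
}
```

Keeping these as separate functions means the existing comparisons pay no normalization cost, and callers opt in only where they need it.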
unicase doesn't apply Unicode normalization to strings (https://github.com/seanmonstar/unicase/issues/48), so `eq_ignore_case` can be wrong in some cases, for example when two strings differ only in their normalization form.

Unicode normalization can be done using https://github.com/unicode-org/icu4x or https://github.com/unicode-rs/unicode-normalization. However, it is a bit complex and may hurt performance. If you don't want to do it, at least adding a warning to the documentation would help users.