Open krtab opened 7 months ago
If everything is done in place, then the methods really only need to take &mut self
instead of transferring ownership. Otherwise I really like this proposal.
I was a bit unclear: the idea is that you still take a String
to handle the case where you have to allocate more room, but that that case should be uncommon.
Another option though would be to actually work in place and leave characters whose case cannot be changed as is or replace them with a default character. This can make sense in the use case situation where you are changing case to compare to a default value (which is a common use case, even though the proper thing to do would be normalization) for example s.to_uppercase() == "TRUE"
I was a bit unclear: the idea is that you still take a
String
to handle the case where you have to allocate more room, but that that case should be uncommon.
You can still use &mut self
with that uncommon case -- just create the new string and replace the old when done.
My bad, I was too focused on &mut str
and didn't correctly understood you guys were talking about &mut String
, you are right.
String::make_othercase(&mut self)
would nicely complement str::make_ascii_othercase(&mut self)
.
Proposal
Problem statement
As of today, changing the case of a
String
is done by using methods defined onstr
, which implies that a new buffer is allocated each time.Yet, in most use cases, upper/lower-casing of UTF-8 strings can be done in place. Indeed, the set of codepoints which needs strictly more or less bytes to encode when their case is changed is small, and even more so when considering commonly used languages (cf https://raw.githubusercontent.com/krtab/utf8caseinplace/master/stats/out.txt).
Hence, this APC proposes that a new API is added to String to change cases and do so efficiently by consuming
self
and reusing the buffer, not allocating in most cases.Motivating examples or use cases
Case changes that can consume the
String
they are working on include:to_string().to_uppercase()
: https://github.com/search?q=lang%3ARust+to_string%28%29.to_uppercase%28%29&type=code (also sometimes found asformat(...).to_uppercase()
https://github.com/facebook/hhvm/blob/9004eeeb255e06f2459ea3b5c40e1dc558f3b136/hphp/hack/src/hackc/print_expr/print.rs#L318-L324)env::var("FOO")?.to_uppercase() == "SOMETHING"
(https://github.com/ddisqq/qiskit-terra/blob/3095955244ace26ed38d1af25fe5f0033246bfd7/src/lib.rs#L31-L41)And likely other I couldn't search for.
Solution sketch
This would add the following methods (names to be determined)
The exact implementation remains to be discussed, but the idea would be that in cases where it is possible, the case change is done in place. Once that isn't possible, a auxiliary DE-queue can be used to store bytes temporarily.
Alternatives
This could be done as a crate but people would use it much less and resort to
str::to_othercase
instead. Moreover, availability on crates.io of an up-to-date version of the Unicode database allowing correct case change for all situations is not guaranteed, andcore::unicode
methods used instr::to_othercase
are not public. Finally, having both implementations in std help keep them in sync an iso-behaved.Links and related work
What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
Second, if there's a concrete solution: