Closed alexcrichton closed 8 years ago
cc @SimonSapin
Minor correction: Rc::make_unique is another example of this naming. Unstable of course.
There should be a way to do this operation without allocating. I don’t have a strong opinion on in place with &mut str vs. consuming and returning String (as in the now-deprecated OwnedAsciiExt trait), except that the former is sometimes slightly less convenient. (You need three statements instead of one expression.)
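To make the "three statements instead of one expression" point concrete, here is a minimal sketch. It assumes the in-place make_ascii_uppercase as it exists on str in current Rust (in this thread it was still an unstable AsciiExt method), and into_ascii_uppercase is a hypothetical free function standing in for the consuming design.

```rust
/// Hypothetical consuming variant: takes ownership and reuses the allocation.
fn into_ascii_uppercase(mut s: String) -> String {
    s.make_ascii_uppercase(); // mutates the existing buffer, no new allocation
    s
}

/// In-place style: a binding, a mutation, then the value.
fn shout_in_place(name: &str) -> String {
    let mut s = format!("hello, {}", name);
    s.make_ascii_uppercase();
    s
}

/// Consuming style: a single expression.
fn shout_consuming(name: &str) -> String {
    into_ascii_uppercase(format!("hello, {}", name))
}

fn main() {
    println!("{}", shout_in_place("rust"));
    println!("{}", shout_consuming("rust"));
}
```

Both functions produce the same string; the difference is purely in how the call site reads.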
I don't like the &mut approach, but that might be because I'm not yet used to it.
I would prefer consuming the given String too.
The &mut str approach has the advantage that it can be used to convert an arbitrary subslice within a String. If the consuming String approach is adopted instead, the ability to do that would be lost.
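A small sketch of the subslice case, assuming the in-place method as it exists on str in current Rust. The helper name uppercase_prefix is made up for illustration.

```rust
/// Uppercases only the first `n` bytes of `s`, leaving the rest untouched.
/// Panics if `n` is out of bounds or not on a char boundary, like any
/// string slicing does.
fn uppercase_prefix(s: &mut String, n: usize) {
    // `s[..n]` is a `&mut str` view into the String's own buffer.
    s[..n].make_ascii_uppercase();
}

fn main() {
    let mut s = String::from("hello world");
    uppercase_prefix(&mut s, 5);
    println!("{}", s);
}
```

A consuming String -> String method has no equivalent of this, since a subslice cannot be taken by value.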
You can always iterate the &mut u8 bytes in a string and map them to upper case one by one (which is what the &mut str impl does). (Although yes, doing so is unsafe and it’s nice to have a safe wrapper for it.)
As for naming, something like to_ascii_uppercase_in_place might be better. It clearly states the purpose of the function, and as a bonus it will sort next to the more canonical to_ascii_uppercase function. The drawback is that the name is much longer, but I think that won't be much of a problem, as this operation is likely to be used much more rarely.
But I also think make_ makes sense if we go the route of consuming String and returning String. In that case, we would lose a safe wrapper on slices, but that's for another decision.
+1 for *_in_place if we keep the &mut self mutating methods (as opposed to self consuming methods).
If the libs team has the bandwidth, I’d like to nominate this for discussion. To sum up, options are:
into_ascii_*case(self) -> Self. This may need to be a separate trait.
Both are equally general: at worst if all you have is a &mut String, you can do the mem::replace dance to use methods that consume and return an owned String.
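For readers unfamiliar with the mem::replace dance, a rough sketch follows. The function into_ascii_lowercase is a hypothetical stand-in for the proposed into_ascii_*case(self) -> Self, written here as a safe free function; lowercase_via_replace is likewise an illustrative name.

```rust
use std::mem;

/// Hypothetical consuming API: takes a String, returns the lowercased String.
fn into_ascii_lowercase(s: String) -> String {
    let mut bytes = s.into_bytes();
    for b in &mut bytes {
        *b = b.to_ascii_lowercase();
    }
    // ASCII case mapping only touches single-byte ASCII letters, so the
    // bytes are still valid UTF-8 and this never fails.
    String::from_utf8(bytes).expect("ASCII case mapping keeps UTF-8 valid")
}

/// The "dance": move the String out of the reference, convert, move it back.
fn lowercase_via_replace(s: &mut String) {
    // String::new() does not allocate, so the placeholder is cheap.
    let owned = mem::replace(s, String::new());
    *s = into_ascii_lowercase(owned);
}

fn main() {
    let mut s = String::from("Hello, WORLD");
    lowercase_via_replace(&mut s);
    println!("{}", s);
}
```

This works for &mut String because an empty String is a valid placeholder; there is no comparable placeholder for a &mut str.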
Also, it’s possible to build this out of tree:
for byte in &mut (bytes: Vec<u8>) {
*byte = byte.to_ascii_lowercase()
}
for byte in unsafe { (s: String).as_mut_vec() } {
*byte = byte.to_ascii_lowercase()
}
… though it would be nice to encapsulate the unsafety in the String case.
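A compilable version of the two loops above might look like the following sketch. The wrapper names lowercase_bytes and lowercase_string are made up for illustration, and the code relies on to_ascii_lowercase being available on u8, as it is in current Rust.

```rust
/// Vec<u8> case: entirely safe.
fn lowercase_bytes(bytes: &mut Vec<u8>) {
    for byte in bytes.iter_mut() {
        *byte = byte.to_ascii_lowercase();
    }
}

/// String case: a safe wrapper around the unsafe byte access.
fn lowercase_string(s: &mut String) {
    // SAFETY: to_ascii_lowercase only rewrites b'A'..=b'Z' to b'a'..=b'z'
    // and leaves every other byte (including all bytes >= 0x80) unchanged,
    // so the buffer remains valid UTF-8.
    for byte in unsafe { s.as_mut_vec() } {
        *byte = byte.to_ascii_lowercase();
    }
}

fn main() {
    let mut bytes = b"Hello".to_vec();
    lowercase_bytes(&mut bytes);
    let mut s = String::from("HeLLo");
    lowercase_string(&mut s);
    println!("{:?} {}", bytes, s);
}
```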
:bell: This issue is now entering its cycle final comment period to be deprecated in 1.8 :bell:
What’s the rationale for not stabilizing either of the two proposed designs?
Ah yes, to clarify we concluded that this stuck out enough and could be easily enough built on crates.io (e.g. externally) that it wasn't necessary to stabilize in libstd at this time.
could be easily enough built on crates.io
The same can be said of the entire std::ascii module (and other things in std). I don’t understand keeping some of the related functionality in but leaving these couple of methods out (which for the str case require unsafe).
Sure, but that part's already stable. The make_ convention sticks out (at least to me) and I would prefer to not stabilize this.
I’m not a big fan of make_* either.
What about .into_ascii_lowercase(self) -> Self?
If those methods could be merged into the AsciiExt trait itself I'd be more comfortable with that, but previously they were a separate trait.
Alright. IIRC we had a separate trait because dynamically-sized types and methods taking self by value don’t play nice with each other. But it turns out Iterator already does this: the trick is having a default implementation with a where Self: Sized bound.
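The where Self: Sized trick can be sketched roughly as follows. The trait AsciiCase and its method names are hypothetical, not the real AsciiExt, and this compiles on current Rust; the compiler at the time of this thread may have behaved differently.

```rust
/// Hypothetical trait, implemented for both an unsized type (str) and a
/// sized one (String).
trait AsciiCase {
    /// In-place variant: usable on unsized types such as str.
    fn make_upper(&mut self);

    /// Consuming variant: the bound means it only exists for sized
    /// implementors, and the default body means impls need not write it.
    fn into_upper(self) -> Self
    where
        Self: Sized,
    {
        let mut this = self;
        this.make_upper();
        this
    }
}

impl AsciiCase for str {
    fn make_upper(&mut self) {
        self.make_ascii_uppercase();
    }
}

impl AsciiCase for String {
    fn make_upper(&mut self) {
        self.as_mut_str().make_ascii_uppercase();
    }
}

fn main() {
    // Consuming call on the sized type.
    let owned = String::from("abc").into_upper();
    // In-place call on an unsized subslice.
    let mut s = String::from("abc def");
    s[4..].make_upper();
    println!("{} / {}", owned, s);
}
```

The impl for str never has to mention into_upper, because the method is simply unavailable when Self is unsized.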
What do into_ascii_{upper,lower}case offer that make_ascii_{upper,lower}case don't?
Both are equally general: at worst if all you have is a &mut String, you can do the mem::replace dance to use methods that consume and return an owned String.
Is that true? If so how do you handle &mut str and &mut [u8] without two more implementations of AsciiExt for them?
If naming is still an issue with make_* how about just uppercase_ascii() and lowercase_ascii()? "uppercase" and "lowercase" are also verbs so this fits the current naming scheme.
Yeah that’s true, you can’t mem::replace a &mut str like you can a &mut String, I wasn’t thinking of that case when I wrote that.
uppercase_ascii() without a prefix is not informative enough to say how it’s different from to_uppercase_ascii() IMO.
Well it has a different signature so maybe that's enough difference.
Anyway it looks like the choice is between:
make_ascii_uppercase() / to_ascii_uppercase_in_place() / uppercase_ascii()
into_ascii_uppercase()
Personally I think good semantics are more important than a good name. Also the other String manipulation methods (push_str(), pop(), insert(), clear() etc...) all work in place so I think we should stay consistent.
+1 to &mut str whatever the name is; avoiding it seems like some irrational fear.
Bad semantics
Is it really? In every practical use I’ve seen, consuming self works nicely, while working on &mut requires introducing a temporary variable and having three statements instead of one expression.
Well it's really bad if you have a &mut String and of course useless if you have a &mut str. Why do other String and Vec mutation methods work in place rather than consuming self? What makes these methods different?
There's also the case of stack-allocated buffers to consider, especially in libraries that don't want to allocate at all.
@ollie27 If you have &mut String you can always make a little dance with mem::replace and call the self-consuming function. Oh, Simon mentioned that.
As far as I can tell the only real reason not to stabilise these methods is the name so here are some random possibilities:
make_ascii_uppercase()
make_uppercase_ascii()
to_ascii_uppercase_in_place()
uppercase_ascii()
uppercase_as_ascii()
in_place_uppercase_ascii()
make_ascii_uppercase_in_place()
uppercase_ascii_in_place()
ascii_uppercase_in_place()
etc...
I vote for uppercase_ascii() because self mutating methods seem to be the default so I don't think we need a make_ prefix or an _in_place suffix, it is short and it won't be confused with to_ascii_uppercase() because it has a very different signature.
I think however this may just be a case of picking the least bad name and going with it.
What do other people think?
The libs team discussed this during triage yesterday and we reached a few conclusions:
into_ variants proposed by @SimonSapin, they fit in nicely with conventions and the module itself.
uppercase_ascii as a portion of the method name because it would be odd for this to be the only method with the phrase backwards (where all the others have ascii_uppercase)
For now we're going to move this out of FCP. Adding the into_ methods is a good interim strategy, and otherwise we can continue to debate the naming of the mutable variants.
to_ascii_uppercase_in_place sounds like the best choice. Then we can introduce the naming convention to_*_in_place for the fn to_*_in_place(&mut self); signature.
Usages of the to_ prefix and _in_place suffix in std:
fn to_le(self) -> i32
fn to_digit(self, radix: u32) -> Option<u32>
fn to_vec(&self) -> Vec<T>
fn to_mut(&mut self) -> &mut B::Owned
fn drop_in_place<T: ?Sized>(to_drop: *mut T)
fn double_in_place(&mut self) -> bool
fn reserve_in_place(&mut self, used_cap: usize, needed_extra_cap: usize) -> bool
@photino yeah that was kinda what we were leaning towards as well, the downside being that it's a very long method name to type out.
I don't understand why we'd want to stabilize the into_ methods:
make_ versions.
sort() and reverse() on slices which work seamlessly on anything that DerefMuts to a slice like Vec, arrays and ArrayVec with just one implementation. I don't think anyone is proposing implementing into_sorted() and into_reversed() for those types as that would be redundant but that's exactly what we seem to be doing here.
If they are stabilised then should AsciiExt be implemented for arrays, ArrayVec, ArrayString etc.? They shouldn't need to because they all already implement DerefMut.
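The "one implementation serves every container" argument can be illustrated with a short sketch. It assumes the in-place methods as they exist on [u8] and str in current Rust; the function name demo is made up. Note that arrays reach the slice method through unsizing in method lookup rather than through DerefMut, but the effect at the call site is the same.

```rust
/// Calls the single slice/str implementation through four different
/// container types, without any per-container impls.
fn demo() -> (Vec<u8>, [u8; 5], Box<[u8]>, Box<str>) {
    let mut vec: Vec<u8> = b"hello".to_vec();
    let mut array: [u8; 5] = *b"hello";
    let mut boxed: Box<[u8]> = b"hello".to_vec().into_boxed_slice();
    let mut boxed_str: Box<str> = String::from("hello").into_boxed_str();

    vec.make_ascii_uppercase(); // Vec<u8> derefs to [u8]
    array.make_ascii_uppercase(); // array unsizes to [u8]
    boxed.make_ascii_uppercase(); // Box<[u8]> derefs to [u8]
    boxed_str.make_ascii_uppercase(); // Box<str> derefs to str

    (vec, array, boxed, boxed_str)
}

fn main() {
    let (vec, array, boxed, boxed_str) = demo();
    println!("{:?} {:?} {:?} {}", vec, array, boxed, boxed_str);
}
```

Third-party containers such as ArrayVec would work the same way as long as they deref mutably to a slice.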
If this is just an interim strategy then does that mean they will be deprecated once we settle on a name for the good methods? If so we could just stabilise the current ones which will cover more use cases and it will be easier to migrate as it will be just a simple name change.
The into_* methods follow standard conversion idioms in the standard library, so from our point of view they're essentially "free of charge". We figured that if AsciiExt exists it may as well provide a nice suite of methods. Stabilizing into_* doesn't preclude stabilizing the mutable variants.
Everything the into_ methods can do can be done with the mutable variants so while it doesn't preclude stabilising them, that would make the into_ variants redundant.
So my questions are:
AsciiExt now be implemented for arrays, Box<[u8]>, Box<str>, ArrayVec, ArrayString etc...?
into_ variants be deprecated if/when we stabilise the mutable variants?
Sure, AsciiExt should probably be implemented for a more wide array of types. I don't think we'd deprecate into if mutable variants were stabilized, they're useful in their own right sometimes.
My point is that all these array types can already use all of the AsciiExt methods except into_ because they implement Deref and DerefMut. Having to implement AsciiExt for all these types just to get access to two of the methods which offer no more functionality than the make_ methods seems silly to me. I don't understand why we don't want to take full advantage of Deref in this case.
If you think the into_ methods are useful sometimes, do you think we should implement methods like into_reversed() and into_sorted() for all of these array types as well?
No, I don't think we should add into_foo for all other sorts of methods. Methods like reversal and sorting aren't conversions, they're operations. These are all just conversions one way or another, and the operation we just don't have a great name for.
Methods like reversal and sorting aren't conversions, they're operations. These are all just conversions one way or another, and the operation we just don't have a great name for.
This separation looks pretty artificial. As if it were specially invented to justify the into_ascii_* methods.
@petrochenkov it's how the AsciiExt trait has turned out, it's intended for conversions.
Sure, AsciiExt should probably be implemented for a more wide array of types.
You can't. Or at least it wouldn't make much sense for them to. The into_ methods return Self::Owned rather than Self like the to_ methods. So when implementing AsciiExt for any of the existing array types that already implement Deref you can either set Owned to Self which will break any existing calls to the to_ methods because the return types will have changed or you can set Owned to String or Vec but that would defy the point of the into_ methods.
Do you know why the into_ methods return Self::Owned rather than Self?
Do you know why the into_ methods return Self::Owned rather than Self?
The trait is implemented for str and [u8]. You can’t return them unboxed.
The trait is but the into_ methods aren't because of where Self: Sized so it should be okay to return Self right?
IIRC the compiler would complain about Sized and not let me do that. An alternative would be to have a separate OwnedAsciiExt trait (which we did at some point, but someone (Alex?) disliked having two traits).
This API is going back into final comment period.
We were previously unable to reach consensus on a good convention for make_ascii_*case, and tried to make some progress by going forward with into_* variants in the meantime. However:
into_ variants are ultimately redundant once we stabilize the make_ variants.
We've always been somewhat ambivalent about ASCII support in std, and have scaled it back over time, but at this point, we're reasonably committed to some core functionality, and we'd like to settle these remaining APIs.
As such, we're going to:
Revert the change with into_ variants as well as the introduction of additional impls to support it.
make_ variants.
We had a lengthy discussion in the libs team last time about conventions. The key problem is that mutating methods are usually reasonably clear verb forms (like push), and it feels unfortunate to introduce a prefix like make_. We discussed variations around _in_place but found this to be extremely verbose. And as several people have pointed out on thread, using conversion prefixes here doesn't feel great either -- it's a mutation, not a conversion.
Perhaps we can look at some other languages for inspiration on this convention. But let's try to get it settled this cycle.
Revert the change with into_ variants as well as the introduction of additional impls to support it.
Is #32076 what you have in mind here?
@SimonSapin I think you already saw, but for the record, https://github.com/rust-lang/rust/pull/32314 is what we had in mind.
Revisiting the naming concerns, I continue to feel like the sticking point is needing a verb, which we can get either through make as today, or by treating e.g. uppercase itself as a verb (as in uppercase_ascii or ascii_uppercase). I continue to prefer the latter, and agree with @ollie27 that uppercase_ascii is the smoothest name.
I agree that all nouns can be verbed, but since uppercase-as-a-verb is spelled the same as uppercase-as-a-noun I think that uppercase_ascii doesn’t give enough information about how it differs from to_uppercase_ascii.
I agree with Simon's concerns about ambiguity. How about uppercasify_ascii? :D
But in all seriousness, make_uppercase_ascii seems like the least-bad approach I've seen.
This is a tracking issue for the unstable ascii feature in the standard library. These functions have the somewhat odd naming scheme of make_* (not found elsewhere in the standard library). The utility with &mut str is also somewhat questionable as there's not a lot of support for that in the standard library.
Overall this probably just largely needs a decision.