rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.54k stars 12.74k forks source link

Tracking issue for make_ascii_{upper,lower}case #27809

Closed alexcrichton closed 8 years ago

alexcrichton commented 9 years ago

This is a tracking issue for the unstable ascii feature in the standard library. These functions have the somewhat odd naming scheme of make_* (not found elsewhere in the standard library). The utility with &mut str is also somewhat questionable as there's not a lot of support for that in the standard library.

Overall this probably just largely needs a decision.

alexcrichton commented 9 years ago

cc @SimonSapin

Gankra commented 9 years ago

Minor correction: Rc::make_unique is another example of this naming. Unstable of course. On Aug 13, 2015 10:09 AM, "Alex Crichton" notifications@github.com wrote:

cc @SimonSapin https://github.com/SimonSapin

— Reply to this email directly or view it on GitHub https://github.com/rust-lang/rust/issues/27809#issuecomment-130763329.

SimonSapin commented 9 years ago

There should be a way to do this operation without allocating. I don’t have a strong opinion on in place with &mut str v.s. consuming and returning String (as in the now-deprecated OwnedAsciiExt trait), except that the former is sometimes slightly less convenient. (You need three statements instead of one expression.)

Ms2ger commented 9 years ago

I don't like the &mut approach, but that might be because I'm not yet used to it.

nox commented 9 years ago

I would prefer consuming the given String too.

murarth commented 9 years ago

The &mut str approach has the advantage that it can be used to convert an arbitrary subslice within a String. If the consuming String approach is adopted instead, the ability to do that would be lost.

SimonSapin commented 9 years ago

You can always iterate the &mut u8 bytes in a string and map them to upper case one by one (which is what the &mut str impl does). (Although yes, doing so is unsafe and it’s nice to have a safe wrapper for it.)

barosl commented 9 years ago

As for naming, something like to_ascii_uppercase_in_place might be better. It clearly states the purpose of the function, and it also has a bonus point that this function will be sorted around the more canonical to_ascii_uppercase function. The drawback is that the name is much longer, but I think it won't be much a problem as the usage of this operation might be much rarer.

barosl commented 9 years ago

But I also think make_ makes sense if we go the route of consuming String and returning String. In that case, we would lose a safe wrapper on slices, but that's for another decision.

SimonSapin commented 9 years ago

+1 for *_in_place if we keep &mut self mutating method (as opposed to self consuming methods).

SimonSapin commented 8 years ago

If the libs team has the bandwidth, I’d like to nominate this for discussion. To sum up, options are:

Both are equally general: at worst if all you have is a &mut String, you can do the mem::replace dance to use methods that consume and return an owned String.

SimonSapin commented 8 years ago

Also, it’s possible to build this out of tree:

for byte in &mut (bytes: Vec<u8>) {
    *byte = byte.to_ascii_lowercase()
}

for byte in unsafe { (s: String).as_mut_vec() } {
    *byte = byte.to_ascii_lowercase()
}

… though it would be nice to encapsulate the unsafety in the String case.

alexcrichton commented 8 years ago

:bell: This issue is now entering its cycle final comment period to be deprecated in 1.8 :bell:

SimonSapin commented 8 years ago

What’s the rationale for not stabilizing any of the two proposed designs?

alexcrichton commented 8 years ago

Ah yes, to clarify we concluded that this stuck out enough and could be easily enough built on crates.io (e.g. externally) that it wasn't necessary to stabilize in libstd at this time.

SimonSapin commented 8 years ago

could be easily enough built on crates.io

The same can be said of the entire std::ascii module (and other things in std). I don’t understand having some of the related functionality in, but leave these couple methods out (which for the str case require unsafe.)

alexcrichton commented 8 years ago

Sure, but that part's already stable. The make_ convention sticks out (at least to me) and I would prefer to not stabilize this.

SimonSapin commented 8 years ago

I’m not a big fan of make_* either.

What about .into_ascii_lowercase(self) -> Self?

alexcrichton commented 8 years ago

If those methods could be merged into the AsciiExt trait itself I'd be more comfortable with that, but previously they were a separate trait.

SimonSapin commented 8 years ago

Alright. IIRC we had a separate trait because dynamically-sized types and methods taking self by value don’t play nice with each other. But it turns out Iterator already does this: the trick is having a default implementation with a where Self: Sized bound.

See PR https://github.com/rust-lang/rust/pull/31335.

ollie27 commented 8 years ago

What do into_ascii_{upper,lower}case offer that make_ascii_{upper,lower}case don't?

Both are equally general: at worst if all you have is a &mut String, you can do the mem::replace dance to use methods that consume and return an owned String.

Is that true? If so how do you handle &mut str and &mut [u8] without two more implementations of AsciiExt for them?

If naming is still an issue with make_* how about just uppercase_ascii() and lowercase_ascii()? "uppercase" and "lowercase" are also verbs so this fits the current naming scheme.

SimonSapin commented 8 years ago

Yeah that’s true, you can’t mem::replace a &mut str like you can a &mut String, I wasn’t thinking of that case when I wrote that.

uppercase_ascii() without a prefix is not informative enough to say how it’s different from to_uppercase_ascii() IMO.

ollie27 commented 8 years ago

Well it has a different signature so maybe that's enough difference.

Anyway it looks like the choice is between:

  1. Good semantics, bad name (make_ascii_uppercase() / to_ascii_uppercsase_in_place() / uppercase_ascii())
  2. Bad semantics, good name (into_ascii_uppercase())
  3. No method at all

Personally I think good semantics are more important than a good name. Also the other String manipulation methods (push_str(), pop(), insert(), clear() etc...) all work in place so I think we should stay consistent.

petrochenkov commented 8 years ago

+ to &mut str whatever the name is, avoiding it seems like some irrational fear.

SimonSapin commented 8 years ago

Bad semantics

Is it really? In every practical use I’ve seen consuming self works nicely while working on &mut requires introducing a temporary variable and have three statements instead of one expression.

ollie27 commented 8 years ago

Well it's really bad if you have a &mut String and of course useless if you have a &mut str. Why do other String and Vec mutation methods work in place rather than consuming self? What makes these methods different?

Ms2ger commented 8 years ago

There's also the case of stack-allocated buffers to consider, especially in libraries that don't want to allocate at all.

nox commented 8 years ago

@ollie27 If you have &mut String you can always make a little dance with mem::replace and call the self-consuming function. Oh Simon mentioned that.

ollie27 commented 8 years ago

As far as I can tell the only real reason not to stabilise these methods is the name so here are some random possibilities:

etc...

I vote for uppercase_ascii() because self mutating methods seem to be the default so I don't think we need a make_ prefix or an _in_place suffix, it is short and it won't be confused with to_ascii_uppercase() because it has a very different signature.

I think however this may just be a case of picking the least bad name and going with it.

What do other people think?

alexcrichton commented 8 years ago

The libs team discussed this during triage yesterday and we reached a few conclusions:

For now we're going to move this out of FCP. Adding the into_ methods is a good interim strategy, and otherwise we can continue to debate the naming of the mutable variants.

photino commented 8 years ago

to_ascii_uppercase_in_place sounds the best choice. Then we can introduce the naming convention to_*_in_place for the fn to_*_in_place(&mut self); signature.

Usages of the to_ prefix and _in_place suffix in std:

alexcrichton commented 8 years ago

@photino yeah that was kinda what we were leaning towards as well, the downside being that it's a very long method name to type out.

ollie27 commented 8 years ago

I don't understand why we'd want to stabilize the into_ methods:

If they are stabilised then should AsciiExt be implemented for arrays, ArrayVec, ArrayString etc.? They shouldn't need to because they all already implement DerefMut.

If this is just an interim strategy then does that mean they will be depreciated once we settle on a name for the good methods? If so we could just stabilise the current ones which will cover more use cases and it will be easier to migrate as it will be just a simple name change.

alexcrichton commented 8 years ago

The into_* methods follow standard conversion idioms in the standard library, so from our point of view they're essentially "free of charge". We figured that if AsciiExt exists it may as well provide a nice suite of methods. Stabilizing into_* doesn't preclude stabilizing the mutable variants.

ollie27 commented 8 years ago

Everything the into_ methods can do can be done with the mutable variants so while is doesn't preclude stabilising them, that would make the into_ variants redundant.

So my questions are:

  1. Should AsciiExt now be implemented for arrays, Box<[u8]>, Box<str>, ArrayVec, ArrayString etc...?
  2. Will the into_ variants be depreciated if/when we stabilise the mutable variants?
alexcrichton commented 8 years ago

Sure, AsciiExt should probably be implemented for a more wide array of types. I don't think we'd deprecate into if mutable variants were stabilized, they're useful on their own right sometimes.

ollie27 commented 8 years ago

My point is that all these array types can already use all of the AsciiExt methods except into_ because they implement Deref and DerefMut. Having to implement AsciiExt for all these types just to get access to two of the methods which offer no more functionality than the make_ methods seems silly to me. I don't understand why we don't want to take full advantage of Deref in this case.

If you think the into_ methods are useful sometimes, do you think we should implement methods like into_reversed() and into_sorted() for all of these array types as well?

alexcrichton commented 8 years ago

No, I don't think we should add into_foo for all other sorts of methods. Methods like reversal and sorting aren't conversions, they're operations. These are all just conversions one way or another, and the operation we just don't have a great name for.

petrochenkov commented 8 years ago

Methods like reversal and sorting aren't conversions, they're operations. These are all just conversions one way or another, and the operation we just don't have a great name for.

This separation looks pretty artificial. As if it were specially invented to justify the into_ascii_* methods.

alexcrichton commented 8 years ago

@petrochenkov it's how the AsciiExt trait has turned out, it's intended for conversions.

ollie27 commented 8 years ago

Sure, AsciiExt should probably be implemented for a more wide array of types.

You can't. Or at least it wouldn't make much sense for them to. The into_ methods return Self::Owned rather than Self like the to_ methods. So when implementing AsciiExt for any of the existing array types that already implement Deref you can either set Owned to Self which will break any existing calls to the to_ methods because the return types will have changed or you can set Owned to String or Vec but that would defy the point of the into_ methods.

Do you know why the into_ methods return Self::Owned rather than Self?

SimonSapin commented 8 years ago

Do you know why the into_ methods return Self::Owned rather than Self?

The trait is implemented for str and [u8]. You can’t return them unboxed.

ollie27 commented 8 years ago

The trait is but the into_ methods aren't because of where Self: Sized so it should be okay to return Self right?

SimonSapin commented 8 years ago

IIRC the compiler would complain about Sized and not let me do that. An alternative would be to have a separate OwnedAsciiExt trait (which we did at some point, but someone (Alex?) disliked having two traits).

aturon commented 8 years ago

This API is going back into final comment period.

We were previously unable to reach consensus on a good convention for make_ascii_*case, and tried to make some progress by going forward with into_* variants in the meantime. However:

We've always been somewhat ambivalent about ASCII support in std, and have scaled it back over time, but at this point, we're reasonably committed to some core functionality, and we'd like to settle these remaining APIs.

As such, we're going to:

We had a lengthy discussion in the libs team last time about conventions. The key problem is that mutating methods are usually reasonably clear verb forms (like push), and it feels unfortunate to introduce a prefix like make_. We discussion variations around _in_place but found this to be extremely verbose. And as several people have pointed out on thread, using conversion prefixes here doesn't feel great either -- it's a mutation, not a conversion.

Perhaps we can look at some other languages for inspiration on this convention. But let's try to get it settled this cycle.

SimonSapin commented 8 years ago

Revert the change with into_ variants as well as the introduction of additional impls to support it.

Is #32076 what you have in mind here?

aturon commented 8 years ago

@SimonSapin I think you already saw, but for the record, https://github.com/rust-lang/rust/pull/32314 is what we had in mind.

aturon commented 8 years ago

Revisiting the naming concerns, I continue to feel like the sticking point is needing a verb, which we can get either through make as today, or by treating e.g. uppercase itself as a verb (as in uppercase_ascii or ascii_uppercase). I continue to prefer the latter, and agree with @ollie27 that uppercase_ascii is the smoothest name.

SimonSapin commented 8 years ago

I agree that all nouns can be verbed, but since uppercase-as-a-verb is spelled the same as uppercase-as-a-noun I think that uppercase_ascii doesn’t give enough information about how it differs from to_uppercase_ascii.

sfackler commented 8 years ago

I agree with Simon's concerns about ambiguity. How about uppercasify_ascii? :D

But in all seriousness, make_uppercase_ascii seems like the least-bad approach I've seen.