unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org
Other
1.37k stars 176 forks source link

Fix consistency of from_bytes, try_from_bytes, from_str, try_from_str #4931

Open sffc opened 5 months ago

sffc commented 5 months ago

In our docs there are 95 and 63 hits for from_bytes and try_from_bytes (the former includes counts from the latter), and there's also a mix of from_str and try_from_str.

We should try to fix this in 2.0. I suggest:

Alternatively, we could rename everything to try_from_utf8 instead of try_from_bytes.

EDIT: I think my preference has changed to also include try_from_str constructors. See discussion.

Feedback needed from:

robertbastian commented 5 months ago

I think FromStr::from_str should dictate naming. It is not an ideal name, but it's the name Rust uses, and we should follow convention.

I like the TinyAsciiStr model: it has a FromStr::from_str, a pub const fn from_str(s: &str) -> Result<Self>, which shadows the trait method, and an analogous from_bytes. Having an inherent from_str shadowing the trait is good for two reasons: it can be const, which the trait method cannot be, and it can be called without importing FromStr (which is not in the prelude). I think this method should not be called try_from_str, because it is FromStr::from_str. Giving it a second name is confusing, and it removes the shadow, so callers would have to either use the different name, or import core::str::FromStr.

So my suggestion is:

These methods would have to be renamed to follow that scheme:

These methods should be FromStr::from_str anyway, will send a PR:

Manishearth commented 5 months ago

I roughly agree with Robert's stance with the caveat that I'm not a huge fan of having inherent methods named the same as trait methods that are implemented by the type (so the "name something the same to get it const" is not that great, for me)

sffc commented 5 months ago

I strongly prefer using try_ for fallible non-trait methods.

It is not an ideal name, but it's the name Rust uses, and we should follow convention.

I consider FromStr to be a bit legacy not only because of the lack of try_ but also because its error type is called Err instead of Error as used in newer traits like TryFrom, whose convention I prefer to follow.

Also, people don't usually call FromStr::from_str directly because (1) it's not in the prelude and (2) people use .parse(). I see it as a means to an end to be more consistent with the Rust ecosystem, but we should define our own constructors on top of impl FromStr.

Anecdotally, I have found it misleading when the only constructor for various ICU4X types is impl FromStr. I would rather look at the docs and see a try_from_... and then I instantly know that's how to construct the thing.

const fn from_str is interesting; I hadn't thought of that case in the OP. I think I prefer try_from_str because the function shows up in our docs as a constructor.

sffc commented 5 months ago

with the caveat that I'm not a huge fan of having inherent methods named the same as trait methods that are implemented by the type

@Manishearth If you hold this position, should we reconsider https://github.com/unicode-org/icu4x/issues/4590?

sffc commented 5 months ago

(const) fn from_str(&str) -> Self

This is problematic because the signature is different than the trait function FromStr::from_str. I think if a function shadows, it should have the same signature.

How about this revised proposal:

robertbastian commented 5 months ago

(const) fn from_str(&str) -> Self

This is problematic because the signature is different than the trait function FromStr::from_str. I think if a function shadows, it should have the same signature.

I'm fine not implementing FromStr in the infallible case.

robertbastian commented 5 months ago

Actually I'm wholly against FromStr for infallible. from_str should be an inherent method, and it should not implement the trait method because the signature is different than the inherent function, we don't have infallible unwrapping (so this will be annoying to use), and there is usually no "parsing", just rewrapping (cf UnvalidatedStr), so this being available through str::parse is weird.

Also, people don't usually call FromStr::from_str directly because (1) it's not in the prelude and (2) people use .parse(). I see it as a means to an end to be more consistent with the Rust ecosystem, but we should define our own constructors on top of impl FromStr.

We actually use from_str in our docs a lot. From these three declarations, I vastly prefer the first:

zbraniecki commented 5 months ago

I think try_from_utf8 is slightly better than try_from_bytes as it makes a normalized API space for utf8/utf16 (which is what we'll want in some hot paths like Locale parsing) and aligns with https://doc.rust-lang.org/std/str/fn.from_utf8.html .

Suggestion:

We don't need to implement all of them for all constructors - just normalize the naming scheme so that we can introduce the ones we find useful incrementally.

robertbastian commented 5 months ago

+1 on using _utf8 instead of _bytes.

Manishearth commented 5 months ago

@Manishearth If you hold this position, should we reconsider #4590?

It's a very weakly held position, I think #4590 is okay. I'm also fine with the proposal here for similar reasons.

zbraniecki commented 5 months ago

I'm working on locid now in context of icu_preferences, and once I'm done I'd like to unify and improve docs on handling of to/from u8/u16/str for all subtags for preparation for locid &[u16] handling.

Any thoughts on my proposal several comments above?

sffc commented 5 months ago

Concrete proposal:

All types that are fallibly created from a string have the following functions:

All types that are infallibly created from a string have the following functions:

LGTM: @hsivonen @sffc @Manishearth @robertbastian

sffc commented 5 months ago

I didn't realize I signed off on the "(only ever called through parse in documentation, but only if it can be done without turbofishes, otherwise Foo::try_from_str)" -- I agree with the "only ever called through parse in documentation" but not necessarily the "only if it can be done without turbofishes"

zbraniecki commented 5 months ago

LGTM with the same alternation as @sffc pointed ou in the last comment. I'm not convinced that turbofish specifier is bad enough to have a blank rule against it in docs.

robertbastian commented 4 months ago

What about this pattern:

pub struct Era(pub TinyStr16);

impl From<TinyStr16> for Era {
    fn from(x: TinyStr16) -> Self {
        Self(x)
    }
}

impl FromStr for Era {
    type Err = <TinyStr16 as FromStr>::Err;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        s.parse().map(Self)
    }
}

Both the impl From<TinyStr> as well as the impl FromStr are kind of pointless, as TinyStr already has try_from_utf8/try_from_utf16/try_from_str/from_str, and the field is public.

robertbastian commented 1 month ago

This is done for all stable components and utils, except for the datetime and plurals reference and runtime modules.

sffc commented 1 month ago

@zbraniecki WDYT about doing the facelift to those modules?

sffc commented 1 month ago

Also in 2.0 (can be stretch), apply this style to all stringy APIs such as canonicalize, normalize, ...

zbraniecki commented 1 month ago

I think we should align them with the decision here.

sffc commented 1 month ago

Conclusion:

LGTM: @sffc (@robertbastian) @Manishearth