Open sffc opened 2 months ago
I have previously been unhappy that the properties constructors only let you get static borrowed types, and would prefer that we have cheap ways to get owned types compatible with ones loaded from data structs. It's kinda annoying to have this split in types.
But it also seems to work well. It's unclear where we should be drawing the line.
For the properties types, my Default
solution won't work, because there is not a single constructor for those types. So we should probably come up with a general solution that doesn't depend on Default
.
Here's another option
FooBorrowed::new() -> FooBorrowed<'static>
FooBorrowed::static_to_owned(self) -> Foo
Foo::try_new_unstable(...) -> Result<Foo>
Foo::as_borrowed(&self) -> FooBorrowed
Here's another question. In 2.0, should the types be named Foo
and FooBorrowed
, or should they be named FooOwned
(or FooData
) and Foo
?
Hmm, if the borrowed types are the main ones it does make sense to have them be the primary ones.
But the borrowed types are the main ones only for singleton types. Are we okay with there being Foo<'static>/FooOwned
(singleton) and Foo
(and potentially FooBorrowed<'a>
) for non-singletons? I think that's fine, we would rarely be doing FooBorrowed
anyway for non-singletons.
Collator is the main example of a type that would be locale-dependent but might benefit from a Borrowed version.
We can make databake capable of returning the right thing even with a fallible try_new
on the borrowed type. It's more work, though, and I don't really want to block 2.0 on that.
It's also more work, but since it seems like we probably want to generally move in this direction, we can just consistently apply this pattern everywhere:
pub struct FooFormatterData {
payload: DataPayload<DataStructV1Marker>
}
pub struct FooFormatter<'a> {
payload: &'a DataStructV1<'a>
}
impl FooFormatterData {
pub fn get(&self) -> FooFormatter { ... }
}
impl<'a> FooFormatter<'a> {
pub fn clone_with_data(&self) -> FooFormatterData { ... }
}
impl FooFormatter<'static> {
pub fn static_to_owned(&self) -> FooFormatterData { ... }
}
Maybe for 2.0 we stick with the current naming, and in 3.0 we can consider switching things around, only because this wasn't on the table initially for 2.0 and we can't keep adding things to that milestone.
I'm pretty sure that, names aside, we can do this as an incremental migration. We can create FixedDecimalFormatterBorrowed
and with whatever functions it needs, and the owned type continues to have direct formatting functions. Then in 3.0 we can remove the format functions from the owned type and do the project-wide renaming.
In hindsight this design seems pretty obvious and I don't know why we didn't consider it sooner.
@robertbastian prefers keeping the four constructors together in the docs. @robertbastian therefore prefers:
// Always:
Foo::new() -> FooBorrowed<'static>
Foo::try_new_unstable(...) -> Result<Foo>
Foo::as_borrowed() -> FooBorrowed
@sffc points out:
@robertbastian says we could add FooBorrowed::new()
if Foo
is feature-gated.
You never want to add a function that only shows up when a feature is disabled, then global feature resolution will break code relying on the disabled crate.
One thing @robertbastian said which I agree with is that Foo::new() -> Foo
is a bad function because it unnecessarily wraps the static ref in the enum (DataPayload) which leads to less efficient code.
I think we need FooBorrowed::new()
since Foo
could be feature-gated in some cases.
As noted previously, I think we should stick with the Foo/FooBorrowed names in 2.0, and possibly revisit them in 3.0.
So I think the choices are:
// Option 1 (2024-09-03): Foo::new() and FooBorrowed::new()
Foo::new() -> FooBorrowed
Foo::try_new_unstable() -> Foo
// Always have this function available:
FooBorrowed::new() -> FooBorrowed
// Option 2 (2024-09-03): Foo::new() and optional FooBorrowed::new()
Foo::new() -> FooBorrowed
Foo::try_new_unstable() -> Foo
// Add this function only when Foo actually gets feature-gated:
FooBorrowed::new() -> FooBorrowed
// Option 3 (2024-09-03): FooBorrowed::new() only
Foo::try_new_unstable() -> Foo
FooBorrowed::new() -> FooBorrowed
Either way, I think we should always have FooBorrowed::static_to_owned()
.
I can see pros and cons of all these choices so I won't weigh in yet. I'll see what others think.
Not opposed to 1 or 3. Opposed to 2, we shouldn't have any APIs that disappear when you enable a feature gate, that gets very brittle in larger deptrees. I guess we could do some fancy doc(hidden)
stuff but most people are unlikely to browse ICU4X docs without --all-features
..
No strong opinion between 1 and 3, provided 3 has sufficient documentation pointing things to each other.
I think the intent with 2 is that we create the function in code at the same time as we create the feature that gates the owned type, and otherwise we don't have the function available.
Ah! Okay, that's acceptable to me. No strong opinion.
My preference would be to give the primary name to the borrowed version, because it's the version that baked-data code will use exclusively e.g. with the normalizer, and the version that will have more usage lines of code even in apps that don't use baked data.
I prefer the constructors for the borrowed types to be on the borrowed types themselves, because e.g. with the normalizer, putting the constructors on the owned types would create a strange reason to have to refer to the owned type in baked-data code when there would be no need to even refer to the owned type in baked-data code if the constructor was on the borrowed type.
I'm currently working on making the collator have a borrowed variant, and the collator is a bit of a special case, because it both has obviously statically-known data (the normalizer data and the root collation) as well as data that gets decided at run time (even if choosing from various baked options).
I prefer the constructors for the borrowed types to be on the borrowed types themselves, because e.g. with the normalizer, putting the constructors on the owned types would create a strange reason to have to refer to the owned type in baked-data code when there would be no need to even refer to the owned type in baked-data code if the constructor was on the borrowed type.
This is why we have discussed having the constructor on both. Having it only on the borrowed type means the rustdoc for the owned type (which I consider the main entry type right now) would only show 3/4 constructors. If it is needed on the borrowed type, e.g. because the owned type is cfg-ed, then we can have it on both.
(which I consider the main entry type right now)
A discussion I had with @sffc the other day was whether we should be changing this. For singletons I definitely feel like we should be doing FooOwned and Foo, with the borrowed version being the main entry type.
For non-singletons the discussion gets more interesting: do we think most users are going to be using compiled data? if so, the borrowed version still could be the primary entry point with the nicer name, and FooOwned becomes "nice helper type if you need Yoke behavior"
What would be cool is if there was only a single type (borrowed), the compiled constructor would return Foo<'static>
, and the owned constructors would return something like Owned<Foo<'static>>
, which would deref to Foo<'static>
.
That's kind of what Yoke can do.
It wouldn't be able to deref to Foo<'static>
though.
I don't think there's generally a way to make Owned<Foo<'static>>
because
My position is that I think we should move toward having FooData
(or FooOwned
) and Foo<'a>
everywhere, even on non-singleton types, but not in 2.0 because that is a really big change.
A middle ground would be to use this naming scheme only for types that currently have owned and borrowed variants, but I think this middle ground is bad because:
To resolve this thread, I am happy with "Option 2 (2024-09-03)"
Observation: if we have Foo::new() -> Foo
and then we migrate Foo
to have a borrowed variant, we won't be able to change the function signature in 2.x. We'd need to deprecate that function and create Foo::new_borrowed()
or something.
I think it's unfortunate if we don't end up switching from FooBorrowed
and Foo
to Foo
and FooOwned
(or FooData
) before 2.0, but also I'm not volunteering to make the change.
I don't see a clear consensus on this thread and it seems we are not on the same page so I tagged it with discuss-priority
@zbraniecki should weigh in here, too.
Some discussion:
Discussion with @sffc @zbraniecki @robertbastian
In the short term:
Foo
and FooBorrowed<'a>
Foo::new
returns FooBorrowed<'static>
FooBorrowed::new
also returns FooBorrowed<'static>
FooBorrowed<'a>
s can be added incrementally after 2.0; if so, the corresponding Foo::new
would become #[deprecated]
because returning FooBorrowed<'a>
would be breakingLGTM: (@sffc) @robertbastian (@zbraniecki)
Could consider for later:
FooOwned
and Foo<'a>
Foo<'a>
Foo
to Foo<'a>
is breakingLGTM: @sffc @robertbastian
LGTM on both. Would really like to make compiled static data be the happy path
This came up in https://github.com/unicode-org/icu4x/pull/5413#discussion_r1724947448.
On types that have an owned and a borrowed variant, should the compiled data constructors build the owned or the borrowed variant? For example, should it be:
or
or
I think
Foo::default() -> Foo
should always be available because that's the way to get an actualFoo
from compiled data, and clients might need an actualFoo
if the object could be created multiple different ways.I don't have a super strong opinion between options 1, 2, and 3.
@Manishearth @zbraniecki @hsivonen @robertbastian @nekevss