unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org
Other
1.34k stars 174 forks source link

Revisit policy around component crate interdependent versions #4343

Open sffc opened 10 months ago

sffc commented 10 months ago

ICU4X 1.3 started using ~ versions between the metacrate, all component crates, and the data crates, as discussed in https://github.com/unicode-org/icu4x/issues/3537. This solves several problems:

  1. Helps prevent CLDR version skew: you generally don't want some components on a different CLDR version than other components.
  2. Keeps databake building. We don't guarantee provider struct stability across minor versions so the ~ deps make sure that the databake is built for the corresponding library version.
  3. Allows more flexibility when it comes to internal-facing APIs that cross component crate boundaries. For example, we can change the signature of an internal API in icu_locid and update call sites in component crates without having to worry about breaking older versions of those crates.

@Manishearth said he talked with the Cargo team which may have swayed his opinion in a different direction. Can you post your updated position below?

Personally, I'd like to see some user feedback and evidence before making a change away from what we agreed. From an ICU4X maintainer and integrator point of view, I'm happy with where we landed. Version skew was a real problem in ICU4X 1.0 through 1.2 and this largely solves it. Self-restraint (more restrictions on what we allow ourselves to do as ICU4X developers) can help solve problems 2 and 3 but it doesn't solve problem 1. If we can demonstrate that the version pinning causes harm to our users and if we come up with some other solution to the CLDR version skew question, then I can be swayed.

CC @robertbastian

Also see: https://github.com/unicode-org/icu4x/issues/2715

Manishearth commented 9 months ago

The basic issue was that the Cargo resolver does not do well with ~ dependencies, and that is extremely unlikely to change. What ends up happening is that Cargo can't always see that a stale ~1.3 transitive dep and a 1.4 dep can be unified by upgrading the actual crates being depended on. In other words, Cargo will accept valid lockfiles for mixes of deps like this, but will not be good at producing them, instead electing to say it can't resolve dependencies.

Now, a thing that may save us here is that these ~ deps are purely internal. I cannot currently trigger any problem cases, most of the problem cases the Cargo team was talking about was when a client of a crate uses a ~ dependency. I would like us to be careful to not suggest this to external clients, it's trivially easy to cause unresolvable problems when deptrees mix.


As a client from the google3 side, having to do upgrades lockstep is still a pain, especially with the common deps like icu_provider (If icu or icu_datagen depends on new icu4x components, they need to be imported, but they can't be imported until icu_provider is upgraded, but that can't be upgraded without upgrading icu and icu_datagen)

My general ideal world is the one where we have better self restraint to prevent problems with 2 and 3. I agree that 1 is harder to fix, but I actually think it's fine to keep using ~ deps as long as we don't have real complaints (but we should keep a look out for them).

Basically, I care less about the actual way we specify deps (until there are concrete issues we have experienced). But I would like us to try and be much more careful about internal breakages. Internal breakages are easier to police than external ones since we have a finite set of "client" code to think about. And I'm not sure we need a hard policy on this, but I think it would be nice to have it be a consideration.

Also, I'm primarily thinking about icu_provider and other common deps here. Potentially having a set of crates where we informally, internally, maintain a stability policy that can be mostly relied on when you're forced to mix versions temporarily.

sffc commented 9 months ago

I think it's a good idea to adopt a policy where by default we don't break internal cross-crate APIs but we plan to make exceptions on a case-by-case basis, similar to how we default to MSRV N-6 but allow N-4 with a strong enough motivation.

sffc commented 9 months ago

Consensus:

LGTM: @Manishearth, @sffc

Tentative consensus:

LGTM: @Manishearth, @sffc

sffc commented 9 months ago

@robertbastian Please review the above conclusions.

robertbastian commented 9 months ago

Compiled data requires that the fallback compiled data in locid transform is at the same version, otherwise deduplication breaks

Manishearth commented 9 months ago

That crate only gets the soft guarantee (and no semver relaxation): for me the line there is "still compiles and mostly works" (as in, it's an okay intermediate, temporary state to have). I think it would be good to explicitly call that out. This is one of the "version skew issues" that prompted me to propose having two separate lists, one a superset of the other.

provider/locid get the harder guarantee that we actually back by relaxing semver

robertbastian commented 9 months ago

"still compiles and mostly works" is horrible. We introduced the ~ deps because clients were having issues with mixed versions. If we allow non-~ locid_transform, there will be very silent behaviour changes that will be hard to debug.

I agree that locid and icu_provider can float more freely.

Manishearth commented 9 months ago

That's not the proposal.

The proposal is, two parts:

I split it into two to better facilitate asynchronous decisionmaking.

robertbastian commented 9 months ago

maintain a soft guarantee that we'll try and mitigate internal breakagaes

I'm unclear what this refers to

Manishearth commented 9 months ago

With the soft guarantee, whenever we change internal doc-hidden/unstable APIs (say, in, icu_locid_transform) we ensure that downstream crates on older versions still compile with the new API, perhaps by keeping compat shims around. This is best-effort, so it's fine to violate this with a discussion. This guarantee is not exposed via our semver bounds, it is a soft internal guarantee that makes the process of vendoring ICU4X easier.

sffc commented 6 months ago

Updated recommendation:

LGTM: @sffc @Manishearth

sffc commented 1 month ago

@robertbastian Do you approve the latest conclusion above?

Action on this ticket should be to document this somewhere and then close the issue.

robertbastian commented 1 week ago

lgtm