unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org
Other
1.37k stars 176 forks source link

icu_capi crate has a dependency on icu_testdata crate #849

Closed dminor closed 2 years ago

dminor commented 3 years ago

This was added as part of https://github.com/unicode-org/icu4x/pull/835/. Although made conditional on wasm32 in the capi Cargo.toml [1], it shows up in the Cargo.lock file [2], and is pulled in while vendoring crates into Gecko. This causes the vendoring process to fail, the testdata hits an internal file size limit.

This might be a vendoring bug, but I think in general we should avoid having production crates depend upon testdata.

[1] https://github.com/unicode-org/icu4x/blob/c5bebc5d17ed4e77173cdae45e663b2e52d41c41/ffi/capi/Cargo.toml#L45 [2] https://github.com/unicode-org/icu4x/blob/c5bebc5d17ed4e77173cdae45e663b2e52d41c41/Cargo.lock#L862

sffc commented 3 years ago

CC @shadaj @Manishearth

Manishearth commented 3 years ago

This is necessary for now because WASM doesn't support other data providers; without the static data provider (which needs the testdata process) there would not be any way to do anything interesting with the WASM code.

I would add an exception for testdata for now; it's not actually being used, it's just that lockfiles are feature maximalist.

dminor commented 3 years ago

Because of this dependency, running cargo vendor will vendor in icu_testdata, which has some large files, so it does have a real effect on consumers of the icu_capi crate.

Is there a plan to remove this dependency in the future?

sffc commented 3 years ago

Ideally, WASM clients can bring their own provider, just like Rust and C++ clients, and have it built into the same *.wasm file. It shouldn't need to be a dependency of the core library long term.

dminor commented 3 years ago

I'm guessing Shadaj will be too busy with his studies to fix this for 0.4. I'd still like to see it part of 0.4, it is a soft blocker for using ICU4X in Firefox. I'm resetting the assignee so we can re-triage this.

@sffc This is marked as small, is this a potential good first bug or help wanted if we can add some additional notes to get someone started on fixing it?

sffc commented 3 years ago

This might be fixed; icu_testdata is moved to an optional feature that is enabled by the "provider_static" feature on icu_capi.

I don't know how cargo vendor works, so if it is still not fixed, I think the main deliverable here involves figuring out how to make the WASM target support "bring your own provider" and having it all built into a single WASM file.

dminor commented 3 years ago

I double checked, and unfortunately this is still being vendored in. This is only a soft blocker for use in Firefox, we can add an exception for the large data files, but it would be nice to not have icu_capi depend on icu_testdata.

Manishearth commented 3 years ago

Can you denylist the large files on vendoring? Ultimately those files are still necessary and are exposed as part of the capi via the static data provider. We could force smaller_static and have that use a different crate but that won't work for all cases.

Manishearth commented 3 years ago

A potential alternative would be to generate a smaller postcard blob for plurals+fd as well as the smaller_static build and have it live in icu_capi/testdata

dminor commented 3 years ago

I'm not sure how testdata is being used here. From Shane's comment, it sounds like eventually we want people to use write their own provider for wasm, and we're relying upon icu_testdata for now as an example of that or for testing things.

If that's the case, then it sounds like a smaller testdata blob for the capi might be a good solution, and maybe also provides an example for people to use when thinking about how they want to package their data.

Manishearth commented 3 years ago

I'm not sure how testdata is being used here

We're using it for creating a data provider for WASM. We also need it for the WearOS smaller_static data provider.

Happy to accept a change that uses smaller testdata blobs. We can move StaticDataProvider to a separate crate or a module in the ffi crate.

sffc commented 3 years ago

It's a pity that optional dependencies don't work nicely with your vendoring process.

Eventually, we want users of icu_capi to build their own data package, rather than using the default one (either the full static_provider or the stripped smaller_static). I think this can be done by a client making a new crate that re-exports everything from icu_capi plus one additional symbol for the data provider.

In the case of ICU4X, we could do this by making icu_capi that has no data and icu_capi_testdata that has a testdata provider built in. ?

Manishearth commented 3 years ago

@sffc yes, except Diplomat isn't quite ready for multi crate stuff yet.

Manishearth commented 3 years ago

I think creating a very narrow set of static testdata -- "plurals and decimals for three languages" should be fine for now

sffc commented 3 years ago

Actually. Once BlobProvider supports non-static (#848), then FFI clients can build their data and store it as static on their end, and pass it to the BlobProvider constructor. That's much simpler, and we don't have to bundle any static data in icu_capi.

In the mean time, I would support a short-term fix where we move the smaller_static data file from icu_testdata into icu_capi and then remove the icu_testdata dependency from icu_capi.

Manishearth commented 3 years ago

In the mean time, I would support a short-term fix where we move the smaller_static data file from icu_testdata into icu_capi and then remove the icu_testdata dependency from icu_capi.

We'd need to generate a second static blob that also contains plural data fwiw, smaller static is FDF only.

sffc commented 3 years ago

Fine with me if we need to have multiple choices for the smaller blobs at this point in time.

sffc commented 3 years ago

Probably I would have FFI functions that return an ICU4XStaticDataProvider for each blob, rather than using a feature to toggle between them.

Manishearth commented 3 years ago

Yeah that works!

gregtatum commented 2 years ago

I haven't had the time to work on this, I'm removing myself from the assignee.

robertbastian commented 2 years ago

icu_testdata is now optional, is that enough to fix this bug?

Manishearth commented 2 years ago

I think so!

sffc commented 2 years ago

According to @dminor's comment above, making the dependency optional (which, according to https://github.com/unicode-org/icu4x/issues/849#issuecomment-909433354, it already was) is not sufficient to solve the issue of vendored dependencies.

robertbastian commented 2 years ago

It was already optional but now it's non-default. Probably doesn't change much wrt cargo vendor

sffc commented 2 years ago
robertbastian commented 1 year ago

In 1.3 the dependency is fully removed, as we now have compiled data in icu_capi