Closed dminor closed 2 years ago
CC @shadaj @Manishearth
This is necessary for now because WASM doesn't support other data providers; without the static data provider (which needs the testdata process) there would not be any way to do anything interesting with the WASM code.
I would add an exception for testdata for now; it's not actually being used, it's just that lockfiles are feature maximalist.
Because of this dependency, running cargo vendor
will vendor in icu_testdata, which has some large files, so it does have a real effect on consumers of the icu_capi
crate.
Is there a plan to remove this dependency in the future?
Ideally, WASM clients can bring their own provider, just like Rust and C++ clients, and have it built into the same *.wasm file. It shouldn't need to be a dependency of the core library long term.
I'm guessing Shadaj will be too busy with his studies to fix this for 0.4. I'd still like to see it part of 0.4, it is a soft blocker for using ICU4X in Firefox. I'm resetting the assignee so we can re-triage this.
@sffc This is marked as small
, is this a potential good first bug
or help wanted
if we can add some additional notes to get someone started on fixing it?
This might be fixed; icu_testdata is moved to an optional feature that is enabled by the "provider_static" feature on icu_capi.
I don't know how cargo vendor
works, so if it is still not fixed, I think the main deliverable here involves figuring out how to make the WASM target support "bring your own provider" and having it all built into a single WASM file.
I double checked, and unfortunately this is still being vendored in. This is only a soft blocker for use in Firefox, we can add an exception for the large data files, but it would be nice to not have icu_capi
depend on icu_testdata
.
Can you denylist the large files on vendoring? Ultimately those files are still necessary and are exposed as part of the capi via the static data provider. We could force smaller_static and have that use a different crate but that won't work for all cases.
A potential alternative would be to generate a smaller postcard blob for plurals+fd as well as the smaller_static build and have it live in icu_capi/testdata
I'm not sure how testdata is being used here. From Shane's comment, it sounds like eventually we want people to use write their own provider for wasm, and we're relying upon icu_testdata for now as an example of that or for testing things.
If that's the case, then it sounds like a smaller testdata blob for the capi might be a good solution, and maybe also provides an example for people to use when thinking about how they want to package their data.
I'm not sure how testdata is being used here
We're using it for creating a data provider for WASM. We also need it for the WearOS smaller_static data provider.
Happy to accept a change that uses smaller testdata blobs. We can move StaticDataProvider to a separate crate or a module in the ffi crate.
It's a pity that optional dependencies don't work nicely with your vendoring process.
Eventually, we want users of icu_capi
to build their own data package, rather than using the default one (either the full static_provider
or the stripped smaller_static
). I think this can be done by a client making a new crate that re-exports everything from icu_capi
plus one additional symbol for the data provider.
In the case of ICU4X, we could do this by making icu_capi
that has no data and icu_capi_testdata
that has a testdata provider built in. ?
@sffc yes, except Diplomat isn't quite ready for multi crate stuff yet.
I think creating a very narrow set of static testdata -- "plurals and decimals for three languages" should be fine for now
Actually. Once BlobProvider supports non-static (#848), then FFI clients can build their data and store it as static on their end, and pass it to the BlobProvider constructor. That's much simpler, and we don't have to bundle any static data in icu_capi
.
In the mean time, I would support a short-term fix where we move the smaller_static data file from icu_testdata into icu_capi and then remove the icu_testdata dependency from icu_capi.
In the mean time, I would support a short-term fix where we move the smaller_static data file from icu_testdata into icu_capi and then remove the icu_testdata dependency from icu_capi.
We'd need to generate a second static blob that also contains plural data fwiw, smaller static is FDF only.
Fine with me if we need to have multiple choices for the smaller blobs at this point in time.
Probably I would have FFI functions that return an ICU4XStaticDataProvider for each blob, rather than using a feature to toggle between them.
Yeah that works!
I haven't had the time to work on this, I'm removing myself from the assignee.
icu_testdata
is now optional, is that enough to fix this bug?
I think so!
According to @dminor's comment above, making the dependency optional (which, according to https://github.com/unicode-org/icu4x/issues/849#issuecomment-909433354, it already was) is not sufficient to solve the issue of vendored dependencies.
It was already optional but now it's non-default. Probably doesn't change much wrt cargo vendor
cargo vendor
is a pain with large files. Second, there's a philosophical problem about a production crate relying on testdata.cargo vendor
has a whole lot of problems, and I think we need to expect that additional tooling works around its deficiencies.In 1.3 the dependency is fully removed, as we now have compiled data in icu_capi
This was added as part of https://github.com/unicode-org/icu4x/pull/835/. Although made conditional on wasm32 in the capi Cargo.toml [1], it shows up in the Cargo.lock file [2], and is pulled in while vendoring crates into Gecko. This causes the vendoring process to fail, the testdata hits an internal file size limit.
This might be a vendoring bug, but I think in general we should avoid having production crates depend upon testdata.
[1] https://github.com/unicode-org/icu4x/blob/c5bebc5d17ed4e77173cdae45e663b2e52d41c41/ffi/capi/Cargo.toml#L45 [2] https://github.com/unicode-org/icu4x/blob/c5bebc5d17ed4e77173cdae45e663b2e52d41c41/Cargo.lock#L862