unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org

Reduce Dart binary size #4690

Open robertbastian opened 6 months ago

robertbastian commented 6 months ago

The ICU4X shared library with full compiled data and all features currently measures around 30MB on Linux. For it to be usable from Dart, we need to reduce this greatly.

How does ICU4X deal with this in other languages

ICU4X's API is designed around many small functions, so that the compiler can aggressively optimise. In Rust, the compiler has a view of the whole program, so it is in a position to throw out unused code (and data). This also holds for C/C++ when doing static linking, where the C compiler compiles C code against our static library with a whole-program view.

Why does this not work in Dart

Dart only supports dynamic linking (https://github.com/dart-lang/sdk/issues/49418), i.e. the library is loaded into memory at runtime, and the system helps the Dart binary find the required functions. This means, however, that the shared library is compiled independently of the Dart binary, and cannot be compile-time optimised.

Approaches for reducing the binary size in Dart

Static linking

The simplest, and probably most performant, solution would be for Dart to support static linking. However, there's currently no concrete plan for this on the Dart side (https://github.com/dart-lang/sdk/issues/49418).

Tree shaking

Conceptually, Dart already uses something like static linking. We do not dynamically link against a system (or shared) library; instead, we need to ship our own library inside Dart's asset system (https://github.com/dart-lang/sdk/issues/54003). The Dart compiler is aware of all ICU4X functions that are reachable in the compiled binary (through the @ResourceIdentifier annotation, which we added in https://github.com/rust-diplomat/diplomat/pull/442), so it could remove unreachable symbols from the dynamic library. There is currently a work-in-progress custom link.dart script, which is invoked during compilation and has access to the list of @ResourceIdentifier entries. We can use this to tree-shake our shared library down to a minimal shared library.

Filtering a shared library

Shared libraries are platform-specific executable files (e.g. ELF files on Linux) that contain code and very little metadata beyond a symbol table in the shape that a dynamic linker can understand. We were not able to find any tools that can filter a dynamic library in the way we require.

Creating a minimal shared library from a static library

We do have access to a native C toolchain for each Dart target through the native_assets_cli package. This means we can use a two-step compilation process as follows:

// symbols.lds
{
  global:
    ICU4XFixedDecimal_create_from_i32;
  local:
    *;
};

$ clang -fPIC -shared -u ICU4XFixedDecimal_create_from_i32 -Wl,--version-script=symbols.lds \
 -Wl,--gc-sections -Wl,-strip-debug -o out.so <static-lib>

In experiments this reduces the binary size to e.g. ~1.7MB for collation (including data).
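
The version script and the clang invocation above both follow mechanically from the list of reachable symbols, so link.dart could generate them. The following is only a rough sketch of that generation step, written in Python for brevity rather than Dart. The function names, the default file names, and the static library path in the usage lines are assumptions made for illustration; only the flags are taken from the command above.

```python
from typing import Iterable, List


def version_script(symbols: Iterable[str]) -> str:
    """Render a linker version script that exports only `symbols`.

    Everything else is marked local, so --gc-sections can drop it.
    """
    exported = "".join(f"    {name};\n" for name in sorted(set(symbols)))
    return "{\n  global:\n" + exported + "  local:\n    *;\n};\n"


def link_command(
    symbols: Iterable[str],
    static_lib: str,
    script_path: str = "symbols.lds",  # assumed file name, as in the example above
    output: str = "out.so",
) -> List[str]:
    """Build the clang argument list that links a minimal shared library."""
    cmd = ["clang", "-fPIC", "-shared"]
    # -u forces each kept symbol to be pulled in from the static library.
    for name in sorted(set(symbols)):
        cmd += ["-u", name]
    cmd += [
        f"-Wl,--version-script={script_path}",
        "-Wl,--gc-sections",
        "-Wl,-strip-debug",
        "-o",
        output,
        static_lib,
    ]
    return cmd


if __name__ == "__main__":
    used = ["ICU4XFixedDecimal_create_from_i32"]
    print(version_script(used), end="")
    # "path/to/static-lib.a" is a placeholder, not a real artifact name.
    print(" ".join(link_command(used, "path/to/static-lib.a")))
```

Sorting and de-duplicating the symbols keeps the generated files stable between builds, which should help with build caching.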

Open questions

So far we have tested this approach on Linux. We will need to confirm that this is feasible for all Dart platforms.

Data size

While the shared library tree shaking is able to reduce code size by removing unused functionality, it is not able to remove unused locales. ICU4X by default builds with around 200 locales in "compiled data" mode, which make up a large chunk of the binary size.

Custom compiled data

The most performant approach to custom data in ICU4X is custom compiled data. This uses icu_datagen to generate Rust code, which is then used during the build of the ICU4X binary. However, as we lack the ability to build the ICU4X library during the Dart build, we cannot use this approach in the general case. We could generate binaries with different sets of locales, but this would lead to a combinatorial explosion of Dart platform × locale set combinations, and it's unclear which locale sets we should support.
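
To put a rough number on the combinatorial explosion, the sketch below counts the prebuilt binaries that would be needed if every non-empty subset of locales had to be available for every platform. Both the "any subset may be requested" model and the platform count of 5 are assumptions for illustration; the figure of about 200 locales comes from the description above.

```python
def binaries_needed(platforms: int, locales: int) -> int:
    """Count prebuilt binaries if each platform offered every non-empty locale subset."""
    return platforms * (2 ** locales - 1)


if __name__ == "__main__":
    # The platform count of 5 is a hypothetical value, not an actual Dart target list.
    for n in (3, 10, 200):
        print(n, binaries_needed(platforms=5, locales=n))
```

Even 10 locales would already require thousands of binaries under this model, so prebuilding is only realistic for a small, hand-picked list of locale sets.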

Serialized data

The more flexible approach to custom data is to load serialised data blobs at runtime. Our deserialisation is zero-copy (no allocation, only validation), so there's no significant performance impact. It does however let us generate data and binaries separately.

In this approach we will generate the static library with only a small subset of universally required compiled data (such as fallback data), and everything else will be provided by serialised data. We can generate the required blob of serialised data in the link.dart phase, as we have a list of used functions, which we can map to required data (https://github.com/unicode-org/icu4x/issues/2685). This will be done by a precompiled Rust binary (https://github.com/unicode-org/icu4x/pull/4347), which we ship for each host platform. The binary will include the complete precomputed data, in order to not have to generate data from first principles (CLDR), but only to filter out unselected locales.

We then use Dart's asset functionality to package the serialised blob into the Dart binary, and access it at runtime.
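
The data selection step described above amounts to two lookups: used functions to data keys, then (key, locale) pairs to payloads. The following Python sketch models that logic only; the real work would be done by the precompiled Rust binary. The mapping table, its key names, the dictionary-shaped "full data", and the function names are all hypothetical stand-ins, not the actual ICU4X data model or the output of issue 2685.

```python
from typing import Dict, Iterable, Mapping, Set, Tuple

# Hypothetical function-to-data-key table, for illustration only.
# Functions that need no data (e.g. ICU4XFixedDecimal_create_from_i32) are absent.
SYMBOL_TO_KEYS: Mapping[str, Set[str]] = {
    "ICU4XCollator_create_v1": {"collator/data@1", "collator/meta@1"},
    "ICU4XFixedDecimalFormatter_create_with_grouping_strategy": {"decimal/symbols@1"},
}

DataId = Tuple[str, str]  # (data key, locale)


def required_keys(
    used_symbols: Iterable[str],
    symbol_to_keys: Mapping[str, Set[str]] = SYMBOL_TO_KEYS,
) -> Set[str]:
    """Union of the data keys needed by the functions that survived tree shaking."""
    keys: Set[str] = set()
    for symbol in used_symbols:
        keys |= symbol_to_keys.get(symbol, set())
    return keys


def filter_data(
    full_data: Mapping[DataId, bytes],
    keys: Iterable[str],
    locales: Iterable[str],
) -> Dict[DataId, bytes]:
    """Keep only payloads whose key is required and whose locale was selected."""
    wanted_keys, wanted_locales = set(keys), set(locales)
    return {
        (key, locale): payload
        for (key, locale), payload in full_data.items()
        if key in wanted_keys and locale in wanted_locales
    }


if __name__ == "__main__":
    full = {
        ("decimal/symbols@1", "en"): b"en-decimal",
        ("decimal/symbols@1", "de"): b"de-decimal",
        ("collator/data@1", "en"): b"en-collation",
    }
    used = [
        "ICU4XFixedDecimal_create_from_i32",
        "ICU4XFixedDecimalFormatter_create_with_grouping_strategy",
    ]
    print(filter_data(full, required_keys(used), ["en"]))
```

Because the filter only drops entries from precomputed data, it never has to touch CLDR, which matches the goal of not generating data from first principles.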

Open questions

In order to generate custom locale data, we need some way for the client to select the desired list of locales, which we can consume in link.dart.

robertbastian commented 6 months ago

Next steps

mosuem commented 6 months ago

This will be done by a precompiled Rust binary (https://github.com/unicode-org/icu4x/pull/4347), which we ship for each host platform.

This will also have to be distributed via the CDN together with the ICU4X binaries, see #4689. So there, we will probably want to distribute (zipped together?) a compiled icu4x, a compiled icu_datagen, and a full data blob.

In order to generate custom locale data, we need some way for the client to select the desired list of locales, which we can consume in link.dart.

Ideally, this would be provided using the same @ResourceIdentifier mechanism, as part of the API of package:intl4x. I opened an issue here.

robertbastian commented 6 months ago

#2685

robertbastian commented 6 months ago

Discussion with @Manishearth @sffc @mosuem @robertbastian

There will be two compilation modes: with and without Rust