rust_icu: rust bindings for ICU (International Components for Unicode) library
Apache License 2.0

rust_icu: low-level rust language bindings for the ICU library

Item Description
Testing Test status
Source https://github.com/google/rust_icu
README https://github.com/google/rust_icu/blob/main/README.md
Coverage View report
Docs https://docs.rs/crate/rust_icu

This is a library of low-level native rust language bindings for the International Components for Unicode (ICU) library for C (a.k.a. ICU4C).

If you just want quick instructions on how to download and install, see the quickstart guide.

See the ICU project home page for details about the ICU library. The library source can be viewed on GitHub.

The latest version of this file is available at https://github.com/google/rust_icu.

This is not an officially supported Google product.

Why wrap ICU (vs. doing anything else)?

Structure of the repository

The repository is organized as a cargo workspace of rust crates. Each crate corresponds to the respective header in the ICU4C library's C API. Please consult the coverage report for details about function coverage in the headers.

Crate Description
rust_icu Top-level crate. Include this if you just want to have all the functionality available for use.
rust_icu_common Commonly used low-level wrappings of the bindings.
rust_icu_intl Implements ECMA 402 recommendation APIs.
rust_icu_sys Low-level bindings code
rust_icu_ubrk Support for text boundary analysis. Implements ubrk.h C API header from the ICU library.
rust_icu_ucal ICU Calendar. Implements ucal.h C API header from the ICU library.
rust_icu_ucol Collation support. Implements ucol.h C API header from the ICU library.
rust_icu_udat ICU date and time. Implements udat.h C API header from the ICU library.
rust_icu_udata ICU binary data. Implements udata.h C API header from the ICU library.
rust_icu_uenum ICU enumerations. Implements uenum.h C API header from the ICU library. Mainly UEnumeration and friends.
rust_icu_uformattable Support for UFormattable, a thin wrapper for the typed values used in formatting and parsing. Implements uformattable.h C API header from the ICU library. Since 0.3.1.
rust_icu_ulistformatter Locale-sensitive list formatting support. Implements ulistformatter.h C API header from the ICU library.
rust_icu_uloc Locale support. Implements uloc.h C API header from the ICU library.
rust_icu_umsg MessageFormat support. Implements umsg.h C API header from the ICU library.
rust_icu_unorm2 Unicode normalization support. Implements unorm2.h C API header from the ICU library.
rust_icu_unum Number formatting support. Implements unum.h C API header from the ICU library.
rust_icu_unumberformatter Number formatting support (modern). Implements unumberformatter.h C API header from the ICU library.
rust_icu_upluralrules Locale-sensitive plural rules support. Implements upluralrules.h C API header from the ICU library.
rust_icu_ustring ICU strings. Implements ustring.h C API header from the ICU library.
rust_icu_utext Text operations. Implements utext.h C API header from the ICU library.
rust_icu_utrans Transliteration support. Implements utrans.h C API header from the ICU library.

Limitations

The way rust language bindings are generated today limits them to what is available in the C API. The ICU library's C API (sometimes referred to as ICU4C in the documentation) is distinct from the ICU C++ API.

The bindings offered by this library have somewhat limited applicability, which means they may sometimes not work for you out of the box. If you come across such a case, feel free to file a bug for us to fix. Pull requests are welcome.

The limitations we know of today are as follows:

Compatibility

The compatibility guarantee is as follows:

  1. Automated tests are executed for the last three major ICU library versions in all feature combinations of interest.
  2. Automated tests are executed for the ICU library version in use by the docs.rs system (so the documentation could be built).
rust_icu version ICU 63.x ICU 70.1 ICU 71.1 ICU 72.1 ICU 73.1 ICU 74.1
3.0
4.0
5.0

Features

The rust_icu library is intended to be compiled with cargo, with one of several features enabled. Compilation with cargo allows us to do some library detection in a custom build.rs file in the rust_icu_sys library and adapt the build process to your build environment. However, since not every development environment will use the same settings, we opted to offer certain features (below) as configuration options.

While our intention is to keep the list of features below up to date with the actual list in Cargo.toml, the list may periodically go out of date.

To use any of the features, you will need to activate the feature in all the rust_icu_* crates that you intend to use. Failing to do so will lead to confusing compilation results.

Feature Default? Description
use-bindgen Yes If set, cargo will run bindgen to generate bindings based on the installed ICU library. The program icu-config must be in $PATH for this to work. In the future there may be other approaches for auto-detecting libraries, such as via pkg-config.
renaming Yes If set, ICU bindings are generated with version numbers appended. This is called "renaming" in ICU, and is normally needed only when linking against a specific ICU version is required, for example to work around having to link different ICU versions. See the ICU documentation for a discussion of renaming. This feature MUST be used when bindgen is NOT used.
icu_config Yes If set, the binary icu-config will be used to configure the library. Turn this feature off if you do not want build.rs to try to autodetect the build environment, for example because your build environment configures ICU in a different way. This feature is only meaningful when the bindgen feature is used; otherwise it has no effect.
icu_version_in_env No If set, ICU bindings are made for the ICU version specified in the environment variable RUST_ICU_MAJOR_VERSION_NUMBER, which is made available to cargo at build time. See the section below for details on how to use this feature. This feature is only meaningful when the bindgen feature is NOT used; otherwise it has no effect.
static No If set, link ICU libraries statically (and the standard C++ dynamically). You can use RUST_ICU_LINK_SEARCH_DIR to add an extra path to the search path if you have a build of ICU in a non-standard directory.
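
As a rough illustration of what the renaming feature refers to: with ICU renaming, each C symbol carries the ICU major version as a suffix, so a function such as ucal_open is exported as ucal_open_67 by an ICU 67 build. The sketch below only mimics that naming rule. The function renamed_symbol is a hypothetical helper for illustration and is not part of rust_icu or ICU.

```rust
/// Hypothetical helper (not part of rust_icu): returns the symbol name as it
/// would appear with ICU renaming on (`Some(major)`) or off (`None`).
fn renamed_symbol(name: &str, icu_major: Option<u32>) -> String {
    match icu_major {
        // Renaming on: the major version is appended after an underscore.
        Some(major) => format!("{}_{}", name, major),
        // Renaming off: the plain C name is used.
        None => name.to_string(),
    }
}

fn main() {
    println!("{}", renamed_symbol("ucal_open", Some(67)));
    println!("{}", renamed_symbol("ucal_open", None));
}
```

This is why bindings generated with renaming are tied to one ICU major version: the suffixed names exist only in a library of that version.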

Prerequisites

Required

Optional

Testing

There are a few options to run the test for rust_icu.

Cargo

Building and testing using cargo is the canonical way of building and testing rust code.

In the case of the rust_icu library you may find that your system's default ICU development package is ancient, in which case you will need to build your own ICU4C library (see below for that). That will make it necessary to pass in PKG_CONFIG_PATH and LD_LIBRARY_PATH environment variables to help the build code locate and use the library you built, instead of the system default.

The following tests should all build and pass. Note that because the libraries needed are in a custom location, we need to set LD_LIBRARY_PATH when running the tests, as well as PKG_CONFIG_PATH.

If you find that you are able to use your system's default ICU installation, you can safely omit the two environment variables.

env PKG_CONFIG_PATH="$HOME/local/lib/pkgconfig" \
    LD_LIBRARY_PATH="$HOME/local/lib" \
        bash -c 'cargo test'

If you think that the above approach is too much of a hassle, consider trying out the Docker-based approach.

GNU Make

If you happen to like the GNU way of doing things, you may appreciate the GNU Make approach.

The easiest way is to run:

make test

You may want to use this method if you are working on rust_icu, have your development environment all set up and would like a shorthand to run the tests.

Docker-based

See optional dependencies section above.

To run a hermetic build and test of the rust_icu source code, issue the following command:

make docker-test

This will run a docker-based build and test of the source code on your local machine. This is a good way to test that your code works with a specific reference version of ICU.

Prior art

There is plenty of prior art that has been considered:

The current state of things is that I'd like to do a few experiments on my own first, then see if the work can be folded into any of the above efforts.

See also:

Assumptions

There are a few competing approaches for ICU bindings. However, it seems, at least based on information available in rust's RFC repos, that the work on ICU support in rust is still ongoing.

These are the assumptions made in the making of this library:

Additional instructions

Quickstart guide

Before you begin, please ensure the following prerequisites are met:

From there, the following sequence of commands will check out, build and test the rust_icu source code.

mkdir -p ~/tmp
cd ~/tmp
git clone https://github.com/google/rust_icu
cd rust_icu
make docker-test

You can now make changes to the code and tests. You can re-run the compile and test cycle by running make docker-test.

ICU installation instructions

These instructions follow the "out-of-tree" build instructions from the ICU repository.

Assumptions

The instructions below are not self-contained. They assume that:

Compilation

mkdir -p $HOME/local
mkdir -p $HOME/tmp
cd $HOME/tmp
git clone https://github.com/unicode-org/icu.git
mkdir icu4c-build
cd icu4c-build
../icu/icu4c/source/runConfigureICU Linux \
  --prefix=$HOME/local \
  --enable-static
make
make install
make doc

If the compilation finishes with success, the directory $HOME/local/bin will have the file icu-config which is necessary to discover the library configuration.

You can also do a

make check

to run the unit tests.

If you add $HOME/local/bin to $PATH, or move icu-config to a directory that is listed in your $PATH, you should be all set to compile rust_icu.
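
A quick way to confirm that icu-config is discoverable is to scan $PATH for it. The sketch below does this with the rust standard library only. The function find_in_path is a hypothetical helper and is not part of rust_icu.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Hypothetical helper (not part of rust_icu): returns the first regular file
/// named `program` found in the directories listed in `path_var`.
fn find_in_path(program: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(program))
        .find(|candidate| candidate.is_file())
}

fn main() {
    // An unset PATH is treated as an empty search path.
    let path_var = env::var_os("PATH").unwrap_or_default();
    match find_in_path("icu-config", &path_var) {
        Some(path) => println!("found icu-config at {}", path.display()),
        None => println!("icu-config is not on PATH; consider adding $HOME/local/bin"),
    }
}
```

The check only looks for a file with the right name; it does not verify that the file is executable or which ICU version it reports.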

ICU rebuilding instructions

If you change the configuration of the ICU library with the intention of rebuilding the library from source, you should probably add an intervening make clean command.

Since the ICU build is not hermetic, this ensures there are no remnants of the old compilation process sitting around in the build directory. You need to do this for example if you upgrade the major version of the ICU library. If you forget to do so, you may see unexpected errors while compiling ICU, or while linking or running your programs.

Compiling for a set version of ICU

Assumptions

OR:

The following is a tested example.

env PKG_CONFIG_PATH="$HOME/local/lib/pkgconfig" \
    LD_LIBRARY_PATH="$HOME/local/lib" \
    RUST_ICU_MAJOR_VERSION_NUMBER=65 \
        bash -c 'cargo test'

The following is an as-yet untested example of compiling rust_icu against a preexisting ICU version 66.

env PKG_CONFIG_PATH="$HOME/local/lib/pkgconfig" \
    LD_LIBRARY_PATH="$HOME/local/lib" \
    RUST_ICU_MAJOR_VERSION_NUMBER=66 \
        bash -c 'cargo test'
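
The role of RUST_ICU_MAJOR_VERSION_NUMBER can be pictured as selecting one of the pre-generated bindings files named lib_XX.rs in the directory rust_icu_sys/bindgen. The following is a minimal sketch of that mapping under that assumption. The function static_bindings_path is a hypothetical helper, and the actual selection logic in rust_icu_sys may differ.

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical helper (not part of rust_icu): maps an ICU major version,
/// given as text, to the matching pre-generated bindings file.
fn static_bindings_path(major: &str) -> Result<PathBuf, String> {
    // Reject anything that is not a plain non-negative integer.
    let version: u32 = major
        .trim()
        .parse()
        .map_err(|err| format!("invalid ICU major version {:?}: {}", major, err))?;
    Ok(PathBuf::from("rust_icu_sys")
        .join("bindgen")
        .join(format!("lib_{}.rs", version)))
}

fn main() {
    match env::var("RUST_ICU_MAJOR_VERSION_NUMBER") {
        Ok(major) => match static_bindings_path(&major) {
            Ok(path) => println!("would use {}", path.display()),
            Err(err) => eprintln!("{}", err),
        },
        Err(_) => eprintln!("RUST_ICU_MAJOR_VERSION_NUMBER is not set"),
    }
}
```

Under this picture, a version number without a matching lib_XX.rs file cannot work, which is one reason to keep those files refreshed.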

Adding support for a new version of ICU

In general, as long as the icu-config approach is supported, it should be possible to generate the library wrappers for newer versions of the ICU library, assuming that the underlying C APIs do not diverge too much.

An approach that yielded easy support for ICU 65.1 consisted of the following steps. Below, $RUST_ICU_SOURCE_DIR is the directory where you extracted the ICU source code.

These files lib_XX.rs may need to be generated again if build.rs is changed to include more features.

Adding more bindings

When adding more ICU wrappers, make sure to do the following:

Testing with a specific feature set turned on

Here's an example of running a docker test on ICU 67, with features icu_version_in_env and renaming turned on instead of the default. Note that the parameters are mostly passed into the container that runs docker-test via environment variables.

make DOCKER_TEST_ENV=rust_icu_testenv-67 \
  RUST_ICU_MAJOR_VERSION_NUMBER=67 \
  DOCKER_TEST_CARGO_TEST_ARGS='--no-default-features --features icu_version_in_env,renaming' \
  docker-test

Some clarification:
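
The constraints stated in the feature table can be checked mechanically for a combination such as the one above. The sketch below encodes only the three rules given in the table. The function feature_warnings is a hypothetical helper and is not part of rust_icu or its build scripts.

```rust
/// Hypothetical checker (not part of rust_icu): returns one warning per rule
/// from the feature table that the given feature set violates.
fn feature_warnings(features: &[&str]) -> Vec<String> {
    let has = |name: &str| features.iter().any(|f| *f == name);
    let mut warnings = Vec::new();

    // "renaming" MUST be used when bindgen is NOT used.
    if !has("use-bindgen") && !has("renaming") {
        warnings.push("renaming must be enabled when use-bindgen is off".to_string());
    }
    // "icu_config" is only meaningful when bindgen is used.
    if has("icu_config") && !has("use-bindgen") {
        warnings.push("icu_config has no effect without use-bindgen".to_string());
    }
    // "icu_version_in_env" is only meaningful when bindgen is NOT used.
    if has("icu_version_in_env") && has("use-bindgen") {
        warnings.push("icu_version_in_env has no effect with use-bindgen".to_string());
    }
    warnings
}

fn main() {
    // The combination passed via DOCKER_TEST_CARGO_TEST_ARGS above.
    let chosen = ["icu_version_in_env", "renaming"];
    for warning in feature_warnings(&chosen) {
        println!("warning: {}", warning);
    }
    println!("checked {} features", chosen.len());
}
```

With --no-default-features, use-bindgen is off, so renaming is required and icu_version_in_env becomes meaningful; the combination in the example satisfies both rules.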

Refreshing static bindgen files

Requires docker.

Run make static-bindgen periodically to refresh the statically generated bindgen files in the directory rust_icu_sys/bindgen. These files are named lib_XX.rs, where XX is an ICU version (e.g. 67), and are used when bindgen features are turned off.

Invoking this make target will modify the local checkout with the newer versions of the files lib_XX.rs. Make a pull request and check them in.

For more information on why this is needed, see the bindgen README.md.