tapeinosyne / hyphenation

Text hyphenation for Rust
Apache License 2.0

Embedding all languages in the lib makes it too fat #13

Open blackgear opened 6 years ago

blackgear commented 6 years ago

Right now the only way to trim the library is to edit build.rs manually, changing

let langs = vec![
    "af",
    "hy",
...
    "hsb",
    "cy"
];

to

let langs = vec!["en"];

to make the final lib much smaller (8.65 MB -> 132 KB). I think it would be nice to have an option in [dependencies.hyphenation] to set which languages are embedded.

Maybe something like this:

[dependencies.hyphenation]
version = "0.6.0"
features = ["nfd"]
language = ["en-us"]

tapeinosyne commented 6 years ago

Dictionary embedding was already going to be under a feature flag starting with the next release, and adding individual language flags is certainly an idea worth considering. (It would have to be flags, because the Cargo manifest format and the Rust cfg system are not flexible enough to allow a syntax as nice as language = ["en_us"] for library features.) It will probably happen soon, but not immediately.
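To illustrate how per-language flags could drive the language list in build.rs, here is a minimal sketch. It relies on the fact that Cargo exposes each enabled feature to build scripts as an environment variable named `CARGO_FEATURE_<NAME>`, with the feature name uppercased and `-` replaced by `_`. The feature naming scheme `embed_<lang>` and the helpers `feature_env_var` and `selected_langs` are assumptions made for this example, not the crate's actual build script.

```rust
use std::env;

/// Maps a language tag such as "en-us" to the environment variable Cargo
/// would set for a hypothetical feature named "embed_en-us".
fn feature_env_var(lang: &str) -> String {
    format!(
        "CARGO_FEATURE_EMBED_{}",
        lang.to_uppercase().replace('-', "_")
    )
}

/// Keeps only the languages whose (hypothetical) feature is reported as
/// enabled by `is_set`. Passing the lookup as a closure keeps the
/// function testable outside a real build script.
fn selected_langs<'a>(all: &[&'a str], is_set: impl Fn(&str) -> bool) -> Vec<&'a str> {
    all.iter()
        .copied()
        .filter(|lang| is_set(feature_env_var(lang).as_str()))
        .collect()
}

fn main() {
    // Illustrative subset of the language list quoted from build.rs.
    let all = ["af", "hy", "en", "hsb", "cy"];

    // In a real build script Cargo sets these variables itself;
    // run standalone, none are set and the list is empty.
    let langs = selected_langs(&all, |var| env::var_os(var).is_some());
    println!("embedding: {:?}", langs);
}
```

With this approach the manifest would still need one feature entry per language, which is the "flags" trade-off described above.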

blackgear commented 6 years ago

Maybe https://crates.io/crates/inflate and https://crates.io/crates/deflate would also help.

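use deflate::deflate_bytes;

fn main() {
    let data = b"Some data";
    // Compress the raw bytes with DEFLATE.
    let compressed = deflate_bytes(data);
    println!("{} -> {} bytes", data.len(), compressed.len());
}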

This compresses the en-US dictionary from 132 KB to 20 KB.

tapeinosyne commented 4 years ago

Starting with v0.8, embedding all dictionaries should take no more than 2.8 MB. Moreover, the feature embed_en-us has been introduced for the common case of embedding only American English, e.g. in a small utility.

I would still like to find a better solution; ideally, one which allows end-users to select languages individually without a feature explosion.