meilisearch / milli

Search engine library for Meilisearch ⚡️
MIT License

Is there any way to reduce the release executable size? #627

Closed dzcpy closed 2 years ago

dzcpy commented 2 years ago

Hi, thank you for writing this awesome tool. I'm currently building software that requires the package to stay within a certain size. However, I just tested the example code in the readme, and after a release build the executable is more than 80M. Besides compressing it with something like UPX, is there any way to reduce the executable size? Why does it take so much space? Below is the result of running cargo bloat --release --crates using cargo-bloat: [image]

Kerollmops commented 2 years ago

Hey @dzcpy,

You could try the strip command, which may help: it removes the debug symbols from the binary.

dzcpy commented 2 years ago

Hi @Kerollmops, thanks for your quick reply. However, I'm using the following cargo config, which I believe already removes the debug symbols:

[profile.release]
opt-level = 'z'     # Optimize for size.
lto = true          # Enable Link Time Optimization
codegen-units = 1   # Reduce number of codegen units to increase optimizations.
panic = 'abort'     # Abort on panic
strip = true        # Strip symbols from binary

I also found that the difference between debug and release builds is only around 10M. Could you shed some light on how to further reduce the size of the binary? Thanks

Kerollmops commented 2 years ago

Hey @dzcpy,

Unfortunately, I can't help you more than that, as I am not an expert on this subject. However, you can look at this guide repository, which should help you reduce the binary size further. I will close this issue, as this is not something we want to prioritize any time soon.

Have a nice day 🏝

vincent-herlemont commented 2 years ago

I can propose a solution that reduces the size by ~90% (from ~92.1MiB to ~8.9MiB).

The solution is to disable tokenization for these languages: "chinese", "hebrew", "japanese", "thai" (charabia/Cargo.toml#L28).

I have opened a PR here: https://github.com/meilisearch/milli/pull/632. This PR does not seem to change the current behavior of the crate, but it makes it possible to remove support for the languages above.
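
For illustration only, here is a rough sketch of what opting out might look like in a downstream Cargo.toml. It assumes the language support ends up wired into milli's default features, so that turning default features off drops the specialized tokenizers. The feature wiring and the commented-out feature list are assumptions made for this example, not something confirmed by the PR; only the four language names come from charabia's Cargo.toml.

```toml
[dependencies]
# Assumption: milli forwards charabia's language features through its own
# default features, so disabling them removes the specialized tokenizers
# (and with them the large dictionary data).
milli = { git = "https://github.com/meilisearch/milli", default-features = false }

# Hypothetical: re-enable only the languages you actually need, if milli
# exposes per-language features named after the charabia ones.
# milli = { git = "https://github.com/meilisearch/milli", default-features = false, features = ["chinese"] }
```

After changing the features, rebuilding in release mode and re-running cargo bloat --release --crates should show whether the dictionary crate is still linked in.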

vincent-herlemont commented 2 years ago

Just for information, ~85% of the binary size is taken up by crates/lindera-ipadic, which is used by meilisearch/charabia/Cargo.toml#L24.