Reduced size binary - Githubissues

fungs commented 5 months ago

A note for the community

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

Use Cases

The first sentence on the Vector website states "A lightweight, ultra-fast tool for building observability pipelines". When I looked at the vector binary in the different Debian packages, it was about 127 MiB, comparable to a full Linux distribution image. That's not really lightweight for most people (including myself, of course).

The binary size can be an issue in some situations like

systems with limited memory
systems with limited storage
high storage and bandwidth costs for updates etc.

Attempted Solutions

No response

Proposal

I don't understand why the binary is so bloated, but here are some ideas to get it down to a reasonable size, or at least to make it more plausible

look for redundancy
explain included static binary parts
provide separate binaries for different deployment roles

I just feel bad about augmenting a container image with the vector binary for something as simple as forwarding metrics, and doubling the image size by doing so.

References

No response

Version

vector 0.36.1 (x86_64-unknown-linux-gnu 2857180 2024-03-11 14:32:52.417737479)

jszwedko commented 5 months ago

Thanks for opening this discussion @fungs !

I agree with you that Vector's current binary size is not what I would guess when thinking of a "lightweight" binary; even Vector's first "official" release (v0.10.0) had a binary of 80 MB. I think that statement was likely comparing against Splunk, FluentD, and Logstash, which are quite a bit heavier. FluentBit might be a better comparison, though I note that FluentBit's binary is 50 MB, so it's not really that far off (I was thinking it'd be an order of magnitude). As another data point, the OpenTelemetry Collector, even without any contrib modules, is 99 MB. All of these figures are for x86_64 builds, and all of them certainly seem pretty heavyweight for a "sidecar" deployment.

I agree with the list you have to investigate, and would add a couple of things like stripping the output, but I think the real savings are likely to come from users only compiling in the modules they need (similar to the OpenTelemetry Collector Contrib model), since each category of dependency (AWS, Kafka, etc.) brings in quite a bit extra that will be extraneous if you aren't using those modules. We do enable that via feature flags, but it's not well documented or easy for users to create their own distributions. It seems to me that it'll be difficult to maintain Vector's binary size over time without that as we add more and more integrations.

Another note is that Vector statically compiles most dependencies (librdkafka, libsasl, etc.) which is probably not helping the overall binary size. This is done for portability reasons.

fungs commented 5 months ago

@jszwedko, that's exactly the way I'm looking at it. I was referring to the x86_64 architecture, but I assume that the picture is similar for others. I was also comparing to fluentbit, shipping an all-in-one binary of 50 MiB, which seems to have a similar stack and purpose.

Looking at the static compilation issue: I'm not a Rust developer, so I don't know how feasible a dynamic loading approach would be to those extra modules. The extra modules could still contain the dependencies as static, but would only be loaded by the program when actually required, and most importantly be omitted from the distribution in many cases. This is how traditional programs work on Linux. It would circumvent custom or role-specific builds.

How others do it: For example, VictoriaMetrics ships both an all-in-one binary and role-specific agents (for distributed deployments). That kind of partitioning is a balanced approach vs. having to compile individually for every use case.

Naively and technically, I'd think that one could probably build a set of binary artifacts and bind them per individual use case, but I'm not into the whole Rust tool chain.

Cheers

jpds commented 5 months ago

I posted some suggestions on this at: https://github.com/vectordotdev/vector/pull/17342#issuecomment-1932659066

paolobarbolini commented 5 months ago

Our internal build with just a few sinks and stuff is about 20 MB with LTO

bruceg commented 5 months ago

Incidentally, the vdev development tool includes a subcommand that runs Vector with only the feature flags required to run a given config file turned on (vdev run <CONFIG>). It would be pretty straightforward to leverage this to produce a stripped-down bespoke vector binary (via an option to vdev build vector) for a particular use case without having to know the feature flags required.

polarathene commented 5 months ago

I don't have much time atm to engage much in this discussion, but this was a concern for me and I spent a fair amount of time looking into building Vector for minimal size.

I've had a full build of Vector at around 100MB stripped and 20-25MB UPX compressed (adds about 1s to startup time)
Minimal build for what I use at about 26MB stripped and 6.6MB with UPX (around 400ms startup time penalty).
Minimal build with nightly -Z build-std didn't make much difference, but lto = "fat" + codegen-units=1 with panic = "abort" (biggest contributor IIRC) brought that down to 16MB, or 4.7MB UPX compressed (LZMA).

I don't recall fat vs thin LTO making much notable difference in size. I should add that I'm skimming through some old notes for those sizes.

# `Cargo.toml` sets `opt-level = "z"` and `lto = "thin"` (not much value in fat)
RUSTFLAGS="-C strip=symbols -C relocation-model=static" OPENSSL_NO_VENDOR=1 cargo build \
  --release \
  --no-default-features \
  --features "codecs-syslog,sources-stdin,sources-syslog,sinks-console" \
  --bin vector \
  --target x86_64-unknown-linux-musl

The OPENSSL_NO_VENDOR=1 isn't needed if you have the necessary packages to build OpenSSL from source. I had a frustrating time where this wasn't clear, as builds were failing with unhelpful error output; it turned out I needed the perl package. Opting out of the vendored feature for the openssl crate allowed me to use the Alpine openssl-libs-static package. Building on Alpine is 2-3x slower due to the memory allocator, however.
I didn't see much difference for -gnu builds with static vs dynamic linking. Probably because I didn't have the package available, or perhaps I needed to more explicitly guide the linker? AFAIK with my minimal build the only external dep was openssl though.
Building with the nightly toolchain isn't worth it. It often breaks, requiring changes to the Vector source, and it's not always obvious how to resolve that. I don't recall it providing notable gains (eg with -Z build-std), especially with the minimal build paired with UPX.
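
For readers who want to try the profile settings named in the size notes above, a minimal sketch of the relevant Cargo.toml section could look like the following. It only combines values mentioned in this comment (the "fat" LTO variant with a single codegen unit); how much each one saves will depend on the enabled features, so treat it as a starting point rather than the profile Vector ships.

```toml
# Sketch only: a size-biased release profile assembled from the settings named above.
[profile.release]
opt-level = "z"      # bias optimization toward size; "s" is the other size level
lto = "fat"          # "thin" was reported to give a similar size with faster builds
codegen-units = 1    # single unit: slower build, more room for cross-function optimization
panic = "abort"      # abort on panic instead of unwinding; reported above as the biggest contributor
```

With this in place, the cargo build command above stays the same; only the linking step is expected to take noticeably longer.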

It'd be good to know which features are lightweight vs heavy, as I'd like to include a lightweight version of Vector for users to manage their logs with, instead of the less pleasant logging setup that an image I maintain currently has.

> We do enable that via feature flags, but it's not well documented or easy for users to create their own distributions.

I've been meaning to contribute a Dockerfile at some point that's much simpler to adjust for a custom build with all deps, which might be more helpful to some users than the more involved process the repo offers (there's much more complexity there to maintain / grok).

I remember hitting quite a few walls. Some of it was unfamiliarity; other parts were making sense of what the repo build scripts were doing and looking at the Dockerfile files/scripts for releases, just to work out what was required to run a more familiar cargo build from a rust:latest / rust:alpine Docker image, where there are fewer moving parts and I could tailor the release profile + cargo build command to my needs.

At the time official Vector release binaries were like 180MB uncompressed 😨

> the vdev development tool includes a subcommand that runs Vector with only the feature flags required to run a given config file turned on (vdev run <CONFIG>).

That's pretty cool, cheers 👍

I modified it to output the feature list instead of running cargo build and that worked nicely! 😎

jszwedko commented 5 months ago

👍 You can also try opt-level = "s" to have rustc optimize for size. Thanks for all of those other thoughts! Hopefully they will be useful to readers of this issue.

polarathene commented 5 months ago

> You can also try opt-level = "s" to have rustc optimize for size

opt-level = "z" does the same, but which one does better varies based on config IIRC. I had tried various combinations with other profile settings a while back. The Cargo Profile docs hint at that too.

fungs commented 5 months ago

> Our internal build with just a few sinks and stuff is about 20 MB with LTO

@paolobarbolini, if you could share a tiny recipe about how you achieved that, it would certainly be helpful for me and others.

paolobarbolini commented 5 months ago

What we did was patch Cargo.toml with

diff --git a/Cargo.toml b/Cargo.toml
index 78cd48b..cccfdf1 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -46,6 +46,9 @@ path = "tests/e2e/mod.rs"
 # compiled via the CI pipeline.
 [profile.release]
 debug = false # Do not include debug symbols in the executable.
+lto = true
+codegen-units = 1
+strip = true

 [profile.bench]
 debug = true

lto is the most important flag here because, if I remember correctly, at least at the time we started making our own builds of vector we noticed that many dependencies were being linked into the final executable despite them not being used by the features we had enabled.

Then we looked at Cargo.toml to see which features were enabled by default and in the build command we did:

cargo build --release --no-default-features --features COMMA_SEPARATED_LIST_OF_FEATURES

For example sinks enables all/most sinks, but you can just cherry-pick the ones you need from here. Same applies to sources and transforms.

Expect the build, especially the linking step at the end, to be very slow.

bruceg commented 5 months ago

To simplify the above, you can automatically get the features by running cargo vdev features CONFIG_FILE.yaml, which will parse the config and extract the features required to run it.

polarathene commented 5 months ago

> if you could share a tiny recipe about how you achieved that

Looks like it's the same as what I shared above earlier: https://github.com/vectordotdev/vector/issues/20064#issuecomment-1999036235

Additional tips:

Add opt-level = "z" if performance is adequate and you'd prefer to bias minimal size. opt-level = "s" may sometimes be smaller, it varies.
lto = true will be slow to build, prefer lto = "thin" as that should be fairly similar but much faster AFAIK.
- If the much longer build time is a non-issue, lto = true may have slight perf or size improvements, size benefit becomes marginal with a minimal feature set IIRC.
- For lto = "thin", you'd probably want more codegen units (default is 16, or with incremental = true it is 256). The default lto = false at least cares about codegen units, and setting them to 1 would opt out of LTO for that mode.
If you can shrug off the panic handler until hitting a problem and switching over to a build with it, you could also use panic = "abort"
Use the RUSTFLAGS setting -C relocation-model=static if you don't need the security benefit of position-independent code, which lets the executable be loaded at a randomized address (at least that's what I recall this setting for). This can shave off a nice chunk (eg from 29MB to 26MB).
If your deployment environment is ok with compressed executables (AV software may raise a false-positive), you can often reduce the binary to 25% of the size via UPX. This can delay initial startup by 500ms to 1s in my testing, but should be a non-concern for Vector usually.
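
One way to try the tips above without editing the stock release profile is a custom Cargo profile that inherits from release. This is a sketch and not something taken from Vector's repository: the profile name `minsize` is hypothetical, and the values are the faster-building thin-LTO variant of the tips above.

```toml
# Hypothetical profile name; select it with `cargo build --profile minsize`.
[profile.minsize]
inherits = "release"   # start from the existing release settings
opt-level = "z"        # try "s" as well, which one is smaller varies
lto = "thin"           # much faster to build than lto = true
codegen-units = 16     # explicit, matching the non-incremental default
panic = "abort"        # optional: no unwinding when a panic occurs
strip = true           # strip symbols from the final binary
```

Combined with the feature selection discussed earlier, the build would be something like `cargo build --profile minsize --no-default-features --features COMMA_SEPARATED_LIST_OF_FEATURES`. Cargo places the output of a custom profile in a directory named after it (target/minsize/ here) instead of target/release/.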

> Expect the build, especially the linking step at the end, to be very slow.

Unfortunately while I was writing up an update to respond to this, my PC crashed and I lost a fair amount of good information :(

Rough recollection (my original write-up was much better formatted/detailed):

incremental = true in Cargo.toml can provide a decent speed up for repeated build steps that aren't LTO related.
- The cargo build cache in the target/ directory is a bit dependent upon mtime attribute however, which a git checkout cannot restore. A similar issue may apply to restoring a remote cache (eg: in Github Actions CI), so I'm not sure how useful this feature is for build machines. Github Actions does have self-hosted runners which could retain the cache on your runner machine if that's an option.
- On my small feature set build, this raised the target dir size from 1.5GB to 2GB, but no impact on binary release file size which is a win. Provided you have an explicit codegen-units = 16 (to match the implicit default), otherwise that setting is implicitly much larger and will produce larger binary builds.
.cargo/config.toml / RUSTFLAGS env to configure -C linker-plugin-lto (often used for cross-language LTO).
- I originally shared an example config and documented this quite well with some additional insights. If someone is interested, I can try to write up something similar again.
- The main perk for this with linker-plugin-lto was the ability to add an LTO cache directory, avoiding any redundant slow down for LTO when that work had already been done.
- The configured linker (-C link-arg=-fuse-ld=mold for mold) affects the appropriate setting names since those vary by linker.
  - mold and lld are compatible IIRC, while ld is your default otherwise and has settings named differently.
  - Unlike without linker-plugin-lto, these also impact the binary size as they are more involved in the LTO process. lld often gave the best reduction, followed by mold, then ld.
- lto setting could be "off" / false or "thin" and it would always be thin LTO, or you could do "fat" / true for full LTO.
- With linker-plugin-lto enabled, the non-fat LTO settings in Cargo.toml produce the equivalent binary size regardless of which of the 3 choices is used. Normally those affect the binary size differently, as they either disable LTO ("off", or under certain conditions false) or perform thin LTO at a different scope (false vs "thin").
- When using -C linker=clang, the default LTO jobs is implicitly 0 which maps to 1 thread per physical core, thus only 50% CPU for -C linker-plugin-lto. This should be set to all (all threads) to match what Rust normally does for thin LTO, otherwise slight slow down in build time from reduced CPU usage.
- You'll also need to set -C link-arg=-flto. While you can set -flto=thin / -flto=full, I think this only matters for non-rust code as it has no effect on the LTO job threads I observed when monitoring. The lto setting in Cargo.toml determines if it'll be thin or full LTO. Still this arg is required, at least when specifying the mold linker.
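
Since the original example config was lost, here is a rough reconstruction of what such a .cargo/config.toml might look like, based only on the points listed above. It is a sketch under assumptions: the target triple and the cache directory path are placeholders, and the two `--thinlto-*` options use lld's spelling, which may differ for other linkers. The rustc documentation also notes that linker-plugin LTO expects the LLVM version used by clang to be compatible with the one used by rustc.

```toml
# .cargo/config.toml: sketch only; target triple and cache path are assumptions.
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = [
    "-C", "linker-plugin-lto",                # hand LTO over to the linker plugin
    "-C", "link-arg=-fuse-ld=lld",            # or -fuse-ld=mold
    "-C", "link-arg=-flto",                   # required, per the notes above
    # lld option names; other linkers may name these settings differently.
    "-C", "link-arg=-Wl,--thinlto-jobs=all",  # use all threads instead of one per physical core
    "-C", "link-arg=-Wl,--thinlto-cache-dir=/tmp/vector-thinlto-cache",  # hypothetical cache path
]
```

The cache directory is what avoids repeating LTO work that was already done in a previous build; whether thin or full LTO runs is still decided by the lto setting in Cargo.toml, as described above.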

vectordotdev / vector

Reduced size binary #20064
