sigp / lighthouse

Ethereum consensus client in Rust
https://lighthouse.sigmaprime.io/
Apache License 2.0

Reduce binary size by compressing `genesis.ssz` #4564

Open paulhauner opened 1 year ago

paulhauner commented 1 year ago

Description

We presently include the uncompressed genesis state for supported networks in our binary. We have several of these files:

[3.3M]  common/eth2_network_config/built_in_network_configs/chiado/genesis.ssz
[3.1M]  common/eth2_network_config/built_in_network_configs/gnosis/genesis.ssz
[5.2M]  common/eth2_network_config/built_in_network_configs/mainnet/genesis.ssz
[ 28M]  common/eth2_network_config/built_in_network_configs/prater/genesis.ssz
[ 15M]  common/eth2_network_config/built_in_network_configs/ropsten/genesis.ssz
[2.8M]  common/eth2_network_config/built_in_network_configs/sepolia/genesis.ssz

The total of these files is 57.4M, which goes straight to our binary size (presently ~110M). I suspect we could significantly reduce the size of the binaries by storing compressed genesis.ssz bytes in the binary and then decompressing them on demand (i.e., at startup).
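As a quick sanity check on those numbers (sizes copied from the list above; the ~110M binary size is the approximate figure quoted), the embedded states account for roughly half of the binary. A trivial sketch of the arithmetic:

```rust
/// Embedded genesis state sizes in MB, as listed above.
pub const GENESIS_SIZES_MB: [(&str, f64); 6] = [
    ("chiado", 3.3),
    ("gnosis", 3.1),
    ("mainnet", 5.2),
    ("prater", 28.0),
    ("ropsten", 15.0),
    ("sepolia", 2.8),
];

/// Approximate binary size in MB quoted in the issue.
pub const BINARY_SIZE_MB: f64 = 110.0;

/// Total size of all embedded genesis states, in MB.
pub fn total_genesis_mb() -> f64 {
    GENESIS_SIZES_MB.iter().map(|(_, mb)| mb).sum()
}

/// Share of the binary taken up by genesis states, as a percentage.
pub fn genesis_share_pct() -> f64 {
    total_genesis_mb() / BINARY_SIZE_MB * 100.0
}
```

`total_genesis_mb()` comes to 57.4 and `genesis_share_pct()` to about 52%.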

I propose that we use snappy compression, since it's used by the P2P layer and therefore available in the binary.

Before committing to this change, I'd like to know how long it takes to decompress the state at startup. Numbers for mainnet and Prater would be a good start. We want to be careful not to slow down BN/VC startup.

Details

The method for including the genesis.ssz can be a bit tricky to understand because it's written in macros. I think this should be fairly straightforward once you get your head around it. I've included some links below to give a lay of the land.

The bytes are added to the binary here:

https://github.com/sigp/lighthouse/blob/dfcb3363c757671eb19d5f8e519b4b94ac74677a/common/eth2_config/src/lib.rs#L129-L137

The genesis.ssz file used by the include_bytes! macro is generated here (this is where we'd want to do the snappy compression):

https://github.com/sigp/lighthouse/blob/dfcb3363c757671eb19d5f8e519b4b94ac74677a/common/eth2_network_config/build.rs#L30-L47

The application accesses the included bytes here; this is where we'd want to do the snappy decompression (there might also be other places where the bytes are accessed):

https://github.com/sigp/lighthouse/blob/dfcb3363c757671eb19d5f8e519b4b94ac74677a/common/eth2_network_config/src/lib.rs#L98-L108
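For context on what "decompressing at startup" involves: the raw Snappy block format is a varint length preamble followed by literal and back-reference (copy) elements, so decoding is a single forward pass with no entropy coding. The decoder below is an illustrative, stdlib-only sketch of that format, assuming the raw (unframed) variant; `snappy_decompress` is a hypothetical name. It is not Lighthouse code, and an actual change would reuse the snappy implementation already linked for the P2P layer.

```rust
/// Reads a little-endian base-128 varint (the Snappy length preamble).
/// Returns (value, bytes consumed).
fn read_varint(input: &[u8]) -> Option<(usize, usize)> {
    let mut value = 0usize;
    for (i, &byte) in input.iter().enumerate().take(5) {
        value |= ((byte & 0x7f) as usize) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Decodes one raw (unframed) Snappy block in a single forward pass.
pub fn snappy_decompress(input: &[u8]) -> Result<Vec<u8>, String> {
    let (expected_len, mut pos) = read_varint(input).ok_or("bad length preamble")?;
    // Input is assumed trusted (embedded in our own binary), so we
    // preallocate the full decompressed size up front.
    let mut out = Vec::with_capacity(expected_len);
    while pos < input.len() {
        let tag = input[pos];
        pos += 1;
        match tag & 0b11 {
            // Literal: raw bytes copied straight to the output.
            0 => {
                let mut len = (tag >> 2) as usize;
                if len >= 60 {
                    // 60..=63 mean "length - 1 follows in 1..=4 little-endian bytes".
                    let extra = len - 59;
                    let bytes = input.get(pos..pos + extra).ok_or("truncated literal length")?;
                    len = bytes.iter().rev().fold(0, |acc, &b| (acc << 8) | b as usize);
                    pos += extra;
                }
                len += 1;
                let literal = input.get(pos..pos + len).ok_or("truncated literal")?;
                out.extend_from_slice(literal);
                pos += len;
            }
            // Copy: repeat `len` bytes starting `offset` bytes back in the output.
            kind => {
                let (len, offset) = match kind {
                    1 => {
                        let low = *input.get(pos).ok_or("truncated copy")? as usize;
                        pos += 1;
                        (((tag >> 2) & 0b111) as usize + 4, ((tag >> 5) as usize) << 8 | low)
                    }
                    2 => {
                        let b = input.get(pos..pos + 2).ok_or("truncated copy")?;
                        pos += 2;
                        ((tag >> 2) as usize + 1, u16::from_le_bytes([b[0], b[1]]) as usize)
                    }
                    _ => {
                        let b = input.get(pos..pos + 4).ok_or("truncated copy")?;
                        pos += 4;
                        ((tag >> 2) as usize + 1, u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize)
                    }
                };
                if offset == 0 || offset > out.len() {
                    return Err("copy offset out of range".into());
                }
                let start = out.len() - offset;
                // Byte-by-byte so overlapping copies (offset < len) repeat correctly.
                for i in 0..len {
                    let byte = out[start + i];
                    out.push(byte);
                }
            }
        }
    }
    if out.len() != expected_len {
        return Err("decompressed length does not match preamble".into());
    }
    Ok(out)
}
```

For example, the bytes `[0x0c, 0x08, b'a', b'b', b'c', 0x15, 0x03]` (a 3-byte literal followed by a 9-byte overlapping copy at offset 3) decode to `abcabcabcabc`.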

paulhauner commented 1 year ago

Credit to @dapplion for suggesting this in a DM.

eserilev commented 1 year ago

I'd like to work on this. I can start by benchmarking the time it takes to decompress the mainnet/Prater genesis states and post the results here.

eserilev commented 1 year ago

I have a repo here: https://github.com/eserilev/snappy-genesis-benchmark that compresses/decompresses genesis.ssz files for mainnet and prater. I left some notes in the README. To summarize:

On my machine, decompressing genesis.ssz took:

mainnet: 1.5s
prater: 9.8s

File sizes for the compressed and decompressed genesis.ssz:

decompressed mainnet: 5.4M
compressed mainnet: 1.8M

decompressed prater: 29.8M
compressed prater: 18.1M

Snappy compression seems to reduce file size by ~50%, while increasing start-up time by potentially 10s of seconds.
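Working the reported sizes through gives a more precise picture than a single figure: mainnet shrinks by about 67% (saving 3.6M), Prater by about 39% (saving 11.7M), and the two together by about 43%. A small sketch of that calculation, using only the sizes reported above:

```rust
/// Percentage reduction in size going from `before` to `after` (same units).
pub fn reduction_pct(before: f64, after: f64) -> f64 {
    (1.0 - after / before) * 100.0
}

/// Sizes in MB reported above: (network, decompressed, compressed).
pub const MEASURED: [(&str, f64, f64); 2] = [("mainnet", 5.4, 1.8), ("prater", 29.8, 18.1)];
```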

I measured elapsed time using std::time::Instant::now(), which I think should be sufficient. We could do more elaborate benchmarking, but I think that's probably overkill.
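The `Instant::now()` approach can be made a little more robust by repeating the run and keeping the fastest sample, which filters out one-off noise such as cold caches. A minimal sketch; `fastest_of` is a hypothetical helper, not part of the benchmark repo. Timings from a debug build (`cargo run` without `--release`) can be far slower than an optimized build, so it matters which profile the numbers come from.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `runs` times and returns the last result plus the fastest run.
pub fn fastest_of<T>(runs: u32, mut f: impl FnMut() -> T) -> (T, Duration) {
    assert!(runs > 0, "need at least one run");
    let mut best = Duration::MAX;
    let mut last = None;
    for _ in 0..runs {
        let start = Instant::now();
        // black_box discourages the optimizer from discarding the work.
        let out = black_box(f());
        best = best.min(start.elapsed());
        last = Some(out);
    }
    (last.expect("runs > 0"), best)
}
```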

~I think adding 10s of seconds to BN/VC start-up time is a fair trade-off~ for reducing ~25M in binary size. What do you think?

EDIT: running compression/decompression with the --release flag resulted in far faster times (in the millisecond range).

paulhauner commented 1 year ago

Very interesting @eserilev, thanks!

I'm tempted to go ahead with this. The 1.5s delay seems reasonable for mainnet. The ~10s delay for a Prater node is a bit heavy, but perhaps not a big deal considering it's a testnet.

I'll raise this with some others before making a call. Thanks again!

paulhauner commented 1 year ago

Thinking about this some more, I think there are a few options:

  1. Don't compress any states (the status quo).
  2. Compress all states.
  3. Only compress some states.

We could probably achieve (3) by just detecting the presence of a genesis.ssz.snappy file on the filesystem.
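Option (3) could be sketched as a build-time check that prefers `genesis.ssz.snappy` when it exists and falls back to `genesis.ssz`. The enum and function names below are hypothetical; in Lighthouse, the actual selection would live in the build script and macros linked earlier.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical description of how a network's genesis state is stored.
#[derive(Debug, PartialEq)]
pub enum GenesisSource {
    Compressed(PathBuf),
    Uncompressed(PathBuf),
    Missing,
}

/// Prefers a snappy-compressed state if present, else the raw SSZ file.
pub fn find_genesis(network_dir: &Path) -> GenesisSource {
    let snappy = network_dir.join("genesis.ssz.snappy");
    if snappy.is_file() {
        return GenesisSource::Compressed(snappy);
    }
    let raw = network_dir.join("genesis.ssz");
    if raw.is_file() {
        GenesisSource::Uncompressed(raw)
    } else {
        GenesisSource::Missing
    }
}
```

This keeps the per-network decision in the repository layout itself: compressing a network's state is just a matter of committing the `.snappy` file instead of the raw one.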

I'm tempted to go with (3) since I'm not really sure that shrinking the binary by ~3.6MB (~3%) is worth adding a 1-2s startup delay to the VC for mainnet. Reducing VC startup delays is good because it reduces the downtime penalty for upgrades; I want users to feel free to update regularly.

On the other hand, I can see the value in a 10-20MB (~10-20%) reduction by compressing the testnet states. The startup delay is much less of a concern there.

I'm presently in favour of (3), but I'll raise this internally to get some feedback.

michaelsproul commented 1 year ago

I get very different results on my machine, which makes me wonder if @eserilev's disk is severely limiting his benchmark:

Time elapsed in compress_genesis_mainnet() is: 7.351209ms
Time elapsed in compress_genesis_prater() is: 35.996708ms
Time elapsed in decompress_genesis_mainnet() is: 4.52925ms
Time elapsed in decompress_genesis_prater() is: 29.038791ms

This is on an M1 Macbook Pro (2021).

paulhauner commented 1 year ago

After some more research, I've come to the following conclusions:

If my first two points turn out to be correct (this is something that would be determined during implementation), then I'm fine with just compressing all states, especially if Michael's timings turn out to be closer to reality for most users.

On another note, I've noticed that Eth2NetworkConfig::beacon_state isn't the single, canonical place where we access the genesis_state_bytes. Rather, those bytes tend to be accessed directly and passed around the application. I'd be tempted to create a newtype wrapper around those (now compressed) bytes which provides functions for compression/decompression. That's up to the implementer, though ☺️
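One possible shape for such a newtype, as a sketch: it records whether the embedded bytes are compressed and hands back SSZ bytes as a `Cow`, borrowing the static bytes when they are uncompressed and allocating only when decompression is needed. `GenesisStateBytes` and its methods are hypothetical names, and the decompressor is passed in as a closure so the sketch doesn't assume any particular snappy crate.

```rust
use std::borrow::Cow;

/// Hypothetical newtype around genesis state bytes embedded in the binary.
/// Callers ask for SSZ bytes and never need to know how they were stored.
#[derive(Clone, Copy, Debug)]
pub struct GenesisStateBytes {
    bytes: &'static [u8],
    compressed: bool,
}

impl GenesisStateBytes {
    pub const fn uncompressed(bytes: &'static [u8]) -> Self {
        Self { bytes, compressed: false }
    }

    pub const fn compressed(bytes: &'static [u8]) -> Self {
        Self { bytes, compressed: true }
    }

    /// Size of the bytes as stored in the binary.
    pub fn stored_len(&self) -> usize {
        self.bytes.len()
    }

    /// Returns SSZ bytes: borrowed if stored uncompressed, otherwise
    /// produced by running `decompress` over the stored bytes.
    pub fn ssz_bytes<F>(&self, decompress: F) -> Result<Cow<'static, [u8]>, String>
    where
        F: FnOnce(&[u8]) -> Result<Vec<u8>, String>,
    {
        if self.compressed {
            decompress(self.bytes).map(Cow::Owned)
        } else {
            Ok(Cow::Borrowed(self.bytes))
        }
    }
}
```

Because the same type covers both storage modes, this would also work with option (3), where only some networks ship compressed states.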

eserilev commented 1 year ago

Thanks for taking another look at this, Michael; glad to hear it's running faster on other machines. I'm on a relatively beefy 2021 M1 Max, so I wonder what could be limiting my compression/decompression times this drastically.

Thanks for the additional write up Paul, I think I have a good starting point to begin working here.

paulhauner commented 1 year ago

I wonder what could be limiting my compression/decompression times this drastically

There's a "Low Power Mode" setting (you can find it with Spotlight) which can reduce compute speeds. I'd be surprised if it made that much of a difference, though.

michaelsproul commented 1 year ago

@eserilev Did you run the benchmark with release optimisations? Like cargo run --release?

eserilev commented 1 year ago

@eserilev Did you run the benchmark with release optimisations? Like cargo run --release?

Ah, that was the issue! With the release flag, these are my results:

Time elapsed in compress_genesis_mainnet() is: 5.509875ms
Time elapsed in compress_genesis_prater() is: 30.911709ms
Time elapsed in decompress_genesis_mainnet() is: 4.2995ms
Time elapsed in decompress_genesis_prater() is: 20.129416ms

pk910 commented 1 year ago

Heya guys,

I really like the idea of compressing the genesis states. Did you already think about how to proceed with the holesky genesis?

The genesis state for Holesky will be >190MB uncompressed. Even compressed, that doesn't sound like something that can be packed into the executable. Given that, it might be reasonable not to pack testnet states into the executable at all, but to load them from an external webserver/GitHub/whatever?

paulhauner commented 1 year ago

it might be reasonable to not pack testnet states into the executable at all, but load them from an external webserver/github/whatever?

We used to pull genesis states from GitHub; however, some users had trouble accessing GitHub (IIRC, primarily users in China). That's why we started including states in the binary.

I haven't run the numbers for Holesky, but if it's >190MB uncompressed, we might need to consider going back to downloading genesis states at startup. To address the issues with GitHub access, we could:

With that approach we could instruct users to supply an alternate --genesis-state-url if the default approach is unreliable.
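That fallback might look like the following sketch: an explicit --genesis-state-url replaces the default sources, and otherwise each default source is tried in order. The function name, the idea of several default mirrors, and the injected `fetch` closure (standing in for whatever HTTP client the node already uses) are all assumptions for illustration.

```rust
/// Hypothetical helper: returns the first successfully fetched genesis
/// state, or every error encountered if all sources fail.
pub fn fetch_genesis<F>(
    user_url: Option<&str>,
    defaults: &[&str],
    mut fetch: F,
) -> Result<Vec<u8>, Vec<String>>
where
    F: FnMut(&str) -> Result<Vec<u8>, String>,
{
    // A user-supplied URL is an alternate: it replaces the defaults
    // rather than being tried alongside them.
    let candidates: Vec<&str> = match user_url {
        Some(url) => vec![url],
        None => defaults.to_vec(),
    };
    let mut errors = Vec::new();
    for url in candidates {
        match fetch(url) {
            Ok(bytes) => return Ok(bytes),
            Err(e) => errors.push(e),
        }
    }
    Err(errors)
}
```

Returning all errors rather than just the last one gives users something concrete to report when every source is unreachable.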

eserilev commented 1 year ago

Would we be hosting the compressed genesis files? That could reduce download times at startup compared to downloading the uncompressed genesis state.

dapplion commented 1 year ago

This could be a nice initiative to extend checkpointz. It shouldn't be too difficult, since that infrastructure can already serve states; it would just need to expose another one.

paulhauner commented 1 year ago

would we be hosting the compressed genesis files ?

Yep, that sounds like a good idea to me!

paulhauner commented 1 year ago

FYI we're expecting to release v4.4.0 on/around the 31st of August. The primary goal of v4.4.0 is to add support for --network holesky. The Holesky genesis state doesn't exist yet, but I expect to see it at any time now.

I think that this issue (state compression and downloading) is going to be critical for that release. @eserilev I'm happy for you to take this issue if you'd like to (you've done lots of great work for LH), but I'd like to give you the option to pass if you're not comfortable with the time pressure. Could you please let me know if you'd still like to tackle this issue? No pressure either way ☺️

eserilev commented 1 year ago

@paulhauner No problem, I'll get a PR up for review shortly.

barnabasbusa commented 1 year ago

Genesis state now exists and it is 198MB.

paulhauner commented 1 year ago

I'm pushing this to the next release since we have #4653 which adds Holesky.