prysmaticlabs / prysm

Go implementation of Ethereum proof of stake
https://www.offchainlabs.com
GNU General Public License v3.0

validator goes OOM when adding multiple keys via WEB interface #11544

Closed okorolov closed 1 year ago

okorolov commented 2 years ago

🐞 Bug Report

Description

The Prysm validator goes OOM when adding multiple keys via the web interface.

🔬 Minimal Reproduction

Start a validator on any type of instance with less than 32 GB of RAM, then use the web UI to load 100 validator keys.

The application / pod will be killed by an OOM event.

🔥 Error

POD fails with OOM event.

🌍 Your Environment

What version of Prysm are you running? (Which release)

3.1.1

Anything else relevant (validator index / public key)?

There seems to be a similar issue that was raised a couple of years ago: https://github.com/prysmaticlabs/prysm/issues/5830

It seems that while keys are being added, the web UI spawns multiple validation processes on the backend in parallel. [image]

This results in significant memory spikes on the validator side.

Example: adding 20 keys (5+ GB spike) [image]
Example: adding 100 keys (25+ GB spike) [image]
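The two data points above work out to roughly 250 MB per key. One possible explanation, offered as an assumption since the thread does not state the keystore parameters, is that each keystore uses scrypt with n=262144, r=8, p=1 (commonly seen in EIP-2335 keystores). A single scrypt derivation needs about 128 * n * r bytes of working memory, so decrypting every key in parallel would scale linearly with the key count. The helper names below are hypothetical; this is a back-of-the-envelope sketch, not a measurement of Prysm.

```go
package main

import "fmt"

// scryptMemoryBytes returns the approximate working memory of one scrypt
// derivation: 128 * N * r bytes (p is assumed to be 1).
func scryptMemoryBytes(n, r uint64) uint64 {
	return 128 * n * r
}

// peakMemoryBytes estimates the peak when `parallel` derivations run at once.
func peakMemoryBytes(n, r uint64, parallel int) uint64 {
	return scryptMemoryBytes(n, r) * uint64(parallel)
}

func main() {
	// Assumed keystore KDF parameters; not confirmed anywhere in this thread.
	const n, r = 262144, 8
	for _, keys := range []int{1, 20, 100} {
		gib := float64(peakMemoryBytes(n, r, keys)) / (1 << 30)
		fmt.Printf("%3d keys in parallel: ~%.2f GiB\n", keys, gib)
	}
}
```

Under these assumptions the estimate is about 0.25 GiB for 1 key, 5 GiB for 20 keys and 25 GiB for 100 keys, which lines up with the spikes reported above.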

It is worth mentioning that 2-3 minutes after the keys have been added, the pod's RAM consumption returns to normal values (~100-500 MB).

Suggested Fix

Is it possible to validate 1 key at a time (not in parallel), freeing up memory after each validation? It will take more time, but it will not result in such huge memory spikes.

nisdas commented 2 years ago

@james-prysm Any ideas on this ?

james-prysm commented 2 years ago

I'll take a look. There are some inefficient processes due to API limitations, but I haven't investigated this yet.

james-prysm commented 2 years ago

@okorolov as I begin this investigation, could you let me know at what point in the UI you experienced this? Was it during wallet creation, or when adding additional keys on the dashboard?

okorolov commented 2 years ago

> @okorolov as I begin this investigation could you let me know when you experienced this on the UI? was this during wallet creation or adding additional on the dashboard.

This happens during the wallet creation process. I will additionally verify whether the situation is any different on an already created wallet. Will update. Thanks.

okorolov commented 2 years ago

@james-prysm the situation is the same with an already created wallet. I tried importing 10 keys on a small validator instance (2 GB RAM) and the validator pod failed within 5 seconds.

LAST SEEN   TYPE      REASON      OBJECT                                           MESSAGE
4m13s       Warning   SystemOOM   node/ip-10-10-2-60.us-east-2.compute.internal   System OOM encountered, victim process: validator, pid: 29987


I would also like to add that the validation process in the UI is a bit confusing:

1) When you type the keystore password, the Prysm client tries to decrypt the keys right away, as you type (you only see this in the web console). If you do not type fast enough, you get failed validations before you have actually finished typing.
2) When the validation process kicks in and you press "continue", nothing happens until the validation process finishes. This can be especially confusing because validating 100 keys can take 30+ seconds, and during that time the web UI does not react to the "continue" button.

Some indication that validation is in progress, together with a greyed-out "continue" button, might be a good solution here.

Thanks.

james-prysm commented 2 years ago

Thanks for checking. I'll try to take a look at this soon, as I am now back from Devcon.

james-prysm commented 2 years ago

Adding #237 in the next release. At the very least, this makes it 1 validation request instead of 1 per key when the password is the same. Hopefully this will solve most use cases. The button staying disabled was another thing I fixed.
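To illustrate the idea of one request per shared password, here is a minimal sketch. It is not the code from #237; `keystoreEntry`, `batch` and `groupByPassword` are hypothetical names. Keystores are grouped by the password entered for them, so the caller could send one validation request per distinct password rather than one per key.

```go
package main

import "fmt"

// keystoreEntry is a hypothetical pairing of a keystore with the password
// the user entered for it.
type keystoreEntry struct {
	Name     string
	Password string
}

// batch is one would-be validation request: a password plus every
// keystore that shares it.
type batch struct {
	Password  string
	Keystores []string
}

// groupByPassword groups entries by password, preserving first-seen order.
func groupByPassword(entries []keystoreEntry) []batch {
	index := make(map[string]int) // password -> position in batches
	var batches []batch
	for _, e := range entries {
		i, ok := index[e.Password]
		if !ok {
			i = len(batches)
			index[e.Password] = i
			batches = append(batches, batch{Password: e.Password})
		}
		batches[i].Keystores = append(batches[i].Keystores, e.Name)
	}
	return batches
}

func main() {
	entries := []keystoreEntry{
		{Name: "keystore-0.json", Password: "same"},
		{Name: "keystore-1.json", Password: "same"},
		{Name: "keystore-2.json", Password: "other"},
	}
	for _, b := range groupByPassword(entries) {
		// Do not print passwords; only show how many keystores share one.
		fmt.Printf("1 request covering %d keystore(s)\n", len(b.Keystores))
	}
}
```

In the common case where all imported keystores share one password, this collapses to a single batch.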

james-prysm commented 1 year ago

This has been released, so I am closing the issue now. Please request a reopen if the issue persists in the same way.