Open michaelsproul opened 11 months ago
Upon testing, `client_identity_path` and `client_identity_password` have the same bug: if the first import via the HTTP API fails, a second attempt returns `DuplicatePublicKey`.
The HTTP API works well if the first import is successful and it gets added to the `validator_definitions.yml` file as expected.
I would love to take this up if some help is needed :)
@v4lproik This issue is quite open-ended and would require a bit of exploration of the design space to solve fully, I think. My current thinking is that we should adopt some approach like software transactional memory or copy-on-write to make our implementation more obviously correct. This would require overhauling some of the data structures to make them cheaper to copy (persistent data structures).
Then the transaction process would be something like:
I'm a big fan of CoW though, so I'm a bit biased. It would also be possible to solve this issue by just handling the specific cases highlighted above better, e.g. by doing all operations that can fail (e.g. reading from disk) prior to updating anything permanently in-memory.
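To illustrate the simpler option (run everything that can fail before touching in-memory state), here is a minimal sketch. `ValidatorStore`, `AddError`, `add_web3signer` and the `load_cert` closure are hypothetical names invented for this example; they are not Lighthouse's actual types or API.

```rust
use std::collections::HashMap;

// Hypothetical error type for this sketch; not Lighthouse's real error enum.
#[derive(Debug, PartialEq)]
pub enum AddError {
    DuplicatePublicKey,
    InvalidCertificate(String),
}

// Hypothetical in-memory record of a web3signer validator.
#[derive(Debug, Clone, PartialEq)]
pub struct Validator {
    pub pubkey: String,
    pub certificate: Vec<u8>,
}

// Hypothetical stand-in for the VC's in-memory validator set.
#[derive(Default)]
pub struct ValidatorStore {
    validators: HashMap<String, Validator>,
}

impl ValidatorStore {
    /// Adds a validator, running the fallible step (`load_cert`, standing in
    /// for reading the certificate from disk) before any mutation.
    pub fn add_web3signer<F>(&mut self, pubkey: &str, load_cert: F) -> Result<(), AddError>
    where
        F: FnOnce() -> Result<Vec<u8>, String>,
    {
        if self.validators.contains_key(pubkey) {
            return Err(AddError::DuplicatePublicKey);
        }

        // Fallible work first: if this fails we return early and the map is untouched.
        let certificate = load_cert().map_err(AddError::InvalidCertificate)?;

        // Only infallible operations remain, so the insert cannot be left half-done.
        self.validators.insert(
            pubkey.to_string(),
            Validator {
                pubkey: pubkey.to_string(),
                certificate,
            },
        );
        Ok(())
    }

    pub fn contains(&self, pubkey: &str) -> bool {
        self.validators.contains_key(pubkey)
    }
}
```

With this ordering, a failed first attempt leaves no entry behind, so retrying with a valid certificate succeeds instead of hitting the duplicate check. It does not cover a failure that happens after the insert, such as a disk write.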
Another issue that's tangentially related (and might make testing these code paths more straightforward) is https://github.com/sigp/lighthouse/issues/4854. That could be a good one to start with to dip your toes in the VC API, perhaps with an eye to solving this issue too.
Sounds good! Thank you very much for the clear explanation! I will get on with #4854 instead!
Description
Found by @chong-he:
If the certificate for a web3signer validator is invalid while adding a validator using `/lighthouse/validators/web3signer`, then the validator will get added in-memory but will not be activated and will not be persisted in the validator definitions on disk. Attempting to re-add the same key with correct values then fails with a duplicate key error :skull:

E.g.
Start the VC:

Create a file named `request.json`:

Make the request once (ensuring `certificate.pem` does not exist or is invalid):

Make the request again:
We shouldn't get a duplicate validator error here, because the validator didn't get added successfully. The VC still logs that no validators are active:
However we see that the validator is stored in memory, which is why we get the duplicate error:
Version
Lighthouse v4.5.0
Steps to resolve
Harden the atomicity of adding validators. We could use copy-on-write followed by compare-and-set (my favourite), or something like adding the validator initially disabled, then switching it to enabled once everything has succeeded. Or we could just apply stricter pre-validation before we go mutating anything (although this could still cause a failure if a disk op fails, etc).
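As a rough illustration of the copy-on-write plus compare-and-set option, the sketch below uses only the standard library. `CowStore`, `Definitions`, `CommitError` and the `persist` closure are hypothetical names for this example and do not reflect Lighthouse's real data structures. It also clones the whole map, where a real implementation would likely want persistent data structures to make the copy cheap.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for the validator definitions: pubkey -> signer URL.
pub type Definitions = HashMap<String, String>;

// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum CommitError {
    DuplicatePublicKey,
    Persist(String),
    Conflict,
}

// The live state is an immutable snapshot behind an Arc; updates swap the Arc.
#[derive(Default)]
pub struct CowStore {
    current: Mutex<Arc<Definitions>>,
}

impl CowStore {
    /// Returns the current snapshot without copying the map.
    pub fn snapshot(&self) -> Arc<Definitions> {
        let guard = self.current.lock().unwrap();
        Arc::clone(&*guard)
    }

    /// Adds a definition transactionally. `persist` stands in for the fallible
    /// steps (certificate loading, writing definitions to disk).
    pub fn add<P>(&self, pubkey: &str, url: &str, persist: P) -> Result<(), CommitError>
    where
        P: FnOnce(&Definitions) -> Result<(), String>,
    {
        let base = self.snapshot();
        if base.contains_key(pubkey) {
            return Err(CommitError::DuplicatePublicKey);
        }

        // Copy-on-write: mutate a private draft, never the shared snapshot.
        let mut draft: Definitions = (*base).clone();
        draft.insert(pubkey.to_string(), url.to_string());

        // Fallible work runs against the draft; on error the draft is dropped
        // and the live state is unchanged.
        persist(&draft).map_err(CommitError::Persist)?;

        // Compare-and-set: commit only if nobody replaced the snapshot meanwhile.
        let mut guard = self.current.lock().unwrap();
        if !Arc::ptr_eq(&*guard, &base) {
            return Err(CommitError::Conflict);
        }
        *guard = Arc::new(draft);
        Ok(())
    }
}
```

A failed `persist` leaves the live snapshot untouched, so a retry with correct values is not rejected as a duplicate. One gap in this sketch: if the compare-and-set reports `Conflict` after `persist` has already written to disk, the caller would still need to retry or roll back.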