Make sure that it's hard to lose node keys

volovyks commented 8 months ago

We must explore best practices for storing node keys.
We must give our partners instructions on how to set up their environment so that it is hard to lose the keys.

ppca commented 8 months ago

I did some research into this. In short, given that our first batch of customers will also use GCP, I'd recommend that our partners follow an approach similar to ours. It should protect their keys unless their GCP account is stolen or GCP itself suffers a major security breach.

What we do now is save all the private keys (cipher sk, account sk, AWS access key, AWS secret key, and sk shares) in Google Secret Manager; when we start an instance, these secret values are passed in as environment variables. We store the cipher pk on the machine that we use to run the terraform apply command. This approach stores no private keys locally on any machine: neither on the machine running Terraform nor on the signing nodes.
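A minimal sketch of the node side of this setup, assuming the secrets arrive as environment variables at instance start. The variable names (MPC_CIPHER_SK and so on) and the load_secrets helper are hypothetical, not what our deployment actually uses. The idea is that the node only holds secrets in memory, never writes them to disk or logs, and refuses to start if any secret is missing.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass

# Hypothetical env var names; the real deployment may use different ones.
REQUIRED_SECRETS = {
    "cipher_sk": "MPC_CIPHER_SK",
    "account_sk": "MPC_ACCOUNT_SK",
    "aws_access_key": "MPC_AWS_ACCESS_KEY_ID",
    "aws_secret_key": "MPC_AWS_SECRET_ACCESS_KEY",
    "sk_share": "MPC_SK_SHARE",
}


@dataclass(frozen=True, repr=False)
class NodeSecrets:
    cipher_sk: str
    account_sk: str
    aws_access_key: str
    aws_secret_key: str
    sk_share: str

    def __repr__(self) -> str:
        # Never expose secret values in logs or tracebacks.
        return "NodeSecrets(<redacted>)"


def load_secrets(env: Mapping[str, str] | None = None) -> NodeSecrets:
    """Read all secrets from the environment (populated from Secret Manager).

    Raises RuntimeError naming every missing variable, so a misconfigured node
    fails fast instead of running with a partial key set.
    """
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED_SECRETS.values() if not env.get(var)]
    if missing:
        raise RuntimeError(f"missing secret env vars: {', '.join(sorted(missing))}")
    return NodeSecrets(**{field: env[var] for field, var in REQUIRED_SECRETS.items()})
```

A node would call load_secrets() once at startup and pass the resulting object around, so nothing secret ever touches the instance's disk.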

So if the keys were to be stolen, it could happen in the following ways:

1. Someone hacks the GCP account and gets all the keys.
2. Someone steals the keys while they are in transit from Secret Manager to the container.
3. Someone hijacks a signing node: they can produce a partial signature, but they don't have access to all the other keys.
4. The keys are stolen from the GCP machines that store them.

For 1: suppose we are t-out-of-n. As long as fewer than t nodes have their GCP accounts hacked, the attacker cannot produce signatures, so the multichain signing system won't be compromised. We will need to make sure we can kick the hacked nodes.

For 2: API calls to Secret Manager are all authenticated and go through a secure HTTPS connection.

For 3: we need to make sure we can kick the hijacked node. We can then create a new set of keys to start a new node, which goes through the process of started -> joining -> resharing -> running.

For 4: Secret Manager always encrypts secrets before persisting them to disk.
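To make the threshold reasoning for case 1 concrete: under the usual t-out-of-n convention, where any t key shares are enough to sign, an attacker controlling k nodes can sign on their own once k >= t, while the honest nodes can still sign as long as n - k >= t. The helper below is a hypothetical illustration of those two bounds, not code from our node.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdStatus:
    attacker_can_sign: bool  # compromised nodes alone hold at least t shares
    honest_can_sign: bool    # remaining honest nodes still hold at least t shares


def threshold_status(t: int, n: int, compromised: int) -> ThresholdStatus:
    """Classify a t-out-of-n setup after `compromised` nodes were taken over.

    Assumes any t of the n shares suffice to produce a signature.
    """
    if not 1 <= t <= n:
        raise ValueError("need 1 <= t <= n")
    if not 0 <= compromised <= n:
        raise ValueError("need 0 <= compromised <= n")
    return ThresholdStatus(
        attacker_can_sign=compromised >= t,
        honest_can_sign=n - compromised >= t,
    )
```

For example, with t = 3 and n = 5, two compromised nodes leave the system both safe and live, while a third lets the attacker sign and leaves the honest nodes unable to. Kicking hacked nodes matters for that second (liveness) bound.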

The keys won't be lost unless the user loses access to their GCP account, and even then they would typically be able to recover it.

The option I see that provides greater security and richer features is HashiCorp Vault. It offers more sophisticated encryption and supports key rotation, notably through a feature called dynamic secrets. HashiCorp Vault is also better as a universal solution that works across instances from all cloud providers. But I don't see us needing it soon because:

- We are not concerned about key rotation at the moment. HashiCorp Vault rotates dynamic secrets automatically, which I think is not what we'd want, at least for the sk shares. Secret Manager also makes it easy to add new versions of a secret, so we have options if we do want to rotate keys.
- Secret Manager's encryption should be enough for us for now.
- We are OK with onboarding the first batch of partners on GCP.
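The versioning point can be pictured with a toy model. The class below is hypothetical and only loosely modelled on Secret Manager's numbered versions; it is not the Secret Manager client API, and it makes no claim about the service's exact semantics (for example, how "latest" interacts with disabled versions). It shows why keeping immutable numbered versions gives us a rotation and rollback path for sk shares.

```python
from __future__ import annotations


class VersionedSecret:
    """Toy in-memory versioned secret (hypothetical; not the Secret Manager API).

    Versions are numbered from 1 and never overwritten, so an old key share
    stays retrievable until it is explicitly disabled.
    """

    def __init__(self) -> None:
        self._versions: list[bytes] = []
        self._disabled: set[int] = set()

    def add_version(self, payload: bytes) -> int:
        self._versions.append(payload)
        return len(self._versions)  # 1-based version number

    def disable(self, version: int) -> None:
        self._check(version)
        self._disabled.add(version)

    def enable(self, version: int) -> None:
        self._check(version)
        self._disabled.discard(version)

    def access(self, version: int | str = "latest") -> bytes:
        """Return a pinned version, or the newest one when given "latest"."""
        if version == "latest":
            version = len(self._versions)
        self._check(version)
        if version in self._disabled:
            raise PermissionError(f"version {version} is disabled")
        return self._versions[version - 1]

    def _check(self, version: int) -> None:
        if not 1 <= version <= len(self._versions):
            raise KeyError(f"no such version: {version}")
```

In this model, rotating a share means adding a new version and repointing the node at it; rolling back means pinning the previous version number again.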

If a day comes when we have many partners and need to support different cloud providers and step up security, we could always adopt HashiCorp Vault Enterprise (a paid product) and add an option in our code to fetch secrets from it. Our partners could also easily switch to HashiCorp Vault if they want to.

reference: https://scalesec.com/blog/a-comparison-of-secrets-managers-for-gcp/

volovyks commented 8 months ago

I would add that for "1" we should reshare regularly (we have an issue for that).
Also, I think the biggest risk here is the human factor. Yes, they should protect their GCP accounts, but they should also limit access to GSM (Google Secret Manager) to a small number of people. Most of the developers should not have access.
For "3", once the node is back under our control, resharing should help. A "kick" mechanism is good, but we need to design it from the ground up and take all the other requirements into account.
Can you elaborate on 4? Where will we store the encryption key?
Agree about HashiCorp.

ppca commented 8 months ago

> Can you elaborate on 4? Where we will store the encryption key?

By 4 I mean that Google encrypts the secrets for us when it persists them to its disks, so we don't need to store that encryption key ourselves. It is possible to use customer-managed encryption keys if we want to, but I haven't looked into how that works yet.

> I would add that for "1" we should reshare regularly (we have an issue with that)

Can you explain why resharing would help? If the GCP account is hacked, the hacker can start a node with those keys if they want to, and we would just be resharing with them.

> Also, I think that the biggest risk here is the human factor. Yes, they should protect their GCP accounts, they should also limit access to GSM to a limited number of people. Most of the developers should not have the access.

Agree +10000.

> For "3", if the node is under control again, resharing should help. "Kick" mechanism is good, but we need to design it from the ground-up and include all other requirements.

The hijacked node will also take part in resharing and will get a new key share too, so how would that help?

volovyks commented 8 months ago

OK, makes sense.
You are right: if somebody steals all the keys, they can participate in resharing. My suggestion only works if they don't act on the stolen keys for a day or two, for example because they are trying to steal keys from other nodes first.
In your initial message you said "they don't have access to all other keys", which would mean they cannot participate in the resharing process. But yes, in reality a bad actor will steal all the keys.
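The timing argument above can be illustrated with textbook Shamir secret sharing plus a proactive refresh (adding shares of a random polynomial whose constant term is zero). This is a swapped-in, simplified technique, not our protocol's actual resharing, and the prime and fixed coefficients are arbitrary choices to keep the example deterministic. It shows both sides of the discussion: a refresh keeps the same secret, so an attacker who keeps the hijacked node running simply receives the refreshed share; but shares stolen before a refresh cannot be combined with shares stolen after it.

```python
from __future__ import annotations

# Toy (t, n) Shamir sharing over a prime field with a proactive refresh step.
# Illustration only: not the MPC protocol's resharing and not hardened for production.
P = 2**127 - 1  # Mersenne prime used as the field modulus (arbitrary choice)


def eval_poly(coeffs: list[int], x: int) -> int:
    """Evaluate sum(coeffs[i] * x**i) mod P using Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def share(coeffs: list[int], n: int) -> dict[int, int]:
    """Shares are points (i, f(i)) for i = 1..n; coeffs[0] is the secret."""
    return {i: eval_poly(coeffs, i) for i in range(1, n + 1)}


def refresh(shares: dict[int, int], zero_coeffs: list[int]) -> dict[int, int]:
    """Add shares of z with z(0) = 0: the secret is unchanged, every share changes."""
    if zero_coeffs[0] != 0:
        raise ValueError("refresh polynomial must have a zero constant term")
    return {i: (s + eval_poly(zero_coeffs, i)) % P for i, s in shares.items()}


def reconstruct(points: dict[int, int]) -> int:
    """Lagrange interpolation at x = 0 from exactly t points."""
    secret = 0
    for i, yi in points.items():
        num, den = 1, 1
        for j in points:
            if j != i:
                num = num * (-j) % P
                den = den * (i - j) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret
```

With t = 2, secret 42, f(x) = 42 + 7x and refresh polynomial z(x) = 5x, the old shares {1: 49, 2: 56} and the refreshed shares {1: 54, 2: 66} both reconstruct 42, while the mixed pair {1: 49, 2: 66} gives 32.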

I realized that we could use one more strategy to increase security. In our protocol, we use an encryption key in addition to the secret key share, which means a bad actor must steal both of them to participate in the protocol in the current epoch. If we separate ownership of these two keys, we will significantly improve security: the attacker will need to get access to two Google accounts. We could even give access to the NEAR key to a third person. There is still a GCP admin who controls everything, but that is inevitable.
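One way to reason about this proposal is to record which GCP account holds each key and ask which sets of compromised accounts would let an attacker act as the node. The sketch below is a hypothetical model: the key names (sk_share, cipher_sk, near_account_sk) and account labels are illustrative, it assumes participation in the current epoch needs both the key share and the cipher key as described above, and it leaves out the GCP admin, who can reach everything anyway.

```python
from __future__ import annotations

# Hypothetical model of key ownership. Assumes participating in the current
# epoch requires BOTH the secret key share and the cipher (encryption) key.
REQUIRED_KEYS = frozenset({"sk_share", "cipher_sk"})


def can_participate(owner_of: dict[str, str], compromised: set[str]) -> bool:
    """True if the compromised accounts together hold every required key."""
    return all(owner_of[key] in compromised for key in REQUIRED_KEYS)


def accounts_needed(owner_of: dict[str, str]) -> int:
    """Distinct accounts an attacker must compromise to hold all required keys."""
    return len({owner_of[key] for key in REQUIRED_KEYS})
```

If both keys live in one account, accounts_needed returns 1; splitting them across two accounts (with the NEAR key optionally held by a third person) raises it to 2, which is the improvement described above.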

Overall, let's summarize this discussion in a doc and close this issue.

near/mpc #435