lf-edge / eden

Eden is where EVE and Adam get tried and tested:
https://projecteve.dev
Apache License 2.0
50 stars 48 forks source link

Add upload certificate command to adam #927

Closed europaul closed 8 months ago

europaul commented 11 months ago

A new CLI option is added: on eden adam upload-cert the provided certificate is uploaded to /admin/certs via an HTTP POST

NOTE: the new EDEN test needs to be run with an EVE version that includes this fix and with the Adam version that includes this change.

Closes #896

giggsoff commented 11 months ago

It looks like the last part of the puzzle before moving to tests is to have some function, that will re-sign all config objects derived from the signing certificate here, at least modify ControllerCertHash and ContextId, of CipherContexts correct?

europaul commented 11 months ago

@giggsoff wouldn't it be enough just to change Eden's path to signing certificate to the new certificate, so it only affects the newly issued expectation objects?

Also just fyi - rn I'm not planning to change the signing key, just the certificate.

giggsoff commented 11 months ago

Well, I am not sure will EVE-OS work fine with old certificate in CipherContexts and new certificate in Adam (attestation). I think we need some action to re-calculate old CipherContexts as we will potentially have a test with deploying of some app and check how it works after modification of certificate.

I see, but ControllerCertHash in CipherContext is about cert, not key.

europaul commented 11 months ago

@giggsoff I tried the following:

  1. start eden and eve
  2. eden pod deploy -n nginx1 docker://nginx
  3. generate a new signing certificate from the same signing key (every parameter is the same except the validity period)
  4. eden adam upload-cert --cert-file ~/.eden/certs/signing-new.pem
  5. eden pod deploy -n nginx2 docker://nginx
  6. check the logs and eden pod ps - both applications are running fine and I didn't see anything suspicious

I kept the old signing cert and key here. Tbh I don't fully understand yet how the Expectation structures and package is used. My understanding now is that they are used to build the config that is sent to Adam to be later relayed to EVE. But that would mean that in this case we are signing the structures with the old key, while we sign the message (envelope) with the new key. This likely doesn't make the difference since the key stays the same in my scenario - only the certificate changes.

Please tell me if I'm missing something.

giggsoff commented 11 months ago

Can you please check apps with any userdata (e.g. you made test with https://github.com/lf-edge/eden/pull/918)? Inside of expect we try to encrypt two types of data: credentials of datastores and userdata. Also it would be great to check the restart of EVE-OS (and re-attestatation process).

europaul commented 11 months ago

@giggsoff I went on a roller-coaster ride through the code and found out that

we don't use the signing certificate to generate the symmetric key for the encryption, we only use the signing key BUT we use the hash of the signing certificate to generate IV for AES CFB, so this can mess the decryption up BUT we also send that IV with the cipher text in the cipher block, so decrypting should actually not be a problem...

I'm not yet sure where EVE gets cipher context from (does it come with every message or independently of the messages), but I will check. I will also do the tests that you suggested :+1:

But also I just thought maybe it's a good idea not to re-sign the old configs after changing the signing certificate - if EVE cannot deal with that this might just show a flaw in our design, since we the cloud might change the certificates frequently to ensure forward secrecy and we cannot expect it to re-sign all configs since the beginning of time. What do you think?

uncleDecart commented 11 months ago

But also I just thought maybe it's a good idea not to re-sign the old configs after changing the signing certificate - if EVE cannot deal with that this might just show a flaw in our design, since we the cloud might change the certificates frequently to ensure forward secrecy and we cannot expect it to re-sign all configs since the beginning of time. What do you think?

@europaul but that means if I have any previous certificate I can use it to sign configuration and send it, is it not? So potentially if I get any old certificate I can use it to change configuration, which sounds not secure. Or am I missing something?

europaul commented 11 months ago

But also I just thought maybe it's a good idea not to re-sign the old configs after changing the signing certificate - if EVE cannot deal with that this might just show a flaw in our design, since we the cloud might change the certificates frequently to ensure forward secrecy and we cannot expect it to re-sign all configs since the beginning of time. What do you think?

@europaul but that means if I have any previous certificate I can use it to sign configuration and send it, is it not? So potentially if I get any old certificate I can use it to change configuration, which sounds not secure. Or am I missing something?

@uncleDecart you'll need the controllers (old) private key to sign the configuration, the certificate is meant to be public and available to everybody - it's just our choice if we choose to remember the old certificates in EVE or not. Or are you talking about the case when controller's private key has been compromised? In this case EVE would need some signal from the controller that the corresponding certificate should not be trusted anymore, e.g. a Certificate Revocation List (CRL). Idk however if EVE supports CRLs...

europaul commented 11 months ago

@giggsoff you were right, I re-did the same test with applications with userdata and reboot like you suggested - after reboot EVE cannot decipher the userdata from the first application, because it is missing the old certificate that is required in the old cipherContext. The reason is that EVE only stores the latest signing certificate from the controller in persistent storage.

milan-zededa commented 9 months ago

@europaul LGTM but lets wait for the adam side to be merged: https://github.com/lf-edge/adam/pull/120 Then you will be able to update the adam version and put your test into the test suite. How long does the test take to execute? If it is fairly quick, we can put into the "Smoke tests" suite.

europaul commented 9 months ago

@europaul LGTM but lets wait for the adam side to be merged: lf-edge/adam#120 Then you will be able to update the adam version and put your test into the test suite. How long does the test take to execute? If it is fairly quick, we can put into the "Smoke tests" suite.

@milan-zededa on my system the test takes around 4 mins to complete

europaul commented 9 months ago

Adam used to get it's certificates and keys through envs. To unify the UX I switched this mechanism to getting paths to files with certificates and keys through the CLI parameters only. This way it doesn't require a POST API anymore.

This PR needs https://github.com/lf-edge/adam/pull/123 to work.

Tell me how you like this solution!

P.S.: I will fix the commits later once we agree on the code.

deitch commented 9 months ago

Adam used to get it's certificates and keys through envs. To unify the UX I switched this mechanism to getting paths to files with certificates and keys through the CLI parameters only. This way it doesn't require a POST API anymore.

To be clear, those path-based CLI flags already existed in Adam, correct? You are just taking advantage of them?

And Adam already reads those files with each request, rather than cache them in memory, so there is no need to tell Adam to reload the files, correct?

This PR needs https://github.com/lf-edge/adam/pull/123 to work.

Does it? I think https://github.com/lf-edge/adam/pull/123 just gets rid of something that was confusing there. But this should not be blocked by it (unless I misunderstood)?

Tell me how you like this solution!

Like!

europaul commented 9 months ago

To be clear, those path-based CLI flags already existed in Adam, correct? You are just taking advantage of them?

Does it? I think https://github.com/lf-edge/adam/pull/123 just gets rid of something that was confusing there. But this should not be blocked by it (unless I misunderstood)?

yes, the flags are already part of Adam, so no changes in Adam are required. The referenced commit is more of a clean-up.

And Adam already reads those files with each request, rather than cache them in memory, so there is no need to tell Adam to reload the files, correct?

yes, that is the case for the signing certificate which is relevant for this test

europaul commented 9 months ago

added the test to the smoke test suite. ready to merge!

uncleDecart commented 9 months ago

once https://github.com/lf-edge/adam/pull/120 is merged and we pull latest adam we can merge this

europaul commented 9 months ago

@giggsoff @uncleDecart thanks for the great comments!

once https://github.com/lf-edge/adam/pull/120 is merged and we pull latest adam we can merge this

https://github.com/lf-edge/adam/pull/120 was superseded by https://github.com/lf-edge/adam/pull/123, so I'm gonna close it. However the current PR doesn't require any changes in Adam and can be merged independently

deitch commented 9 months ago

Some yetus complains, and some eden tests failing. Are those failures because of this, because of the other changes, or transient?

europaul commented 9 months ago

Some yetus complains, and some eden tests failing. Are those failures because of this, because of the other changes, or transient?

I'd say they fail, because Eden workflows still run with an older EVE version. This PR is here to fix it

And the yetus complains are in accordance to what we discussed above.

uncleDecart commented 9 months ago

@europaul can you rebase to master so that we can rerun tests?

uncleDecart commented 8 months ago

There's a failure on rebooting device on ZFS filesystem. Problem is persistent and also occurred before, need to investigate zfs failure before merging this

europaul commented 8 months ago

There's a failure on rebooting device on ZFS filesystem. Problem is persistent and also occurred before, need to investigate zfs failure before merging this

I removed the test for the app being in HALTED state after the reboot, because it's is sometimes skipped and the app goes from HALTING to INITIAL. This fixed the tests. We still check for the app to be RUNNING after the reboot.

@uncleDecart this PR is ready to be merged