rancher / os

Tiny Linux distro that runs the entire OS as Docker containers
https://rancher.com/docs/os/v1.x/en/
Apache License 2.0

Renewing ros ca-certificates #3062

Closed ibox4real closed 2 years ago

ibox4real commented 2 years ago

RancherOS Version: (ros os version) v1.5.4, v1.5.6, v1.5.7

Where are you running RancherOS? (docker-machine, AWS, GCE, baremetal, etc.) AWS

Yesterday, the DST Root CA X3 certificate used by Let's Encrypt expired. Because of that, docker pull my-custom-registry now gives x509: certificate has expired or is not yet valid.

How do I force-renew the ca-certificates on ros?

stove-panini commented 2 years ago

We just ran into this yesterday.

  1. Grab the R3 cert here: https://letsencrypt.org/certs/lets-encrypt-r3.pem
  2. Copy it to /etc/docker/certs.d/your.registry.com:4567/ca.crt

It will get picked up automatically with no need to restart any services.
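
The two steps above can be sketched as a small shell function. This is a minimal sketch, not a tested RancherOS procedure: the function name install_registry_ca and the optional third argument (an alternate base directory, handy for a dry run) are assumptions made for illustration. Only the /etc/docker/certs.d/<registry>/ca.crt layout comes from the steps above.

```sh
#!/bin/sh
# install_registry_ca CERT REGISTRY [BASE_DIR]
# Hypothetical helper: copies CERT to BASE_DIR/REGISTRY/ca.crt.
# BASE_DIR defaults to /etc/docker/certs.d, the directory named in step 2.
install_registry_ca() {
    cert=$1
    registry=$2
    base=${3:-/etc/docker/certs.d}

    # Refuse to install a missing or empty file.
    if [ ! -s "$cert" ]; then
        echo "certificate file missing or empty: $cert" >&2
        return 1
    fi

    mkdir -p "$base/$registry" || return 1
    cp "$cert" "$base/$registry/ca.crt"
}

# Example (run as root on the node):
#   wget https://letsencrypt.org/certs/lets-encrypt-r3.pem
#   install_registry_ca lets-encrypt-r3.pem your.registry.com:4567
```

The directory name has to match the registry host and port exactly as they appear in the image reference.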

I'm trying to find a way to get this into User Data for new nodes but I can't see a way to do that in Rancher's Node Templates.

ibox4real commented 2 years ago

Thank you for the response! I figured the manual cert upload thing would work as a quick solution but hoped for an easier solution.

For now I just decided to drop ros altogether and use Amazon Linux with Docker instead.

If it is any help, I used this cloud-config in the user data with ros. Maybe you can customize it to add the cert, but I believe the user data is executed only on first boot.

#cloud-config
write_files:
  - path: /etc/rc.local
    permissions: "0755"
    owner: root
    content: |
      #!/bin/bash
      wait-for-docker
      sudo docker run rancher-agent ...

nathansamson commented 2 years ago

This also seems to be a problem with the latest RancherOS (1.5.8) on DigitalOcean.

ygirerd commented 2 years ago

I can confirm the same issue with RancherOS 1.5.8. It does not include the root CA used by Let's Encrypt.

jrruethe commented 2 years ago

It's a little obscure, but here is what I ended up doing. I built a Docker image from debian:stable-slim with ca-certificates installed (I called this image update_certificates locally). Then I ran this image with system-docker, mounting the underlying ca-certificates file and copying the new certs over it. It solved the immediate issue for me.

system-docker run -it --rm -v /etc/ssl/certs/ca-certificates.crt:/etc/ssl/certs/ca-certificates.crt.rancher --entrypoint /bin/bash update_certificates:latest -c "cp /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt.rancher"

After this, I had to restart the user docker for it to pick up the changes:

system-docker restart docker

cedvan commented 2 years ago

Tips

See the "Set Custom Certs in ISO" section of https://rancher.com/docs/os/v1.x/en/configuration/airgap-configuration to update the ISO and add the new Let's Encrypt CA certificate:

$ git clone https://github.com/rancher/os.git
$ cd os
$ make shell-bind
$ cd scripts/tools/
$ wget https://github.com/rancher/os/releases/download/v1.5.8/rancheros-proxmoxve.iso
$ wget https://letsencrypt.org/certs/lets-encrypt-r3.pem
$ ./flush_crt_iso.sh --iso rancheros-proxmoxve.iso --cert lets-encrypt-r3.pem
$ exit
$ ls ./build/

The updated rancheros-proxmoxve.iso works for me ;)

krumware commented 2 years ago

Also confirming here that the latest AMIs do not have the certificate updates (1.5.8)

cjellick commented 2 years ago

We're going to look into cutting a release to address this, but it'll likely be the last RancherOS release.

krumware commented 2 years ago

Thank y'all for doing this! It's starting to propagate through and affect our environments. This should buy us some time to move to a new image.

(Side note, is there a recommendation? k3os does not have a published AMI)

PrplHaz4 commented 2 years ago

> (Side note, is there a recommendation? k3os does not have a published AMI)

If Docker is a requirement, check out burmillaOS. It's a community-driven fork of RancherOS.

krumware commented 2 years ago

@PrplHaz4 I need a public AMI in the short term, but it looks like burmillaOS doesn't have any

PrplHaz4 commented 2 years ago

> @PrplHaz4 I need a public AMI in the short term, but it looks like burmillaOS doesn't have any

Looks like the answer for now is build it yourself using ROS tooling?

https://github.com/burmilla/os/issues/114 https://github.com/burmilla/os/issues/55#issuecomment-770342781

dweomer commented 2 years ago

v1.5.8 seems to work without issue. Is upgrading not an option?

This release addressed CVE-2021-21284 and CVE-2021-21285. Please consider upgrading.

dweomer commented 2 years ago

> Same issue with RancherOS 1.5.8, I confirm. It does not include root CA used by Let's Encrypt.

I spun up a v1.5.8 locally (vmware) and in us-west-2 via ami-0cdefa6a0646eb511 without issue. Is this a problem on systems provisioned before the LE change-over?

cjellick commented 2 years ago

OK, if we have a release that is working, I don't think we'll ship a fix.

krumware commented 2 years ago

@dweomer can you please try to pull this image?

[rancher@ip-172-31-65-29 ~]$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
[rancher@ip-172-31-65-29 ~]$ docker pull cr.l5d.io/linkerd/controller:stable-2.10.2
Error response from daemon: Get https://cr.l5d.io/v2/: x509: certificate has expired or is not yet valid

That is from a brand new instance in us-east-1 using ami-02fe87f853d560d52

dweomer commented 2 years ago

Ah, I see:

[rancher@ip-172-30-0-170 ~]$ docker pull cr.l5d.io/linkerd/controller:stable-2.10.2
Error response from daemon: Get https://cr.l5d.io/v2/: x509: certificate has expired or is not yet valid
---
rancher@ip-172-30-0-170:~$ openssl s_client -host cr.l5d.io -port 443 -showcerts | head
depth=2 C = US, O = Internet Security Research Group, CN = ISRG Root X1
verify return:1
depth=1 C = US, O = Let's Encrypt, CN = R3
verify return:1
depth=0 CN = cr.l5d.io
verify return:1
CONNECTED(00000005)
---
Certificate chain
 0 s:CN = cr.l5d.io
   i:C = US, O = Let's Encrypt, CN = R3
-----BEGIN CERTIFICATE-----
MIIFFjCCA/6gAwIBAgISBNT3TDnW3DtlRhHstDS4iJ8rMA0GCSqGSIb3DQEBCwUA
MDIxCzAJBgNVBAYTAlVTMRYwFAYDVQQKEw1MZXQncyBFbmNyeXB0MQswCQYDVQQD
EwJSMzAeFw0yMTA5MDkxNDM4NDNaFw0yMTEyMDgxNDM4NDJaMBQxEjAQBgNVBAMT
CWNyLmw1ZC5pbzCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBAJ6B6DLQ
^C
rancher@ip-172-30-0-170:~$ 
---
rancher@ip-172-30-0-170:~$ openssl s_client -host docker.io -port 443 -showcerts | head
depth=2 C = US, O = Amazon, CN = Amazon Root CA 1
verify return:1
depth=1 C = US, O = Amazon, OU = Server CA 1B, CN = Amazon
verify return:1
depth=0 CN = *.docker.io
verify return:1
CONNECTED(00000005)
---
Certificate chain
 0 s:CN = *.docker.io
   i:C = US, O = Amazon, OU = Server CA 1B, CN = Amazon
-----BEGIN CERTIFICATE-----
MIIF1TCCBL2gAwIBAgIQC4w2dqryzV0OctnMuh4j9TANBgkqhkiG9w0BAQsFADBG
MQswCQYDVQQGEwJVUzEPMA0GA1UEChMGQW1hem9uMRUwEwYDVQQLEwxTZXJ2ZXIg
Q0EgMUIxDzANBgNVBAMTBkFtYXpvbjAeFw0yMTA0MjUwMDAwMDBaFw0yMjA1MjQy
MzU5NTlaMBYxFDASBgNVBAMMCyouZG9ja2VyLmlvMIIBIjANBgkqhkiG9w0BAQEF
^C
rancher@ip-172-30-0-170:~$

dweomer commented 2 years ago

Here's how I fixed it:

# switch to the ubuntu console for cli goodness
sudo ros console switch ubuntu # re-log

# download the new item for the ole trust store
wget -P /usr/local/share/ca-certificates https://letsencrypt.org/certs/lets-encrypt-r3.pem

# update your trust store
sudo update-ca-certificates

# restart the docker daemon
sudo system-docker restart docker

... and then:

$ docker pull cr.l5d.io/linkerd/controller:stable-2.10.2
stable-2.10.2: Pulling from linkerd/controller
4fdf73345ef8: Pull complete 
b86807aa1558: Pull complete 
9a5fac4c9cde: Pull complete 
83e7ecc3331a: Pull complete 
Digest: sha256:4508ffd137e9fa6adf2b8ad9771bfa3ff7a4ac09c1110545fafb9f2382c3f1e6
Status: Downloaded newer image for cr.l5d.io/linkerd/controller:stable-2.10.2
cr.l5d.io/linkerd/controller:stable-2.10.2

Just rebooted. Seems to have stuck.
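
The steps above can also be scripted. Below is a minimal sketch, not a verified RancherOS procedure. Note that on Debian/Ubuntu, update-ca-certificates only picks up files ending in .crt under /usr/local/share/ca-certificates, so the sketch renames the .pem to .crt when copying it. The function name stage_local_ca and its optional destination-directory argument are assumptions made for illustration.

```sh
#!/bin/sh
# stage_local_ca PEM_FILE [DEST_DIR]
# Hypothetical helper: copies PEM_FILE into DEST_DIR with a .crt extension
# and prints the resulting path. DEST_DIR defaults to the local CA directory
# read by update-ca-certificates on Debian/Ubuntu.
stage_local_ca() {
    src=$1
    dest_dir=${2:-/usr/local/share/ca-certificates}

    # Refuse anything that does not look like a PEM certificate.
    if ! grep -q -e '-----BEGIN CERTIFICATE-----' "$src" 2>/dev/null; then
        echo "not a PEM certificate: $src" >&2
        return 1
    fi

    name=$(basename "$src")
    name=${name%.*}    # drop the original extension, e.g. .pem
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/$name.crt" || return 1
    echo "$dest_dir/$name.crt"
}

# Example (as root, in the ubuntu console):
#   wget https://letsencrypt.org/certs/lets-encrypt-r3.pem
#   stage_local_ca lets-encrypt-r3.pem && update-ca-certificates
```

As in the steps above, the user Docker daemon would still need a restart afterwards (system-docker restart docker) to pick up the refreshed trust store.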

dweomer commented 2 years ago

Switching consoles works as well, so long as you don't switch back to the default console.

krumware commented 2 years ago

I'm going to add those steps to some of my workloads that run, but that doesn't really help with new ec2 nodes created via rancher. Any chance of still justifying one last image update @cjellick?

dweomer commented 2 years ago

> I'm going to add those steps to some of my workloads that run, but that doesn't really help with new ec2 nodes created via rancher. Any chance of still justifying one last image update @cjellick?

I am pretty sure that Rancher2 allows you to specify userdata for machine provisioning.

krumware commented 2 years ago

It does, but only if you edit the node template through the Rancher API. It is unclear if the cloud-init needs to include the original config, or if the config additions are appended. (this is poorly documented or hard to find)

dweomer commented 2 years ago

> It does, but only if you edit the node template through the Rancher API. It is unclear if the cloud-init needs to include the original config, or if the config additions are appended. (this is poorly documented or hard to find)

ugh, sorry.

krumware commented 2 years ago

I hate to be a pest, just bumping this to see if there is a definitive yes/no on a final release

dweomer commented 2 years ago

> I hate to be a pest, just bumping this to see if there is a definitive yes/no on a final release

We will not be doing a release to solve for an expired trust store entry. Based on https://github.com/rancher/os/issues/3062#issuecomment-949015414, this can be solved at provision time for new systems and at runtime for existing systems.

krumware commented 2 years ago

Thanks for the update!

josephtate commented 1 year ago

Can someone paste the cloud-init config that works for them? I'm having a hard time understanding how to switch ros consoles without getting logged out (which makes it a bust for cloud-init). I've also tried manually updating the /etc/ssl/certs/ca-certificates.txt file, but it doesn't seem to have had any effect on my systems when testing with both busybox/wget and docker pull, even after reboot.

What file is supposed to get updated with the new certificate on RancherOS?
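
Judging from jrruethe's comment earlier in this thread, the file in question appears to be the bundle at /etc/ssl/certs/ca-certificates.crt as seen from system-docker (.crt rather than .txt), though that is an inference, not something confirmed here. One way to check whether an edit actually landed is to test whether the bundle contains the certificate. A minimal sketch, assuming both files are PEM-encoded; pem_bodies and bundle_contains_cert are hypothetical helper names.

```sh
#!/bin/sh
# pem_bodies FILE
# Prints the base64 body of each certificate in FILE, one certificate per line.
pem_bodies() {
    awk '
        /-----BEGIN CERTIFICATE-----/ { body = ""; inside = 1; next }
        /-----END CERTIFICATE-----/   { if (inside) print body; inside = 0; next }
        inside { gsub(/[ \t\r]/, ""); body = body $0 }
    ' "$1"
}

# bundle_contains_cert BUNDLE CERT
# Returns 0 if the first certificate in CERT appears in BUNDLE,
# 1 if it does not, and 2 if CERT holds no certificate at all.
bundle_contains_cert() {
    bundle=$1
    cert=$2
    needle=$(pem_bodies "$cert" | head -n 1)
    if [ -z "$needle" ]; then
        echo "no certificate found in: $cert" >&2
        return 2
    fi
    # Fixed-string, whole-line match against every certificate in the bundle.
    pem_bodies "$bundle" | grep -Fx -e "$needle" >/dev/null
}

# Example:
#   bundle_contains_cert /etc/ssl/certs/ca-certificates.crt lets-encrypt-r3.pem \
#       && echo present || echo missing
```

The comparison ignores line wrapping and comment lines in the bundle, so it only reports whether the same certificate bytes are present.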

Aaron-ML commented 1 year ago

> > I hate to be a pest, just bumping this to see if there is a definitive yes/no on a final release
>
> we will not be doing a release to solve for an expired trust store entry. based on #3062 (comment), this can be solved for at provision-time for new systems and at runtime for existing systems

Are you saying in order to deploy rancheros with up to date ca-certificates that you'll need to bake that into provision time?

Running into this with rancheros 1.5.8 and letsencrypt certificates not being valid on the nodes, doesn't seem like there's a native way to update these without switching consoles or doing some provision time shenanigans.