os-climate / os_c_data_commons

Repository for Data Commons platform architecture overview, as well as developer and user documentation
Apache License 2.0

SSL certificate for ODH has expired #138

Open MichaelTiemannOSC opened 2 years ago

MichaelTiemannOSC commented 2 years ago

das-odh-trino.apps.odh-cl1.apps.os-climate.org has an SSL certificate that was valid for 3 months. It expired last week. Please update so I can get fresh credentials.

Can we automate the checking of things like this so we don't have interruptions every 3 months?
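One way such a check could look is sketched below, using only the Python standard library. The function names and the 21-day warning threshold are illustrative assumptions, not something the cluster currently runs. Note that if the certificate has already expired, the TLS handshake in `fetch_not_after` raises `ssl.SSLCertVerificationError`, which a scheduled job could treat as an alert in its own right.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the server certificate's notAfter field, e.g. 'Jun  1 12:00:00 2030 GMT'.

    Requires network access; raises ssl.SSLCertVerificationError if the
    certificate is already expired or otherwise invalid.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_remaining(not_after: str, now: datetime) -> float:
    """Days between `now` (timezone-aware) and the certificate expiry time."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


def needs_renewal(not_after: str, now: datetime, warn_days: int = 21) -> bool:
    """True when fewer than `warn_days` days are left (threshold is an assumption)."""
    return days_remaining(not_after, now) < warn_days
```

A cron job or CI workflow could call `needs_renewal(fetch_not_after("das-odh-trino.apps.odh-cl1.apps.os-climate.org"), datetime.now(timezone.utc))` and notify someone when it returns `True`.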

@erikerlandson @caldeirav

HumairAK commented 2 years ago

Updating now; das will go down for a few minutes.

HumairAK commented 2 years ago

Should be good now. As for automating this: we provision these certs on our routes using the acme operator, and I thought this was already automated, so I'm a bit surprised. Will take a look at what we can do.

MichaelTiemannOSC commented 2 years ago

All good, thanks!

MichaelTiemannOSC commented 2 years ago

Re-opening because @mersin35 has just encountered yet another SSL expiration. Please, let's automate this (on all clusters that ODH users have access to).

@rynofinn @MightyNerdEric @grigarr

HeatherAck commented 2 years ago

I created a ticket with LF Team: https://jira.linuxfoundation.org/plugins/servlet/theme/portal/2/IT-24355

HumairAK commented 2 years ago

More details on how certs are being handled for services.

As mentioned before, we use the acme-operator to manage certs for OCP routes.

You can find the operator running here.

It claims:

It will automatically provision certificates using ACME v2 protocol and manage their lifecycle including automatic renewals.

But clearly that is not what's happening here, so first task would be to investigate why.

I've also had success using edge routes (example), which use the router's default certs when no TLS is specified. TLS termination then happens at the OCP router, and the cluster ingress certs are used (if no other TLS certs are added to the route). So we would just need to make sure the cluster certs are auto-renewed (which we have to do anyway). Ideally we would use cert-manager, but the operator has issues with AWS-provisioned clusters, which require us to deploy cert-manager manually.
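For illustration, the shape of such an edge route can be sketched as a small Python helper that builds the manifest. The route name, service name, and port name used here are placeholders (assumptions), not the actual values on the cluster. The point is that the `tls` section sets only the termination type and carries no `certificate` or `key`, so the router falls back to its default certs.

```python
import json


def edge_route(name: str, service: str, target_port: str = "http") -> dict:
    """Build a Route manifest with edge TLS termination and no inline certificate.

    `name`, `service`, and `target_port` are hypothetical placeholders.
    """
    return {
        "apiVersion": "route.openshift.io/v1",
        "kind": "Route",
        "metadata": {"name": name},
        "spec": {
            "to": {"kind": "Service", "name": service},
            "port": {"targetPort": target_port},
            # No certificate/key here: the router's default cert is served.
            "tls": {
                "termination": "edge",
                "insecureEdgeTerminationPolicy": "Redirect",
            },
        },
    }


def to_json(manifest: dict) -> str:
    """Serialize the manifest as indented JSON."""
    return json.dumps(manifest, indent=2)
```

Printing `to_json(edge_route("trino", "trino"))` gives a JSON manifest that could, under these assumptions, be reviewed and applied with `oc apply -f -`.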