packit / deployment

Ansible playbooks and scripts for deploying packit-service to OpenShift
MIT License

chore: use k8s_apply=true by default #601

Closed mfocko closed 1 month ago

mfocko commented 1 month ago

• Summary

Initially I wanted to set ‹k8s_apply=false› for postgres and the key-value databases (such as Redis, Redict, or Valkey), because deploying to prod with ‹k8s_apply=true› redeployed postgres, which caused a short outage (~5 minutes).

However, when I redeployed stage multiple times in a row just now, none of the deployed services got redeployed, so I conclude that the production deployment had pending changes that were applied back then.

• Context from https://github.com/packit/deployment/issues/360

· What does it actually do?

In simple terms, it makes sure that the definition that is to be
deployed matches the one that is already deployed. The difference has
already manifested a few times, e.g., when @majamassarini was adjusting
the `/dev/shm` for the postgres deployment (https://github.com/mfocko/deployment/commit/bedef2026c84ea00bb329799cc9bef81687fe88d), the change did
not get deployed.
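
As a rough illustration of where the flag ends up, here is a minimal sketch of an Ansible task using the `apply` option of the `kubernetes.core.k8s` module. The task name, the `project` variable, and the template path are assumptions made up for this example, not taken from the repository; only the `k8s_apply` variable comes from this PR.

```yaml
# Hypothetical task: the name, namespace variable, and template path are illustrative only.
- name: Deploy postgres (sketch)
  kubernetes.core.k8s:
    namespace: "{{ project }}"  # assumed variable holding the target namespace
    definition: "{{ lookup('template', 'postgres.yml.j2') | from_yaml }}"
    state: present
    # With apply enabled, the module compares the desired definition with the
    # previously applied one and updates the resource when they differ.
    apply: "{{ k8s_apply | default(true) | bool }}"
```

With `default(true)`, the flag only needs to be set explicitly when someone wants to opt out for a particular run.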

· Why do some tasks already use it (e.g., Redis/Redict, Flower secret) and others don't?

not sure

· Would it make sense to default to ‹apply=true›?

Yes, but at the same time, applying meaningless changes to critical
services, e.g., postgres or Redis/Redict/Valkey, can cause smaller
outages.
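
If such outages ever became a problem, one option would be a per-service override along the lines of what the summary initially considered. The sketch below is only an assumption about how that could look; `k8s_apply_postgres`, the `project` variable, and the template path are hypothetical names that do not exist in the repository.

```yaml
# Hypothetical variables (e.g., in a vars file): global default plus an opt-out.
k8s_apply: true
k8s_apply_postgres: false  # hypothetical override for a critical service

# Hypothetical task (in a playbook) that falls back to the global default
# when the per-service override is not defined.
# - name: Deploy postgres (sketch)
#   kubernetes.core.k8s:
#     namespace: "{{ project }}"
#     definition: "{{ lookup('template', 'postgres.yml.j2') | from_yaml }}"
#     state: present
#     apply: "{{ k8s_apply_postgres | default(k8s_apply) | bool }}"
```

The trade-off is the one described above: an opted-out service avoids needless redeployments, but real changes to its definition may silently not get deployed.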

Fixes https://github.com/packit/deployment/issues/360

softwarefactory-project-zuul[bot] commented 1 month ago

Build succeeded. https://softwarefactory-project.io/zuul/t/packit-service/buildset/55b03cd9b3f84b4aa3a9763de7f543e5

:heavy_check_mark: pre-commit SUCCESS in 2m 01s

softwarefactory-project-zuul[bot] commented 1 month ago

Build succeeded. https://softwarefactory-project.io/zuul/t/packit-service/buildset/05d0b9bb16dc4ba8a376b9076ade29c1

:heavy_check_mark: pre-commit SUCCESS in 1m 40s

mfocko commented 1 month ago

> lgtm, we can revisit this if it starts causing bigger outages on redeployments

ofc we can; it affects manual redeployments though, so I'm not sure who will remember it by the time we need to touch it manually :D Even the TLS certs from Monday went out without a manual redeployment (I scaled stage, Franta moved production, and the rebuilt images loaded the new certs on automatic redeployment)