litmuschaos / litmus

Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
https://litmuschaos.io
Apache License 2.0

Ensure seamless and non-disruptive minor version upgrades #4486

Status: Open. Opened by smitthakkar96 8 months ago.

smitthakkar96 commented 8 months ago

Context

While upgrading from v3.0.0 to v3.1.0, we hit the error below. It left our Chaos Infra inactive until we also upgraded the Infra version. In environments with many clusters and namespaces, teams may want to roll out Infra components across clusters in phases, similar to a canary deployment. That is currently not possible because the infraConnect endpoint validates the infra version solely against the VERSION specified in the litmus-portal-admin-config.

Error

time="2024-03-05T06:36:37Z" level=error msg="Error response from the server : {\"payload\":{\"errors\":[{\"message\":\"ERROR: infra VERSION MISMATCH (need 3.1.x got 3.0.0)\",\"path\":[\"infraConnect\"]}],\"data\":null},\"type\":\"data\"}\n"
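To show why a phased rollout breaks, here is a minimal sketch of a strict check that would produce this message. It assumes, based only on the "need 3.1.x" wording, that the server compares the major.minor part of the infra version with its own VERSION. The functions majorMinor and checkInfraVersion are hypothetical. This is not Litmus's actual code, and the real implementation may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// majorMinor returns the "major.minor" prefix of a version such as "3.1.0".
// Assumption: versions are plain "major.minor.patch" strings without a "v" prefix.
func majorMinor(version string) string {
	parts := strings.SplitN(version, ".", 3)
	if len(parts) < 2 {
		return version
	}
	return parts[0] + "." + parts[1]
}

// checkInfraVersion is a hypothetical reconstruction of a strict check:
// the infra must share the server's major.minor version or it is rejected.
func checkInfraVersion(serverVersion, infraVersion string) error {
	want := majorMinor(serverVersion)
	if majorMinor(infraVersion) != want {
		return fmt.Errorf("ERROR: infra VERSION MISMATCH (need %s.x got %s)", want, infraVersion)
	}
	return nil
}

func main() {
	// Server already upgraded to 3.1.0 while this infra is still on 3.0.0.
	fmt.Println(checkInfraVersion("3.1.0", "3.0.0"))
	// A patch-level difference within the same minor version passes.
	fmt.Println(checkInfraVersion("3.1.0", "3.1.2"))
}
```

Under this kind of check, every infra still on the old minor version is rejected as soon as the server's VERSION moves forward, which is the behavior described above.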

Proposal

In the infraConnect endpoint, validate the infra version against the INFRA_COMPATIBLE_VERSIONS environment variable, which contains a list of compatible versions, instead of against VERSION.

smitthakkar96 commented 8 months ago

@Saranya-jena @namkyu1999 @vanshBhatia-A4k9 @SarthakJain26 wdyt?

SarthakJain26 commented 8 months ago

We should not mandate the upgrade of infra for every chaos-center upgrade. It should only be mandated if the infra is not compatible with the installed chaos-center.