ytsaurus / ytsaurus-k8s-operator

Kubernetes operator for YTsaurus.
https://ytsaurus.tech

Replace EnableFullUpdate field with something better. #151

Status: Open. Opened by l0kix2 8 months ago.

l0kix2 commented 8 months ago

Currently, the main Ytsaurus CRD has an EnableFullUpdate field. If it is set to true, the operator considers itself allowed to recreate all pods and perform a full update of the YT cluster. The idea is that full updates stay under human control, but in practice we often forget to set EnableFullUpdate back to false after a deploy.

It would be better to replace it with something that the operator itself can reset after a full update has been triggered and approved by a human.

Ideas are appreciated.

l0kix2 commented 8 months ago

It is possible to store this flag in Cypress of the cluster being updated, but I am not 100% sure it won't bite us at some point, e.g. when the cluster is unavailable and we can't read or write the flag. Still, if we implement the flow carefully, it could be a good solution.
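A minimal sketch of the "careful flow" for a flag stored outside the CRD, assuming a fail-closed policy: if the flag cannot be read, no full update starts. `FlagStore`, `fakeStore`, `mayRunFullUpdate`, and `finishFullUpdate` are hypothetical names; a real Cypress-backed store would go through a YT client, which is not modeled here.

```go
package main

import (
	"errors"
	"fmt"
)

// FlagStore is a hypothetical abstraction over wherever the approval flag
// lives; in the Cypress variant it would be backed by a node in the cluster.
type FlagStore interface {
	Get() (bool, error)
	Set(v bool) error
}

// ErrClusterUnavailable models the case from the comment: Cypress is unreachable.
var ErrClusterUnavailable = errors.New("cluster unavailable")

// fakeStore is an in-memory stand-in used only for this sketch.
type fakeStore struct {
	value     bool
	available bool
}

func (s *fakeStore) Get() (bool, error) {
	if !s.available {
		return false, ErrClusterUnavailable
	}
	return s.value, nil
}

func (s *fakeStore) Set(v bool) error {
	if !s.available {
		return ErrClusterUnavailable
	}
	s.value = v
	return nil
}

// mayRunFullUpdate fails closed: an unreadable flag is treated as "not approved".
func mayRunFullUpdate(store FlagStore) bool {
	approved, err := store.Get()
	return err == nil && approved
}

// finishFullUpdate resets the flag so that one approval is consumed by one update.
func finishFullUpdate(store FlagStore) error {
	return store.Set(false)
}

func main() {
	store := &fakeStore{value: true, available: false}
	fmt.Println(mayRunFullUpdate(store)) // false: flag cannot be read
	store.available = true
	fmt.Println(mayRunFullUpdate(store)) // true: approved
	if err := finishFullUpdate(store); err != nil {
		fmt.Println("reset failed:", err)
	}
	fmt.Println(mayRunFullUpdate(store)) // false: approval consumed
}
```

Failing closed is the safe default, but it also means a full update cannot be approved while the cluster is down, which may be exactly when one is needed. That is the risk the comment above points at.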

Maybe we can use the Ytsaurus resource in some other way, for example by setting a label or editing a condition/status (is that possible with kubectl?). Or maybe we can simply let the operator set this field back to false by itself?
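A minimal sketch of the last idea (the operator resets the field itself), assuming a simplified model in which a changed core image is the only thing that requires a full update; a real operator has more triggers. The `Ytsaurus`/`YtsaurusSpec` types here are hypothetical and much smaller than the real CRD, and assigning to `Spec` stands in for writing the object back to the Kubernetes API.

```go
package main

import "fmt"

// Hypothetical, trimmed-down view of the Ytsaurus resource.
type YtsaurusSpec struct {
	CoreImage        string
	EnableFullUpdate bool
}

type Ytsaurus struct {
	Spec          YtsaurusSpec
	DeployedImage string // stands in for what is actually running
}

// reconcile performs a full update only if a human has set EnableFullUpdate,
// then resets the flag so it cannot be left enabled by accident.
func reconcile(yt *Ytsaurus) string {
	if yt.Spec.CoreImage == yt.DeployedImage {
		return "up to date"
	}
	if !yt.Spec.EnableFullUpdate {
		return "full update blocked: waiting for approval"
	}
	yt.DeployedImage = yt.Spec.CoreImage // recreate pods, etc.
	// In a real operator this would be an update call to the API server.
	yt.Spec.EnableFullUpdate = false
	return "full update done"
}

func main() {
	yt := &Ytsaurus{
		Spec:          YtsaurusSpec{CoreImage: "ghcr.io/ytsaurus/ytsaurus:new", EnableFullUpdate: true},
		DeployedImage: "ghcr.io/ytsaurus/ytsaurus:old",
	}
	fmt.Println(reconcile(yt))            // full update done
	fmt.Println(yt.Spec.EnableFullUpdate) // false
	yt.Spec.CoreImage = "ghcr.io/ytsaurus/ytsaurus:newer"
	fmt.Println(reconcile(yt)) // full update blocked: waiting for approval
}
```

The image tags are placeholders. One caveat: if the manifest is re-applied from a file or a GitOps repository that still contains `EnableFullUpdate: true`, the apply would likely set the field back to true, so this approach mainly helps when the flag is flipped by hand.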

l0kix2 commented 8 months ago

Maybe we can have a "fuse" resource that the operator creates after a successful cluster update and never deletes. A human can delete that resource via kubectl, and its deletion would allow a full update.
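A minimal sketch of the fuse idea as a reconcile step, assuming the operator can check whether the fuse object exists. `cluster`, `fusePresent`, and the `v1`/`v2` versions are hypothetical; a real implementation would look up an actual Kubernetes object instead of reading a boolean.

```go
package main

import "fmt"

// cluster is a hypothetical model; fusePresent stands in for "the fuse
// object exists", which a real operator would check via the API server.
type cluster struct {
	desiredImage  string
	deployedImage string
	fusePresent   bool
}

func reconcile(c *cluster) string {
	if c.desiredImage == c.deployedImage {
		// Once the cluster is up to date, make sure the fuse is armed.
		c.fusePresent = true
		return "up to date"
	}
	if c.fusePresent {
		return "full update blocked: delete the fuse to approve"
	}
	c.deployedImage = c.desiredImage // recreate pods, etc.
	c.fusePresent = true             // re-arm the fuse; the operator never deletes it
	return "full update done"
}

func main() {
	c := &cluster{desiredImage: "v1", deployedImage: "v1"}
	fmt.Println(reconcile(c)) // up to date (fuse created)
	c.desiredImage = "v2"
	fmt.Println(reconcile(c)) // full update blocked: delete the fuse to approve
	c.fusePresent = false     // human deletes the fuse with kubectl
	fmt.Println(reconcile(c)) // full update done
	fmt.Println(c.fusePresent) // true
}
```

In this sketch the operator re-creates the fuse whenever the cluster is up to date, so deleting it in advance has no effect: the approval is bound to the update that is currently pending. Whether that is desirable is a design choice.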

sgburtsev commented 4 months ago

I could suggest two approaches for this, or a combination of them. The first is to use API aggregation, which allows implementing custom operations for specific tasks, e.g.:

```shell
kubectl yt-upgrade -n yt ghcr.io/ytsaurus/ytsaurus
```

The second approach is to define separate CRDs for different tasks. For example:

```yaml
apiVersion: cluster.ytsaurus.tech/v1
kind: YtsaurusVersionUpgrade
metadata:
  name: yt
spec:
  coreImage: ghcr.io/ytsaurus/ytsaurus
```

Both approaches require making the Ytsaurus CRD read-only: the manifest would be changed only through action CRDs or API calls. In my view, the second approach is preferable because of its simplicity. Distributing cluster settings across different CRDs is less error-prone for users; for instance, only YtsaurusClusterCreate would have the cellTag field, since it cannot be changed later. Another advantage is a predetermined procedure for each CRD: the operator would know that only a version upgrade should trigger a FullUpdate, and it would use RollingUpdate for all other CRDs.

The current single CRD is too difficult to maintain: when several fields in the manifest change at once, it is often ambiguous which action the operator should execute.
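The "predetermined procedure for each CRD" point can be sketched as a fixed mapping from action kind to update strategy. Only YtsaurusVersionUpgrade comes from the proposal above; YtsaurusConfigChange is a hypothetical kind used for illustration, and the strategy names mirror the FullUpdate/RollingUpdate terms from the comment rather than any existing operator API.

```go
package main

import "fmt"

// UpdateStrategy names the two procedures mentioned in the proposal.
type UpdateStrategy string

const (
	FullUpdate    UpdateStrategy = "FullUpdate"
	RollingUpdate UpdateStrategy = "RollingUpdate"
)

// strategyFor maps an action CRD kind to its predetermined procedure:
// only a version upgrade triggers a full update.
func strategyFor(kind string) UpdateStrategy {
	switch kind {
	case "YtsaurusVersionUpgrade":
		return FullUpdate
	default:
		return RollingUpdate
	}
}

func main() {
	fmt.Println(strategyFor("YtsaurusVersionUpgrade")) // FullUpdate
	fmt.Println(strategyFor("YtsaurusConfigChange"))   // RollingUpdate (hypothetical kind)
}
```

Because each action kind carries exactly one procedure, the operator never has to infer the right action from a diff that touches several fields at once.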