kr8s-org / kr8s

A batteries-included Python client library for Kubernetes that feels familiar for folks who already know how to use kubectl
https://kr8s.org
BSD 3-Clause "New" or "Revised" License
796 stars 41 forks

Atomic operations #451

Open teocns opened 1 month ago

teocns commented 1 month ago

Which project are you requesting an enhancement for?

kr8s

What do you need?

Many years ago I developed a Django backend system that kept a model in sync with the corresponding resource living in the cluster. The killer feature was atomic operations. Normally, when performing multiple operations (create, update, etc.), if one fails, you're left with an inconsistent cluster state. My approach wasn't the best though: a hybrid (hard-coded/computed) mapping of "rollback" delegates did the trick.

I am wondering whether such a solution is outdated nowadays, or quite the opposite. Does Kubernetes provide such a feature out of the box?

If there is interest, I will gladly collaborate to integrate a non-ORM variant into kr8s.

A preview of what it looks like:

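import logging
import time
from typing import List

from django.db import DEFAULT_DB_ALIAS, transaction

# K8SAtomicOperationSignals, ResourceOperation and APIException are not
# defined in this preview; they are assumed to be defined elsewhere.
log = logging.getLogger(__name__)


class K8SAtomicOperationsContext(transaction.Atomic):
    """
    An Atomic context manager that rolls back operations

    Tracks the versions of KubernetesResource objects created inside the block
    On failure, rolls them back to the previous version
    """

    # Quoted because these classes are not yet defined when this class body runs
    signals: 'K8SAtomicOperationSignals'
    manager: 'K8SAtomicOperationsManager'

    def __init__(self, *args, **kwargs):
        self.created_entities = []
        self.manager = K8SAtomicOperationsManager()
        self.signals = K8SAtomicOperationSignals(self.manager)
        super().__init__(
            using=DEFAULT_DB_ALIAS,
            savepoint=True,
            durable=False,
        )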

    def __enter__(self):
        # Call the superclass method to enter the transaction
        super().__enter__()
        # Set up a signal to track when entities are created
        self.signals.connect()

    def __exit__(self, exc_type, exc_value, traceback):
        # Deregister the signal
        self.signals.disconnect()

        if exc_type is not None:
            self.manager.rollback(exc_value)
        # Call the superclass method to exit the transaction
        return super().__exit__(exc_type, exc_value, traceback)

class K8SAtomicOperationsManager:

    rollback_operations: List['ResourceOperation']

    def __init__(self) -> None:
        self.rollback_operations = []

    def on_rollback(self, operation):
        """
        Register an operation to execute if the context is rolled back
        """
        self.rollback_operations.append(operation)

    def rollback(self, exception):
        """
        Roll back all registered operations, most recent first
        """
        # Drop rollback operations that should not be rolled back
        ops = [op for op in self.rollback_operations if self.should_rollback(op, exception)]
        log.info("Rolling back %s operations", len(ops))
        for operation in reversed(ops):
            retries_left = 10
            while retries_left > 0:
                try:
                    operation.execute()
                    break
                except APIException as e:
                    # Do not retry on 404 (not found); re-raise immediately
                    if e.status == 404:
                        raise
                    retries_left -= 1
                    if retries_left == 0:
                        raise
                    log.info(
                        "Rollback %s failed: API Error %s:%s. Retrying %s more times",
                        operation, e.status, e.reason, retries_left,
                    )
                    time.sleep(2)

    def should_rollback(self, rollback_operation: 'ResourceOperation', exception):
        """
        Returns True if the manager should rollback an operation
        Avoids rolling back operations that were never executed
        """
        if isinstance(exception, APIException):
            if exception.operation == rollback_operation.rollsback:
                return False
        return True
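
For comparison, a non-ORM variant of the same idea could be sketched roughly as below. This is only a minimal illustration, not kr8s API: `RollbackStack`, `register` and `create` are hypothetical names, and a plain dict stands in for the cluster. Each successful operation registers a callable that undoes it, and on failure the callables run in reverse order.

```python
from typing import Callable, Dict, List


class RollbackStack:
    """Hypothetical helper: collects undo callables and runs them in reverse on failure."""

    def __init__(self) -> None:
        self._undo: List[Callable[[], object]] = []

    def register(self, undo: Callable[[], object]) -> None:
        """Record how to undo an operation that has already succeeded."""
        self._undo.append(undo)

    def __enter__(self) -> "RollbackStack":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> bool:
        if exc_type is not None:
            # Undo in reverse order so the most recent operation is reverted first.
            while self._undo:
                self._undo.pop()()
        # Returning False lets the original exception propagate.
        return False


# In-memory stand-in for the cluster (assumption for illustration only).
cluster: Dict[str, dict] = {}


def create(name: str, stack: RollbackStack) -> None:
    """Create a fake resource and register its deletion as the undo step."""
    cluster[name] = {"kind": "ConfigMap"}
    stack.register(lambda: cluster.pop(name, None))


try:
    with RollbackStack() as stack:
        create("a", stack)
        create("b", stack)
        raise RuntimeError("third operation failed")
except RuntimeError:
    pass

print(cluster)  # {}
```

Note that this kind of rollback is best-effort compensation rather than a true transaction: other clients can still observe the intermediate state, and an undo step can itself fail.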

jacobtomlinson commented 1 month ago

Kubernetes has some support for rollbacks built in, but AFAIK this is only for Deployment resources.

I'm curious to know more about your specific use case and when you think this would be a useful feature?