nolar / kopf

A Python framework to write Kubernetes operators in just a few lines of code
https://kopf.readthedocs.io/
MIT License
2.15k stars 163 forks source link

Delete during timer invocation ends up calling delete handlers twice #882

Open anorm opened 2 years ago

anorm commented 2 years ago

Long story short

In an operator with both a kopf.on.timer and a kopf.on.delete, if the custom resource is deleted while the timer is invoked, the delete handler is called twice.

I would expect the timer to be cancelled or that the current invocation was completed before the deletion handler is called.

Kopf version

1.35.3

Kubernetes version

1.22.4

Python version

Tested both on 3.9.9 and 3.10.1

Code

#!/usr/bin/env python

import asyncio
import kopf

@kopf.on.timer('kopfexamples', interval=1)
async def my_timer(logger, **kwargs):
  logger.info('timer start')
  await asyncio.sleep(5)
  logger.info('timer end')

@kopf.on.delete('kopfexamples')
async def my_delete(logger, **kwargs):
  logger.info('delete start')
  await asyncio.sleep(5)
  logger.info('delete end')

Logs

[2021-12-27 19:05:10,424] kopf.objects         [INFO    ] [test/test1] timer start
[2021-12-27 19:05:15,426] kopf.objects         [INFO    ] [test/test1] timer end
[2021-12-27 19:05:15,426] kopf.objects         [INFO    ] [test/test1] Timer 'my_timer' succeeded.
[2021-12-27 19:05:16,428] kopf.objects         [INFO    ] [test/test1] timer start
[2021-12-27 19:05:21,434] kopf.objects         [INFO    ] [test/test1] timer end
[2021-12-27 19:05:21,434] kopf.objects         [INFO    ] [test/test1] Timer 'my_timer' succeeded.
[2021-12-27 19:05:22,436] kopf.objects         [INFO    ] [test/test1] timer start
[2021-12-27 19:05:27,441] kopf.objects         [INFO    ] [test/test1] timer end
[2021-12-27 19:05:27,442] kopf.objects         [INFO    ] [test/test1] Timer 'my_timer' succeeded.
[2021-12-27 19:05:28,448] kopf.objects         [INFO    ] [test/test1] timer start
[2021-12-27 19:05:29,068] kopf.objects         [INFO    ] [test/test1] delete start
[2021-12-27 19:05:33,449] kopf.objects         [INFO    ] [test/test1] timer end
[2021-12-27 19:05:33,449] kopf.objects         [INFO    ] [test/test1] Timer 'my_timer' succeeded.
[2021-12-27 19:05:34,069] kopf.objects         [INFO    ] [test/test1] delete end
[2021-12-27 19:05:34,069] kopf.objects         [INFO    ] [test/test1] Handler 'my_delete' succeeded.
[2021-12-27 19:05:34,070] kopf.objects         [INFO    ] [test/test1] Deletion is processed: 1 succeeded; 0 failed.
[2021-12-27 19:06:34,214] kopf.objects         [INFO    ] [test/test1] delete start
[2021-12-27 19:06:39,218] kopf.objects         [INFO    ] [test/test1] delete end
[2021-12-27 19:06:39,220] kopf.objects         [INFO    ] [test/test1] Handler 'my_delete' succeeded.
[2021-12-27 19:06:39,220] kopf.objects         [INFO    ] [test/test1] Deletion is processed: 1 succeeded; 0 failed.

Additional information

The above was triggered using a single create and a single delete of a kubernetes custom resource. The kubectl delete ... did not exit before the second time the delete handler was invoked.

Thank you for your great work! I really appreciate it!

thismatters commented 2 years ago

Nolar gave me the advice that handlers should be fast acting: "[delete] handlers are supposed to be instant or at least fast (i.e. seconds)", or there could be unintended consequences. I'm assuming the same is true for timers, and that you're running into some of the "plethora of fancy side-effects and instability".

If you cannot shorten the operation of your timer and delete handler then you could make the delete handler return some value and check for that value (in status) at the start of the delete handler to prevent repetition.

anorm commented 2 years ago

I'm trying to make the handlers as fast as possible. But sooner or later, this problem will arise, no matter how fast it is. My worry isn't necessarily that the delete handler is run twice, but that it is being executed before the timer is finished. This might cause the underlying object to mutate while the timer handler is running.

Ideally, I think the framework should guarantee that:

This way, handlers for different objects can run at the same time, while the complexity of writing a correctly working operator is reduced since no special care is needed (from the operator developer's perspective) to make deterministic behavior.

So, if I understand this correctly, @nolar, doesn't consider this a bug but leaves it up to the user to handle the concurrency issues. If that's the case, then I guess this github issue should be considered as a feature request :-)