Open dpdoughe opened 8 months ago
This seems to be some rare scenario in which the number of decrementations doesn't equal the number of incrementations. This leads to a gradual net creeping up of the number_of_prefetched_tasks
during the up-time of the server
My current best guess about what is going on here is that this upwards drift can happen when an expires
is set on a task. According to the docs ,
When a worker receives an expired task it will mark the task as REVOKED
Since the decrementation logic https://github.com/mher/flower/blob/58136bf26bb641082d31a376306b73ef5047b80f/flower/events.py#L92 apparently makes no allowance for revoked tasks, the counter drifts upwards in proportion to the number of revoked tasks.
So to conclusively test the theory, I created a task that had expires=1
and the task internally slept for 100 seconds.
Sending a bunch of async requests like this over a short period of time does result in the buggy behavior:
We can see the difference between a task that was prefetched and then revoked
compared to a task that was revoked before ever becoming prefetched
So the new code in my pull request looks for the attribute values needed to distinguish and then decrement those that only are revoked after becoming prefetched.
@mher Please take a look at this issue.
Describe the bug The number_of_prefetched_tasks remains greater than zero and grows slowly after service is under load.
To Reproduce I've seen the number_of_prefetched_tasks slowly creep up (by 1 or 2) counts after periods of heavy load (hundreds of thousands of tasks). I don't have a way in hand to reproduce this as I don't know yet what the root cause might be.
What I think may be going on is that the code here https://github.com/mher/flower/blob/58136bf26bb641082d31a376306b73ef5047b80f/flower/events.py#L89 will increment the counter if 3 conditions are met.
While the code here https://github.com/mher/flower/blob/58136bf26bb641082d31a376306b73ef5047b80f/flower/events.py#L92 is required to decrement the counter.
I wonder if either i) some condition might fail to match the 4 decrementing criteria, or if there is some scenario where the corresponding event would never get fired at all.
Expected behavior number_of_prefetched_tasks should decrease as well as increase and potentially get back to zero under periods of low to no load.
Screenshots
System information Output of
python -c 'from flower.utils import bugreport; print(bugreport())'
commandflower -> flower:2.0.1 tornado:6.4 humanize:4.9.0 software -> celery:5.3.6 (emerald-rush) kombu:5.3.4 py:3.8.10 billiard:4.2.0 py-amqp:5.2.0 platform -> system:Linux arch:64bit kernel version:5.15.0-1053-azure imp:CPython loader -> celery.loaders.app.AppLoader settings -> transport:amqp results:disabled
deprecated_settings: None