Open pinkdawn opened 8 months ago
Do you think you could suggest a patch to fix it (and, if possible, open a pull request upstream)? I do not understand that part of the code, but if this problem is real, I am interested in including the patch in packages for RHEL clones.
Hello @pinkdawn
Thank you for pointing this out. I can confirm that a number of users have raised intermittent memory leak issues around the dlrd process, but to date there is no established diagnosis.
From the code snippet you pinned from managers/dlr.py I can see no issue; here is my hypothesis:
At step 3, we are sure there is no orphan timer left to fire at nothing.
Please explain your angle.
Please check my pull request; I suspect the same problem here. Someone needs to do some testing on the DLR manager. Peace 👌
Confirmed, it's the same issue. But I cannot solve it the same way: in Redis we have to account for multiple payloads, one for DLR level 1 and one for DLR level 2, and they share the same id. I don't have any more time to give to this; I hope someone can continue solving it.
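One way around the shared-id collision described above is to namespace the stored payloads per DLR level. This is only a sketch of that idea, using a plain dict as a stand-in for Redis; the key format and `save_dlr` helper are hypothetical, not Jasmin's actual schema:

```python
# Stand-in for Redis: a plain dict. In real code this would be a Redis
# connection, but the keying idea is the same.
store = {}

def dlr_key(msgid, level):
    # Hypothetical key format: include the DLR level so that level-1 and
    # level-2 payloads for the same message id never overwrite each other.
    return "dlr:%s:level%d" % (msgid, level)

def save_dlr(msgid, level, payload):
    store[dlr_key(msgid, level)] = payload

def load_dlr(msgid, level):
    return store.get(dlr_key(msgid, level))

# Both levels can now coexist under the same message id.
save_dlr("abc123", 1, {"callback_url": "http://example.com/cb"})
save_dlr("abc123", 2, {"source_connector": "client-1"})
```

With distinct keys per level, expiring or deleting one payload cannot clobber the other, which is the failure mode hinted at in the comment above.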
In dlr.py:
For 3 or 4 years this problem has kept bothering me: the dlr service slowly leaks memory.
Recently I did a memory dump and found that memory was full of delivery_sm messages that got:
[msgid:%s] (retrials: %s/%s) DLRMapNotFound: %s
Got a DLR for an unknown message id: %s (coded:%s)
And finally it led me to the code above. It is obvious that the timer gets created, but if the message id is already in the queue, the new timer is abandoned and the entry keeps pointing at the old one. The newly created timer still executes, but it waits for nothing, forever.