lucasheld / uptime-kuma-api

A Python wrapper for the Uptime Kuma Socket.IO API
https://uptime-kuma-api.readthedocs.io
MIT License

Memory Leak #19

Closed · Y0ngg4n closed this 1 year ago

Y0ngg4n commented 1 year ago

It seems like this API has a memory leak.

[Graph: MemoryLeak — memory usage of the test script rising over time]

I have used this python program to test it:

import threading
import time
from uptime_kuma_api import UptimeKumaApi

api = UptimeKumaApi("https://up.obco.pro")
print("Connected")
api.login("Yonggan", "PASSWORD")
print("Login")

def update():
    api.get_important_heartbeats()

def uptime_kuma_polling():
    while True:
        update()
        time.sleep(10)

if __name__ == '__main__':
    uptime_kuma_thread = threading.Thread(target=uptime_kuma_polling)
    uptime_kuma_thread.start()

The output was as expected:

mprof: Sampling memory every 0.1s
running new process
running as a Python program...
Connected
Login

So after the login I am looping every 10 seconds to get the important heartbeats.

This is just a short example. I noticed it because my Horus Flask app, which uses this API, was using 1.5 GB of memory after a few hours.
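
For reference, the memory plots in this thread appear to come from memory_profiler's mprof tool (the "mprof: Sampling memory every 0.1s" line in the output above is its banner). Assuming that is the case, the profile would be recorded and plotted roughly like this, where leak_test.py is a placeholder name for the script above:

pip install memory_profiler
mprof run leak_test.py
mprof plot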

Y0ngg4n commented 1 year ago

Here is another graph: [Graph: MemoryLeak — memory usage over time]

zimbres commented 1 year ago

Checking mine, 850M in use.

Y0ngg4n commented 1 year ago

@zimbres that's way too much.

lucasheld commented 1 year ago

Unfortunately I can not reproduce the problem. I have created a fresh Uptime Kuma 1.20.1 instance and added multiple monitors. Then I modified the monitors multiple times to fill the important heartbeats list. I used the latest uptime-kuma-api version (0.10.0).

docker run -it --rm -p 3001:3001 louislam/uptime-kuma:1.20.1

import threading
import time
from uptime_kuma_api import UptimeKumaApi

api = UptimeKumaApi("http://127.0.0.1:3001")
print("Connected")
api.login("admin", "secret123")
print("Login")

def update():
    api.get_important_heartbeats()

def uptime_kuma_polling():
    while True:
        update()
        time.sleep(10)

if __name__ == '__main__':
    uptime_kuma_thread = threading.Thread(target=uptime_kuma_polling)
    uptime_kuma_thread.start()

[Graph: Figure_5 — memory usage of the test run]

Can you please test with a fresh Uptime Kuma instance and provide the steps (or include them in the Python script) needed to set up Uptime Kuma so that I can reproduce the problem?
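
For what it's worth, such setup steps could also be embedded in the script itself via the API. A rough sketch, not an exact recipe: add_monitor is the library's documented call for creating monitors, while the monitor names and URLs below are placeholders.

import time

from uptime_kuma_api import UptimeKumaApi, MonitorType

api = UptimeKumaApi("http://127.0.0.1:3001")
api.login("admin", "secret123")  # credentials as in the snippet above

# Create a few monitors so the important heartbeats list fills up over time.
for i in range(5):
    api.add_monitor(
        type=MonitorType.HTTP,
        name=f"example-{i}",        # placeholder name
        url="https://example.com",  # placeholder URL
    )

# Then poll the same call as in the original script.
while True:
    api.get_important_heartbeats()
    time.sleep(10)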

lucasheld commented 1 year ago

Maybe it's related to https://github.com/louislam/uptime-kuma/issues/2820.

I have tested this with Ping monitors. Do you use a MongoDB monitor? Then you can try to repeat your test with Uptime Kuma version 1.20.2.

Edit: Okay, that doesn't make sense. The memory leak we are talking about exists in uptime-kuma-api and not in Uptime Kuma itself.

Y0ngg4n commented 1 year ago

@lucasheld So could you reproduce the issue? I was using 0.10.0 too.

lucasheld commented 1 year ago

No, unfortunately not. Here is the current plot after an even longer time: [Graph: Figure_6 — memory usage over time]

Y0ngg4n commented 1 year ago

@lucasheld Ok, that's weird, because in every app where I have used uptime-kuma-api I see this rising memory usage.

Y0ngg4n commented 1 year ago

@lucasheld Maybe it is a specific type of monitor I am using that is causing this?

lucasheld commented 1 year ago

But you always used the same Uptime Kuma instance, right? Maybe it depends on the configuration. So we need a minimal reproducible example, including the configuration (monitors, notifications, status pages, ...).

Y0ngg4n commented 1 year ago

@lucasheld Yes, it was always the same instance. I can create a new one with the Docker container, like you did, to find out what is causing this.

lucasheld commented 1 year ago

That would be great!

Y0ngg4n commented 1 year ago

@lucasheld Give me some time to investigate which monitor is causing these problems.

Y0ngg4n commented 1 year ago

@lucasheld So after my testing I have found that adding a monitor to Uptime Kuma usually makes the API application use about 0.1 MiB more.

Something weird is that pausing or resuming monitors also seems to increase RAM usage by a constant 0.1 to 0.2 MiB. Sometimes the memory went up for quite a while and then stopped again. I could not say that one specific monitor is causing the problems. Sometimes it looked like the issue only appears when I enable ntfy notifications, but maybe that was just a correlation.

TCP host monitors did not increase memory usage at all (only when disabling them again did I get 0.1 MiB more).

Generally, memory just kept going up, even when I deleted monitors.

After I had enabled 23 monitors with notifications, the memory was constantly rising again. So I think it is always rising, just so slowly that we cannot see it, and it gets faster the more monitors you add. So far I have only managed to trigger the rise reliably by enabling ntfy notifications.

I don't know exactly how you have implemented this, but maybe something is piling up? Some cache? Maybe close the socket more often? I really don't know. I have attached the memory graph from a run where I was adding, enabling, deleting and pausing monitors the whole time. I don't think it will be very helpful to you, because you don't know exactly where I did what and when, but you can see the memory rising even after I delete all monitors.
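
(A library-independent way to narrow down what is piling up is to diff tracemalloc snapshots between polling iterations. A minimal sketch using only the standard library; the URL and credentials are placeholders:)

import time
import tracemalloc

from uptime_kuma_api import UptimeKumaApi

api = UptimeKumaApi("http://127.0.0.1:3001")  # placeholder URL
api.login("admin", "secret123")               # placeholder credentials

tracemalloc.start()
previous = tracemalloc.take_snapshot()

while True:
    api.get_important_heartbeats()
    time.sleep(10)
    current = tracemalloc.take_snapshot()
    # Print the ten call sites whose allocations grew the most since the last poll.
    for stat in current.compare_to(previous, "lineno")[:10]:
        print(stat)
    previous = current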

Edit: It also seems like adding monitors to status pages increases memory usage by about 0.6 MiB (depending on how many monitors you add), and it seems to make memory usage grow faster in general.

Sorry that I could not find out which monitor is causing these problems, but it is really hard to track down what is causing this.

[Graph: Figure_1 — memory usage while adding, enabling, deleting and pausing monitors]

zimbres commented 1 year ago

I have 28 monitors in Uptime Kuma, mostly Ping and one or two HTTP. From this API I'm calling only get_monitors, every minute or so.

Since I'm running on Kubernetes, I set a memory limit for the Pod, so it gets OOM-killed from time to time.


Y0ngg4n commented 1 year ago

@zimbres That's a good workaround for now! Thank you for that!

Y0ngg4n commented 1 year ago

@lucasheld Did the information I provided help you?

lucasheld commented 1 year ago

Sorry for the late feedback. Based on your information, I tried to reproduce the problem a few days ago by creating over 200 monitors with ansible-uptime-kuma. This slightly increased memory consumption, within the expected range, because the information about over 200 monitors is now also held in memory. I then ran the system for a day, calling get_monitors every 5 seconds, but memory consumption never increased. Then I edited and deleted some monitors, but memory consumption didn't show any noticeable behavior there either. Currently I don't have time to look into this in more detail. At the end of the month I will take a closer look and run the system for a longer time.

Y0ngg4n commented 1 year ago

@lucasheld Maybe try different types of monitors too, for example TCP and keyword monitors.

lucasheld commented 1 year ago

The monitors were Ping and HTTP, as described by @zimbres. But yes, I will test all of this, also in combination with notifications and other elements.

lucasheld commented 1 year ago

@Y0ngg4n @zimbres I think I have fixed the issue (bugfix/memory-leak). Some events were added to a list after they were received, without ever limiting the size of the list. I now handle the events in the same way as Uptime Kuma does (example).
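
To illustrate the kind of bug described (a sketch only, not the library's actual code; the event shape with a "monitorID" key is an assumption):

from collections import defaultdict, deque

# Leaking pattern: every received heartbeat event is appended forever.
leaky_events = []

def on_heartbeat_leaky(event):
    leaky_events.append(event)  # the list only ever grows

# Bounded pattern: keep a limited window of events per monitor, similar in
# spirit to how Uptime Kuma itself retains only a limited number of
# important heartbeats.
MAX_EVENTS_PER_MONITOR = 150  # illustrative limit, not the library's value

bounded_events = defaultdict(lambda: deque(maxlen=MAX_EVENTS_PER_MONITOR))

def on_heartbeat_bounded(event):
    # Once the deque is full, the oldest entry is dropped automatically.
    bounded_events[event["monitorID"]].append(event)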

It would be very helpful if you could test this. But note that I have changed the return values of the methods get_heartbeats, get_important_heartbeats, avg_ping, uptime, get_heartbeat and cert_info.

Uninstall the python module:

pip uninstall -y uptime-kuma-api

And install it again from the bugfix branch:

pip install git+https://github.com/lucasheld/uptime-kuma-api.git@bugfix/memory-leak

Y0ngg4n commented 1 year ago

@lucasheld wow 🥳 great! Thank you for your work! I will check over the next few days whether it fixed the issue and give you feedback. 😊

flikites commented 1 year ago

> @lucasheld wow 🥳 great! Thank you for your work! I will check over the next few days whether it fixed the issue and give you feedback. 😊

Did you test this yet by chance?

Y0ngg4n commented 1 year ago

> @lucasheld wow 🥳 great! Thank you for your work! I will check over the next few days whether it fixed the issue and give you feedback. 😊
>
> Did you test this yet by chance?

No, sorry, I did not have time. @zimbres, maybe you can test it?

zimbres commented 1 year ago

I'm not able to test right now: my app relies on "active": true, and this version returns "active": 1, so my app is broken.

lucasheld commented 1 year ago

> I'm not able to test right now: my app relies on "active": true, and this version returns "active": 1, so my app is broken.

Thank you for reporting this. It is not supposed to be like this. I will fix it soon.
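
For illustration only (a sketch, not the actual fix in the library): the kind of normalization involved is converting SQLite-style 0/1 integers back into booleans before returning data.

def normalize_bools(data, bool_keys=("active",)):
    # Convert 0/1 integers to real booleans so callers can keep relying on
    # values like data["active"] being True/False. "active" is the key from
    # this thread; other boolean keys would be handled the same way.
    for key in bool_keys:
        if key in data and isinstance(data[key], int):
            data[key] = bool(data[key])
    return data

# normalize_bools({"active": 1}) -> {"active": True}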

lucasheld commented 1 year ago

@zimbres The bug is fixed in the branch. It would be great if you could test the changes.

zimbres commented 1 year ago

Hi, just started it.

Let's watch. I'll let you know.

zimbres commented 1 year ago

@lucasheld

So far, running smoothly


zimbres commented 1 year ago

More than 2 days without an OOM kill.


lucasheld commented 1 year ago

Looks very good. Thank you very much for testing. Because of the breaking changes (changed return values of the methods get_heartbeats, get_important_heartbeats, avg_ping, uptime, get_heartbeat and cert_info) I have to release a new major version. I need a few days to check if there are any other useful breaking changes that should be included in this version.

Vinalti commented 1 year ago

Maybe you can make a branch for that version, and a PR from it, so we can review it, and maybe add other features to it, one by one.

lucasheld commented 1 year ago

It's included in release 1.0.1

tigunia-martin commented 1 year ago

> It's included in release 1.0.1

Is there an easy way to tell the currently installed version (Docker)? I'm experiencing a major memory leak (6.6 GB of RAM in 3 weeks) and wondering if my Docker file with image: medaziz11/uptimekuma_restapi somehow pulled an old version.

flikites commented 1 year ago

@tigunia-martin What you can do is go inside the Docker container, find the file(s) that were updated, and check whether those commits are included in the files in your container.

docker exec -it container-name sh

Then, once you are in, navigate to the files and use cat to display their contents.
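
If pip is available inside the container, the installed version of the Python package can also be checked directly (container-name is the same placeholder as above):

docker exec -it container-name pip show uptime-kuma-api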

tigunia-martin commented 1 year ago

Thanks @FliKites. Looks like I have version 1.2.1. I'll open a new issue.

lucasheld commented 1 year ago

@tigunia-martin MedAziz11/Uptime-Kuma-Web-API version 1.2.1 uses uptime-kuma-api version 0.13.0. This version is affected by the memory leak.

tigunia-martin commented 1 year ago

@lucasheld I apologize profusely. I was following issues, linked PRs, and discussions and somehow moved from the web-api repo to the api repo without realizing.

Of course, I will bring this back to the correct GitHub issue so that we can hopefully get the web API wrapper updated to use the newer API.

Again, sorry, and thank you.