pulp / pulpcore

Pulp 3 pulpcore package https://pypi.org/project/pulpcore/
GNU General Public License v2.0

status api leaks memory #4278

Open dkliban opened 1 year ago

dkliban commented 1 year ago

Version 3.29.7

Describe the bug: An API worker's memory use grows over time. During this time, only the status API is being requested, and it is requested very frequently.

Over an 11-day period it went from 220 MB to almost 1.9 GB. On a different instance I noticed it go from 220 MB to 410 MB in 23 hours.
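To compare the two observations, the figures can be converted to an average growth rate. This is only a rough sketch: it assumes the numbers are megabytes, treats "almost 1.9" as 1900, and assumes growth was roughly linear over each window. The helper name is made up for illustration.

```python
def growth_per_day(start_mb: float, end_mb: float, hours: float) -> float:
    """Average memory growth in MB per day over the observed window."""
    return (end_mb - start_mb) / hours * 24.0


# Figures taken from the report; units assumed to be megabytes.
first = growth_per_day(220, 1900, 11 * 24)  # 11-day observation
second = growth_per_day(220, 410, 23)       # 23-hour observation

print(f"{first:.0f} MB/day, {second:.0f} MB/day")
```

Both instances land in the same range (roughly 150 to 200 MB per day), which would be consistent with a leak that scales with request volume.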

dralley commented 1 year ago

I tried commenting out everything in the serializer and view, and was still able to reproduce the memory leak. It was roughly 30 MB per 5000 requests (split across 2 workers), both with and without the serializer / view code.

It was the same when calling curl http://localhost:5001/pulp/api/v3/status/ 5000 times.

So I think the leak is really per-request and has nothing to do with the status API specifically, except that the status API might be the most frequently called endpoint, depending on how the installation is set up.
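For anyone wanting to repeat this kind of measurement, a minimal sketch follows. It is not the script used for the numbers in this thread. The function names are made up, the worker PID has to be looked up by hand, and reading `/proc/<pid>/status` only works on Linux.

```python
import urllib.request
from typing import Callable


def hit(url: str) -> None:
    """Issue one GET request and discard the response body."""
    with urllib.request.urlopen(url) as resp:
        resp.read()


def read_rss_kb(pid: int) -> int:
    """Resident set size of a process in kB, read from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found")


def growth_over_requests(
    do_request: Callable[[], None],
    read_memory: Callable[[], int],
    count: int,
) -> int:
    """Return how much the memory reading changed across `count` requests."""
    before = read_memory()
    for _ in range(count):
        do_request()
    return read_memory() - before


# Example use (WORKER_PID is a placeholder for the API worker's PID):
#   url = "http://localhost:5001/pulp/api/v3/status/"
#   growth_over_requests(lambda: hit(url), lambda: read_rss_kb(WORKER_PID), 5000)
```

With more than one worker, requests are spread across processes, so each worker's RSS would need to be sampled separately and summed.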

dkliban commented 1 year ago

Thank you @dralley for looking into this. The status API is called very frequently when Pulp is deployed on Kubernetes. It's used to continuously monitor the health of the deployment.

dralley commented 1 year ago

I tried:

To no avail

dralley commented 1 year ago

Independent discovery:

/pulp/api/v3/docs/api.json leaks memory like mad. Several MB per request, and it keeps growing if you hit it repeatedly.

dralley commented 1 year ago

I think there are a few independent leaks.

diff --git a/pulpcore/app/settings.py b/pulpcore/app/settings.py
index a8cca6c49..7e6296aed 100644
--- a/pulpcore/app/settings.py
+++ b/pulpcore/app/settings.py
@@ -112,7 +112,7 @@ MIDDLEWARE = [
     "django.contrib.auth.middleware.AuthenticationMiddleware",
     "django.contrib.messages.middleware.MessageMiddleware",
     "django.middleware.clickjacking.XFrameOptionsMiddleware",
-    "pulpcore.middleware.DomainMiddleware",
+    # "pulpcore.middleware.DomainMiddleware",
 ]

 AUTHENTICATION_BACKENDS = [
diff --git a/pulpcore/app/views/status.py b/pulpcore/app/views/status.py
index b11ae1c35..d5a74a102 100644
--- a/pulpcore/app/views/status.py
+++ b/pulpcore/app/views/status.py
@@ -69,15 +69,18 @@ class StatusView(APIView):
         else:
             redis_status = {"connected": False}

-        db_status = {"connected": self._get_db_conn_status()}
+        # db_status = {"connected": self._get_db_conn_status()}
+        db_status = True

         try:
-            online_workers = Worker.objects.online_workers()
+            # online_workers = Worker.objects.online_workers()
+            online_workers = None
         except Exception:
             online_workers = None

         try:
-            online_content_apps = ContentAppStatus.objects.online()
+            # online_content_apps = ContentAppStatus.objects.online()
+            online_content_apps = None
         except Exception:
             online_content_apps = None

Which doesn't really make a lot of sense? But it's what I see.
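Commenting code out, as in the diff above, narrows things down by elimination. A complementary approach would be to let Python's `tracemalloc` report which source lines accumulate memory across repeated calls. The sketch below is standalone and was not run against Pulp; `top_growth` and its parameters are hypothetical names, and `workload` stands in for whatever code path is under suspicion.

```python
import tracemalloc
from typing import Callable, List


def top_growth(
    workload: Callable[[], None], repeats: int = 100, limit: int = 5
) -> List[tracemalloc.StatisticDiff]:
    """Run `workload` repeatedly and return the lines whose allocations grew most."""
    tracemalloc.start()
    try:
        workload()  # warm up so one-time allocations are not counted as growth
        before = tracemalloc.take_snapshot()
        for _ in range(repeats):
            workload()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # compare_to() sorts by the absolute size difference, largest first
    return after.compare_to(before, "lineno")[:limit]


# Example use with a deliberately leaky function:
#   cache = []
#   for stat in top_growth(lambda: cache.append(bytearray(10_000))):
#       print(stat)
```

Note that `tracemalloc` only sees allocations made through Python's allocator, so memory held by C extensions or lost to allocator fragmentation would not show up here.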

ggainey commented 1 year ago

Might be this? https://github.com/tfranzel/drf-spectacular/issues/597

Does api.yaml leak the same?

dralley commented 1 year ago

re: API, definitely. That seems plausible. In our case it's much more severe than in theirs, though: every single request causes memory use to jump by 15 MB.

ipanova commented 1 year ago

@dralley we had this earlier memory leak report, which we decided not to work around directly in Pulp: https://github.com/pulp/pulpcore/issues/2005