open-telemetry / opentelemetry-python

OpenTelemetry Python API and SDK
https://opentelemetry.io
Apache License 2.0

Virtual memory usage is high when imported with threads #3512

Open dshivashankar1994 opened 9 months ago

dshivashankar1994 commented 9 months ago

Running from opentelemetry import metrics on its own increases memory usage by about 136 KiB (virtual) and 0 B (RSS). When the same import runs in threads, however, virtual memory usage is abnormally high: 4696 MiB.

My question is: why does virtual memory shoot up in this case, and how can I manage it?

In [1]: import psutil
   ...: 
   ...: def imp():
   ...:     from opentelemetry import metrics
   ...: 
   ...: # ByteCount is the reporter's byte-formatting helper (definition not shown)
   ...: mem = psutil.Process().memory_info()
   ...: imp()
   ...: print(ByteCount(psutil.Process().memory_info().vms - mem.vms), ByteCount(psutil.Process().memory_info().rss - mem.rss))
136 KiB 0 B
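The snippets print their deltas through ByteCount, whose definition the issue does not include, and the outputs use binary units (KiB, MiB) even though the prose above originally said KB and MB. A hypothetical stand-in, format_bytes, is sketched below so the snippets can be run without the original helper; its promotion rule (switch to the next unit only at 10 × 1024 or more) is an assumption chosen to reproduce the output style shown in this issue.

```python
# Hypothetical stand-in for the reporter's ByteCount helper; the real
# definition is not shown in the issue, so this formatting rule is an assumption.
def format_bytes(n: int) -> str:
    """Format a byte count using binary (1024-based) units.

    Promotes to the next unit only while the value is at least 10 * 1024,
    so values print like "136 KiB" or "4696 MiB" rather than "4.6 GiB".
    """
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    sign = "-" if n < 0 else ""  # memory deltas can be negative
    value = abs(n)
    unit = 0
    # Floor division keeps whole-number output; sub-unit remainders are dropped.
    while value >= 10 * 1024 and unit < len(units) - 1:
        value //= 1024
        unit += 1
    return f"{sign}{value} {units[unit]}"


print(format_bytes(136 * 1024), format_bytes(0))  # 136 KiB 0 B
```

Since virtual memory changes in whole pages, the VMS deltas here are multiples of 4 KiB on typical systems, so the floor division loses little precision at the KiB level.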

If I do the same with threads, virtual memory usage shoots up abnormally:

In [1]: import threading
   ...: import psutil
   ...: 
   ...: def imp():
   ...:     from opentelemetry import metrics
   ...: 
   ...: mem = psutil.Process().memory_info()
   ...: threads = [threading.Thread(target=imp) for _ in range(200)]
   ...: for thread in threads: thread.start()
   ...: for thread in threads: thread.join()
   ...: print(ByteCount(psutil.Process().memory_info().vms - mem.vms), ByteCount(psutil.Process().memory_info().rss - mem.rss))
4696 MiB 652 KiB
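The 200-thread run changes two things at once compared with the single import: the import itself and the creation of 200 OS threads. Each thread can reserve virtual address space of its own (its stack, and on glibc possibly a separate malloc arena) without that space ever showing up in RSS. A control run that starts the same number of threads with an empty target helps separate the two effects. The sketch below assumes Linux, reads /proc/self/statm instead of psutil so it needs only the standard library, and uses hypothetical helper names (vms_bytes, vms_delta_for_threads, noop).

```python
# Control run: does starting threads with an *empty* target already grow
# virtual memory? Linux-only sketch (reads /proc/self/statm instead of psutil).
import os
import threading


def vms_bytes() -> int:
    """Total virtual memory size of this process, in bytes (Linux only)."""
    with open("/proc/self/statm") as f:
        pages = int(f.read().split()[0])  # first field: total program size in pages
    return pages * os.sysconf("SC_PAGE_SIZE")


def vms_delta_for_threads(target, n_threads: int = 200) -> int:
    """Start and join n_threads threads running target; return the VMS change."""
    before = vms_bytes()
    threads = [threading.Thread(target=target) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return vms_bytes() - before


def noop() -> None:
    """Empty thread body: nothing is imported or explicitly allocated."""


delta = vms_delta_for_threads(noop)
print(f"VMS delta with 200 empty threads: {delta // (1024 * 1024)} MiB")
```

If the empty-thread delta is close to the delta measured with the import, the import is probably not the main cause; running it in a fresh interpreter gives the cleanest comparison.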

I added a threading.Lock to rule out race conditions and make the import thread-safe, but the issue remained:

In [1]: import threading
   ...: import psutil
   ...: 
   ...: lock = threading.Lock()
   ...: def imp():
   ...:     with lock:
   ...:         from opentelemetry import metrics
   ...: 
   ...: mem = psutil.Process().memory_info()
   ...: threads = [threading.Thread(target=imp) for _ in range(200)]
   ...: for thread in threads: thread.start()
   ...: for thread in threads: thread.join()
   ...: print(ByteCount(psutil.Process().memory_info().vms - mem.vms), ByteCount(psutil.Process().memory_info().rss - mem.rss))
2456 MiB 992 KiB
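CPython's import system already serializes concurrent imports of the same module with per-module locks, and every import after the first is a lookup in sys.modules, so the extra lock mainly serializes the threads rather than preventing a race. The sketch below checks that 200 threads importing the same module all receive one shared module object. It uses the standard-library json module as a stand-in for opentelemetry (an assumption, so it runs without extra packages), and MODULE and results are hypothetical names.

```python
# Sketch: concurrent imports of one module all return the same cached object.
# "json" stands in for opentelemetry so this runs with the standard library alone.
import importlib
import sys
import threading

MODULE = "json"  # assumption: any importable module shows the same behaviour
results = []
results_lock = threading.Lock()  # protects the shared list, not the import


def imp() -> None:
    mod = importlib.import_module(MODULE)  # thread-safe: guarded by import locks
    with results_lock:
        results.append(mod)


threads = [threading.Thread(target=imp) for _ in range(200)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every thread got the very same module object, the one cached in sys.modules.
print(len(results), all(m is sys.modules[MODULE] for m in results))  # 200 True
```

Because every thread ends up sharing one cached module, duplicate module objects are an unlikely explanation for the extra virtual memory.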

I then added a check to skip the import when the package is already in sys.modules, but still faced the same problem:

In [1]: import sys
   ...: import threading
   ...: import psutil
   ...: 
   ...: lock = threading.Lock()
   ...: def imp():
   ...:     with lock:
   ...:         if "opentelemetry" not in sys.modules:
   ...:             from opentelemetry import metrics
   ...:             print("Imported")
   ...: 
   ...: mem = psutil.Process().memory_info()
   ...: threads = [threading.Thread(target=imp) for _ in range(200)]
   ...: for thread in threads: thread.start()
   ...: for thread in threads: thread.join()
   ...: print(ByteCount(psutil.Process().memory_info().vms - mem.vms), ByteCount(psutil.Process().memory_info().rss - mem.rss))
Imported
216 MiB 816 KiB
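In this last run "Imported" is printed once, so the import happened a single time, yet virtual memory still grew by 216 MiB. That suggests the 200 threads themselves may account for much of the growth. If so, one way to manage it is to import once in the main thread and run the work on a small, bounded pool instead of creating 200 OS threads. The sketch below is an assumption, not an OpenTelemetry recommendation: json stands in for the opentelemetry import, and MAX_WORKERS and task are hypothetical names.

```python
# Sketch: import once up front, then run 200 tasks on a bounded thread pool
# so at most MAX_WORKERS OS threads are ever created.
import json  # stand-in for: from opentelemetry import metrics
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 8  # hypothetical bound; tune for the workload


def task(i: int) -> int:
    # The module is already imported at module level, so tasks just use it.
    json.dumps({"task": i})
    return threading.get_ident()  # identifies which pool thread ran this task


with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    thread_ids = list(pool.map(task, range(200)))

print(len(thread_ids), len(set(thread_ids)) <= MAX_WORKERS)  # 200 True
```

The pool reuses its worker threads, so the per-thread address-space reservations are paid for at most MAX_WORKERS threads rather than for 200.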

What is the expected behavior? The increase in virtual memory and RSS should not be this high when the import runs in threads.

What is the actual behavior? Virtual memory grows by 216 MiB to 4696 MiB in the runs above, while RSS grows by less than 1 MiB.

The same happens with from opentelemetry.sdk.metrics import MeterProvider.

kennykguo commented 9 months ago

Please assign the bug to me, thanks