metadriverse / metadrive

MetaDrive: Open-source driving simulator
https://metadriverse.github.io/metadrive/
Apache License 2.0

Memory leak when repeatedly creating/closing environments #634

Open fredyshox opened 7 months ago

fredyshox commented 7 months ago

MetaDriveEnv seems to leak memory when it is repeatedly created and closed.

Memory usage grows with each instantiation of the environment and is not released after closing/deletion, which makes it difficult to run multiple rollouts with different setups within a single process.

Example:

from metadrive.envs.metadrive_env import MetaDriveEnv
from metadrive.component.sensors.rgb_camera import RGBCamera
import psutil
import logging

for i in range(10):
  ## offscreen
  # sensors = {"rgb_camera": (RGBCamera, 1920, 1080)}
  # env = MetaDriveEnv({"image_observation": True, "use_render": False, "force_destroy": True, "preload_models": False, "sensors": sensors, "log_level": logging.CRITICAL})
  ## onscreen
  env = MetaDriveEnv({"use_render": True, "log_level": logging.CRITICAL})
  obs, _ = env.reset()
  env.close()

  del env
  env = None

  mem = psutil.Process().memory_info().rss
  print(f"Memory usage ({i}): ", mem / (1024 ** 2), "MB")

It happens in both onscreen and offscreen modes. Despite calling close and explicitly deleting the environment, each instantiation leaks approximately 2 GB of memory:

Memory usage (0):  3120.64453125 MB
Memory usage (1):  5238.05859375 MB
Memory usage (2):  7257.15625 MB
Memory usage (3):  9276.53125 MB
Memory usage (4):  11246.49609375 MB
Memory usage (5):  13243.39453125 MB
Memory usage (6):  15212.72265625 MB
Memory usage (7):  17178.1640625 MB
Memory usage (8):  19159.47265625 MB
Memory usage (9):  21152.8125 MB

It's possible that I'm doing something wrong here. Is this the correct way to release the environment?

After a short investigation using tracemalloc, the majority of the leaked memory does not seem to be allocated by Python code (panda3d?).
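One way to check this is to record how much Python-tracked memory survives each cycle and compare it with the RSS growth printed by the script above. The following is a minimal sketch, not the procedure actually used in this investigation; `python_heap_growth` is a hypothetical helper name, and `fn` stands for whatever create/reset/close cycle is being tested.

```python
import gc
import tracemalloc


def python_heap_growth(fn, repeats=3):
    """Call fn() several times and return, per call, the number of bytes of
    Python-tracked memory that are still allocated afterwards.

    Hypothetical helper for illustration: tracemalloc only sees allocations
    made through Python's allocators, so memory allocated directly by native
    code will not show up here.
    """
    growth = []
    tracemalloc.start()
    try:
        for _ in range(repeats):
            before, _peak = tracemalloc.get_traced_memory()
            fn()
            gc.collect()  # drop unreachable cycles before measuring
            after, _peak = tracemalloc.get_traced_memory()
            growth.append(after - before)
    finally:
        tracemalloc.stop()
    return growth


if __name__ == "__main__":
    kept = []

    def leaky_cycle():
        # Stand-in for an environment cycle that keeps a Python reference.
        kept.append(bytearray(1_000_000))

    print(python_heap_growth(leaky_cycle))
```

If RSS grows by about 2 GB per cycle while the values returned here stay small, the growth is happening outside Python's allocators, which is consistent with the guess that native code is holding the memory.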

QuanyiLi commented 7 months ago

I tried to investigate this and have some preliminary conclusions.

First, if onscreen/offscreen rendering is turned off, this does not happen: only about 1 MB of memory leaks per close-reset. I then turned on a magic key called debug_physics_world, which bypasses all asset loading but still opens a window. The leakage remains about 1 MB per close-reset, which suggests it has nothing to do with Panda's rendering service, so some assets must not be getting destroyed. Finally, by ablating rendering for different objects, I found that the terrain is not destroyed and remains in memory even after close is called. Fixing that alleviates the problem a lot, but there is still about 50 MB of leakage per close-reset...
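For anyone who wants to repeat this kind of ablation, here is a rough sketch of a harness that measures memory growth per close-reset for an arbitrary configuration. It is not the test script from the branch; `rss_growth_per_cycle`, `make_env`, and `read_rss_mb` are hypothetical names, and the environment only needs `reset()` and `close()` methods.

```python
import gc


def rss_growth_per_cycle(make_env, read_rss_mb, cycles=5):
    """Create, reset, and close an environment `cycles` times and return the
    change in the memory reading between consecutive cycles.

    make_env:    zero-argument factory returning an object with reset()/close()
    read_rss_mb: zero-argument callable returning current memory usage in MB
    Both are placeholders supplied by the caller (hypothetical interface).
    """
    readings = []
    for _ in range(cycles):
        env = make_env()
        env.reset()
        env.close()
        del env
        gc.collect()  # rule out garbage that is merely waiting for collection
        readings.append(read_rss_mb())
    # Differences between consecutive readings, i.e. growth per cycle.
    return [later - earlier for earlier, later in zip(readings, readings[1:])]
```

With MetaDrive, `make_env` could be something like `lambda: MetaDriveEnv({"use_render": True, "debug_physics_world": True})`, and `read_rss_mb` could be `lambda: psutil.Process().memory_info().rss / 1024 ** 2` as in the original report. Running it once per configuration makes the per-cycle leakage of each setup directly comparable.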

My branch for fixing this is at: https://github.com/metadriverse/metadrive/tree/fix-memeory-leak

The test script is at: https://github.com/metadriverse/metadrive/blob/fix-memeory-leak/metadrive/tests/test_sensors/test_close_reset_for_3d_render.py

pengzhenghao commented 7 months ago
Memory usage (0):  2962.91796875 MB
Memory usage (1):  3140.48046875 MB
Memory usage (2):  3226.16796875 MB
Memory usage (3):  3302.96875 MB
Memory usage (4):  3380.62109375 MB
Memory usage (5):  3439.1796875 MB
Memory usage (6):  3481.4140625 MB
Memory usage (7):  3532.0390625 MB
Memory usage (8):  3574.7578125 MB
Memory usage (9):  3625.41796875 MB

Process finished with exit code 0

Above is my result on the latest branch. It seems this issue is greatly alleviated.
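To put a number on the improvement, the two listings in this thread can be compared directly. The sketch below only does arithmetic on the values already posted; it assumes both listings came from the same script, and the helper names are made up for illustration.

```python
# Memory readings in MB, copied from the two listings in this thread.
BEFORE_FIX_MB = [
    3120.64453125, 5238.05859375, 7257.15625, 9276.53125, 11246.49609375,
    13243.39453125, 15212.72265625, 17178.1640625, 19159.47265625, 21152.8125,
]
AFTER_FIX_MB = [
    2962.91796875, 3140.48046875, 3226.16796875, 3302.96875, 3380.62109375,
    3439.1796875, 3481.4140625, 3532.0390625, 3574.7578125, 3625.41796875,
]


def per_cycle_growth(readings_mb):
    """Growth between consecutive readings (one value per extra cycle)."""
    return [later - earlier for earlier, later in zip(readings_mb, readings_mb[1:])]


def mean_growth(readings_mb):
    """Average growth per cycle over the whole run."""
    return (readings_mb[-1] - readings_mb[0]) / (len(readings_mb) - 1)


if __name__ == "__main__":
    print(f"before fix: {mean_growth(BEFORE_FIX_MB):.1f} MB per cycle")  # 2003.6
    print(f"after fix:  {mean_growth(AFTER_FIX_MB):.1f} MB per cycle")   # 73.6
    print([round(d, 1) for d in per_cycle_growth(AFTER_FIX_MB)])
```

On these numbers the average growth drops from roughly 2 GB to roughly 74 MB per cycle, and the later cycles on the fixed branch grow by about 40 to 60 MB each, which lines up with the "about 50 MB" figure mentioned earlier.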

QuanyiLi commented 7 months ago

Something related to rendering is still left in memory.