threedworld-mit / tdw

ThreeDWorld simulation environment
BSD 2-Clause "Simplified" License

Memory leak add/destroy/unload objects #577

Open Aubret opened 1 year ago

Aubret commented 1 year ago

Hi,

I'm using a setup where I add and remove a lot of ShapeNetCore objects. For each object I successively send add_object and destroy_object, and then free the memory with the unload_asset_bundles command. unload_unused_assets has no effect in my case, for reasons I don't understand. unload_asset_bundles does reduce the memory in use, but memory still keeps increasing as I add new objects until I run out of it. Am I misunderstanding something?

import os
import time
from subprocess import Popen
import socket
from contextlib import closing

from tdw.add_ons.embodied_avatar import EmbodiedAvatar
from tdw.controller import Controller
from tdw.librarian import ModelLibrarian
from tdw.release.build import Build

def find_free_port():
    """
    Returns a free port as an integer.
    """
    with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as s:
        s.bind(("", 0))
        return int(s.getsockname()[1])

port = find_free_port()
Popen([str(Build.BUILD_PATH.resolve()), "-port " + str(port), "-force-glcore42"])
c = Controller(check_version=True, port=port, launch_build=False)
command = []
command.append({"$type": "simulate_physics", "value": False})

c.add_ons.append(EmbodiedAvatar(avatar_id="a", position={"x": 0, "y": 0.5, "z": -1.5},
                                scale_factor={"x": 1, "y": 0, "z": 0.7}))
c.communicate(command)
lib = ModelLibrarian(library=os.environ["SHAPENET_LIBRARY_PATH"] + "/library.json")

i=0
for r in lib.records:
    command = []
    id = c.get_unique_id()
    command.append({"$type": "add_object",
                    "name": r.name,
                    "url": r.get_url(),
                    "scale_factor": 1,
                    "position": {"x": 0, "y": 1, "z": 0},
                    "rotation": {"x": 0, "y": r.canonical_rotation["y"], "z": 0},
                    "category": r.wcategory,
                    "id": id})
    resp = c.communicate(command)
    command = []
    command.append({"$type": "destroy_object", "id": id})
    command.append({"$type": "unload_asset_bundles"})
    # command.append({"$type": "unload_unused_assets"})
    i+=1
    if not i %100:
        print(i)
    c.communicate(command)
alters-mit commented 1 year ago

I've seen behavior like this before, but I haven't tested it in a long time.

There are three possibilities:

  1. Something async is happening during the GC call. Your if not i %100: line might not be sleeping long enough.
  2. There is a Unity bug, in which case there's nothing we can do about it.
  3. There is a TDW bug.

My guess has thus far been that 1. is correct.

This is a very difficult bug to resolve because it takes so long to reproduce and the profiler isn't very helpful in this case. Since you have already made ShapeNet asset bundles, and I haven't, it would be very helpful if you could try the following:

  1. Try sleeping for a certain number of seconds rather than using a for loop and see if the problem goes away.
  2. Use something like psutil to track memory usage per model and post the result here as an attachment.
  3. It would also be helpful to know roughly how many models need to be loaded before there is an obvious leak.
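
One way to collect the per-model memory figures requested in step 2 is sketched below. It is only an illustration: `MemoryLog` and `build_rss_reader` are hypothetical names, not part of TDW, and the sketch assumes the build was started with `Popen` so that its process ID is available. `psutil.Process(pid).memory_info().rss` reports the resident set size of that process in bytes.

```python
import csv
import io
from typing import Callable, List, Tuple


def build_rss_reader(pid: int) -> Callable[[], int]:
    """Return a function that reports the resident set size (bytes) of `pid`.

    psutil is imported lazily so the rest of the sketch works without it.
    """
    import psutil  # third-party: pip install psutil

    process = psutil.Process(pid)
    return lambda: process.memory_info().rss


class MemoryLog:
    """Hypothetical helper: records one memory sample per model."""

    def __init__(self, read_rss: Callable[[], int]):
        self._read_rss = read_rss
        # Each row is (index, model name, rss in bytes, change since last sample).
        self.rows: List[Tuple[int, str, int, int]] = []
        self._previous = read_rss()  # baseline taken before the first model

    def record(self, model_name: str) -> int:
        """Take a sample, store it, and return the change since the last one."""
        rss = self._read_rss()
        delta = rss - self._previous
        self._previous = rss
        self.rows.append((len(self.rows), model_name, rss, delta))
        return delta

    def to_csv(self) -> str:
        """Return all samples as CSV text, suitable for saving as an attachment."""
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(["index", "model", "rss_bytes", "delta_bytes"])
        writer.writerows(self.rows)
        return buffer.getvalue()
```

In the script from the first comment this could be wired in by keeping the return value of `Popen` (for example `build = Popen(...)`), creating `log = MemoryLog(build_rss_reader(build.pid))` before the loop, and calling `log.record(r.name)` after the `destroy_object` / `unload_asset_bundles` commands have been sent.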
Aubret commented 1 year ago
  1. I tried playing with time.sleep but it does not change anything. In addition, in my original setup, I have long intermediate computations that make the "natural sleep" longer.
  2. For now, I checked the memory of the Python script with a profiler and found nothing abnormal; the leak seems to come from the bundle. I'm currently not sure how I can investigate the bundle.
  3. The leak starts "immediately" from what I've seen with the command "top", but it becomes large only when I load a lot of models. I also tried the same script with another library of 3D models that I have imported into TDW, and it turns out there is no leak. So it seems specific to ShapeNet, or at least to a feature of this library.
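
To put a number on how quickly memory grows and on when the leak becomes obvious, per-model readings can be summarized as below. This is a minimal sketch under assumptions: `growth_per_model` and `first_index_over` are hypothetical helper names, the input is a list of memory readings in bytes taken once per model (for example from "top" or psutil), and the slope is an ordinary least-squares fit, which is just one reasonable way to quantify the trend.

```python
from typing import Optional, Sequence


def growth_per_model(rss_samples: Sequence[float]) -> float:
    """Least-squares slope of memory versus model index (bytes per model)."""
    n = len(rss_samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate a slope")
    mean_x = (n - 1) / 2  # mean of the indices 0..n-1
    mean_y = sum(rss_samples) / n
    covariance = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(rss_samples))
    variance = sum((i - mean_x) ** 2 for i in range(n))
    return covariance / variance


def first_index_over(rss_samples: Sequence[float], budget: float) -> Optional[int]:
    """Index of the first sample that exceeds the first reading by more than `budget`."""
    if not rss_samples:
        return None
    baseline = rss_samples[0]
    for i, value in enumerate(rss_samples):
        if value - baseline > budget:
            return i
    return None
```

A slope near zero would suggest that memory is being released between models, while a steadily positive slope indicates a leak. Running the same summary on readings from both model libraries would make the comparison between them concrete.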
alters-mit commented 1 year ago

What is the other library?

Please send a zip of 1000 or so ShapeNet asset bundles. I could generate them myself but I want to be sure we're testing the same thing.

Aubret commented 1 year ago

It is the Toys4k library from this paper: Stojanov, S., Thai, A., & Rehg, J. M. (2021). Using shape to categorize: Low-shot learning with an explicit shape bias. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1798-1808).

Here are some of my bundles (Linux only, since that is the only platform I work on): https://drive.google.com/file/d/1d5C35gf-DQBdWfOLfRLnTr920q_W1hCG/view?usp=sharing