ml5js / ml5-next-gen

Repo for next generation of ml5.js: friendly machine learning for the web! πŸ€–
https://ml5js.org/
Other
95 stars 23 forks source link

Memory leak in PoseNet's video processing pattern #129

Open davepagurek opened 7 months ago

davepagurek commented 7 months ago

Hi everyone! I realize there's a new version of the library being developed so this bug doesn't actually have to be addressed, but I thought maybe the learnings of the cause might be good to know for future development.

β†’ Step 1: Describe the issue πŸ“

So I was running PoseNet in a sketch being used in an art installation, intended to be running for days at a time. After a few hours on, it would mostly still run fast, but would drop a bunch of frames once in a while (at an accelerating pace, after being left on for e.g. 12 hours.) This shows up in a performance profile as a 1.2s(!!) garbage collection: image

I compared some heap profiles over time to see what got added, and noticed that some of the added memory includes a promise, referenced by a promise, referenced by a promise, referenced by a promise...:

image

β†’ Step 2: Screenshots or Relevant Documentation πŸ–Ό

I narrowed this down to the usage of promises in multiPose: https://github.com/ml5js/ml5-library/blob/f80f95aa6b31191ab7e79ff466e8506f5e48a172/src/PoseNet/index.js#L180-L182

β†’ Step 3: Share an example of the issue πŸ¦„

So, this code has the leak, using the automatic detection on the next frame, going through the promises above:

class MyDetector {
  constructor(constraints) {
    this.capture = createCapture(constraints)
    this.poseNet = ml5.poseNet(this.capture, {
       modelUrl: 'lib/model-stride16.json',
       graphModelURL: 'lib/group1-shard1of1.bin',
    })
    this.poseNet.on('pose', (results) => {
      this.processNextPose(results)
    })
  }

  processNextPose(results) {}
}

Meanwhile, this code does not seem to leak, by not initializing PoseNet with a video, and instead manually calling multiPose:

class MyDetector {
  constructor(constraints) {
    this.capture = createCapture(constraints)
    this.ready = false; // Manually store when we're ready to process a frame
    this.poseNet = ml5.poseNet(undefined, { // Pass undefined here
       modelUrl: 'lib/model-stride16.json',
       graphModelURL: 'lib/group1-shard1of1.bin',
    }, () => {
      this.ready = true
    })
    this.poseNet.on('pose', (results) => {
      this.processNextPose(results)
    })
  }

  // Call this every frame in draw():
  update() {
    if (this.ready) {
      this.ready = false;
      this.poseNet.multiPose(this.capture)
      requestAnimationFrame(() => {
        this.ready = true
      })
    }
  }

  processNextPose(results) {}
}

So it seems like the problem was in fact the buildup of contexts in the promises. It tries to return the promise from multiPose, effectively doing an asynchronous infinite loop on itself, and I guess the browser keeps around the whole chain of promises due to the fact that it's being returned instead of just awaited.

Anyway, I just wanted to flag this in case the same pattern is being used elsewhere!

β†’ Describe your setup πŸ¦„

shiffman commented 6 months ago

Hi @davepagurek thank you for reporting this and for your patience! I am working adding some documentation to this repo and archiving it. We have moved current ml5.js development to a new repository:

https://github.com/ml5js/ml5-next-gen

The latest examples are here:

https://editor.p5js.org/ml5/collections/pUzWMkdmE

We are planning to launch a new version of the library + website in June.

The new BodyPose functionality of ml5.js uses updated models behind the scenes (MoveNet and BlazePose). @ziyuan-linn have you noticed any signs of a memory leak like this in our newer examples?

davepagurek commented 6 months ago

Just skimming the code, it looks like the looping pattern here https://github.com/ml5js/ml5-next-gen/blob/main/src%2FBodyPose%2Findex.js#L312 would avoid the issue since it doesn't return the result of the promise, but I can try profiling the examples when I get back to my computer to be sure.

davepagurek commented 6 months ago

I was testing out this one https://editor.p5js.org/ml5/sketches/OukJYAJAb and taking heap snapshots every few minutes, and the snapshots seem to increase in size over time:

image

So that might mean there's a leak somewhere still? Here's the first and last snapshot for comparison: BEFORE-Heap-20240504T094341.heapsnapshot.zip AFTER-Heap-20240504T094350.heapsnapshot.zip

I see something kind of similar going on in the added Objects. They don't appear as promises this time, but the last one in the chain does say "pending."

image
shiffman commented 6 months ago

Thanks @davepagurek I moved this issue into the current repo for further discussion and tracking!

ziyuan-linn commented 6 months ago

@davepagurek Thank you for reporting this issue!

I did some testing and found memory leaks in a few models. Here are the heap snapshots:

bodyPose, with MoveNet as the underlying model:

bodyPose, with BlazePose as the underlying model on mediapipe runtime:

faceMesh, on tfjs runtime:

faceMesh, on mediapipe runtime:

This problem only seems to occur when using the mediapipe runtime. I think the problem could be due to MediaPipe using WASM and C++(no automatic garbage collection). Even this tfjs demo of face-landmark-detection on MediaPipe runtime have a memory leak. I could not find much information about this, but here is a potentially related issue with a solution.

For now, switching to tfjs runtime or MoveNet on the new version of ml5 should fix the memory leak:

bodyPose = ml5.bodyPose("BlazePose", { runtime: "tfjs" });

or

bodyPose = ml5.bodyPose("MoveNet");