tensorflow / tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.
https://js.tensorflow.org
Apache License 2.0

Memory Leak (memory keeps increasing even after disposing/tidying tensors) #6019

Closed ray-1337 closed 2 years ago

ray-1337 commented 2 years ago

System information

Describe the current behavior Memory usage keeps increasing during and after prediction. Tidying/disposing the tensors doesn't change anything.

Describe the expected behavior Memory usage should stay flat or decrease.

Standalone code to reproduce the issue

tf.engine().startScope();

let imageData = tf.node.decodeJpeg(new Uint8Array(image_Buffer), 3);
let model = await tf.loadLayersModel(`file://model/model.json`);

let model_checking = tf.tidy(() => {
  let normalized = tf.scalar(255);
  let img = imageData.toFloat().div(normalized);
  let RB = tf.image.resizeBilinear(img, [224, 224], true);
  let batched = RB.reshape([1, 224, 224, 3]);
  return model.predictOnBatch(batched);
});

const classes = ["Cat", "Dog"]; // leetcode
let values = model_checking.dataSync();

const topK = Math.min(classes.length, values.length);

let value_index = [];
for (let i = 0; i < values.length; i++) value_index.push({ value: values[i], index: i });

value_index.sort((a, b) => b.value - a.value);

const topk = new Float32Array(topK), topkI = new Int32Array(topK);

for (let i = 0; i < topK; i++) {
  topk[i] = value_index[i].value;
  topkI[i] = value_index[i].index;
}

console.log({ className: classes[topkI[0]], probability: topk[0] });

imageData.dispose();
model.dispose();
tf.dispose(model_checking);
tf.engine().disposeTensor(model_checking);
tf.engine().disposeVariables();
tf.engine().endScope();
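As a side note, the top-K selection above can be checked in plain JavaScript with no tfjs at all. A minimal sketch, where the `values` array is a hypothetical stand-in for the probabilities returned by `dataSync()`:

```javascript
// Hypothetical probabilities standing in for model_checking.dataSync()
const values = [0.12, 0.88];
const classes = ["Cat", "Dog"];

// Pair each probability with its original index, then sort descending by probability
const ranked = Array.from(values, (value, index) => ({ value, index }))
  .sort((a, b) => b.value - a.value);

// Keep only the top-K entries
const topK = Math.min(classes.length, values.length);
const top = ranked.slice(0, topK);

console.log({ className: classes[top[0].index], probability: top[0].value });
// → { className: 'Dog', probability: 0.88 }
```

Since this part of the pipeline only touches plain arrays, it cannot be the source of the tensor leak; it is the tensor-producing calls above it that need disposing.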

Other info / logs

[screenshot: memory usage]

5 hours later: [screenshot: memory usage]

rthadur commented 2 years ago

@ray-1337 is it possible to share the model ?

ray-1337 commented 2 years ago

> @ray-1337 is it possible to share the model?

Yes. https://github.com/infinitered/nsfwjs

rthadur commented 2 years ago

Sorry, I did not find a model.json file. Are you running any examples from the above repo?

ray-1337 commented 2 years ago

> Sorry, I did not find a model.json file. Are you running any examples from the above repo?

  1. https://github.com/infinitered/nsfwjs/tree/master/example/nsfw_demo/public/model
  2. Yes. It still remains the same.

grmatthews commented 2 years ago

I don't know if this helps, but I seemed to be experiencing the same thing (can't dispose of tensors), and seem to have fixed it by changing calls from myTensor.dispose() to tf.dispose(myTensor) and also changing myModel.dispose() to tf.dispose(myModel).

I am only using tf.tidy() in a few limited spots and that isn't leaking.

I monitored tf.memory().numTensors before and after and now have 0 leaks with using tf.dispose( thingToBeDisposed ).

ray-1337 commented 2 years ago

> I don't know if this helps, but I seemed to be experiencing the same thing (can't dispose of tensors), and seem to have fixed it by changing calls from myTensor.dispose() to tf.dispose(myTensor) and also changing myModel.dispose() to tf.dispose(myModel).
>
> I am only using tf.tidy() in a few limited spots and that isn't leaking.
>
> I monitored tf.memory().numTensors before and after and now have 0 leaks with using tf.dispose( thingToBeDisposed ).

Is that clearing the Node.js memory as well? EDIT: I don't think so.
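Whether the Node.js heap itself is growing can be checked without tfjs. A minimal sketch, assuming Node is run with the `--expose-gc` flag so garbage collection can be forced before each sample (V8 holds on to freed memory, so raw RSS growth alone doesn't prove a leak):

```javascript
// Sample live heap usage in MB, forcing GC first when available so the
// reading reflects reachable objects rather than uncollected garbage.
function heapUsedMB() {
  if (typeof global.gc === "function") global.gc(); // only with node --expose-gc
  return process.memoryUsage().heapUsed / 1024 / 1024;
}

const before = heapUsedMB();
// ... run the inference loop here ...
const after = heapUsedMB();
console.log(`heapUsed: ${before.toFixed(1)} MB -> ${after.toFixed(1)} MB`);
```

If `heapUsed` stabilizes after forced GC while `tf.memory().numTensors` stays flat, the growth is V8 heap management rather than a tensor leak.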

pyu10055 commented 2 years ago

@ray-1337 thank you for reporting the issue. I took a look at the code you provided, and a couple of things caught my eye:

  1. Is the model loading happening on every inference?
  2. You are mixing engine scope and tidy; this is typically not recommended.

Here is example code I tried, which loads the model only once; the heap size seems to be stable. Please give it a try and let me know if it works for you.

import * as tf from '@tensorflow/tfjs-node';
import {Tensor, Tensor3D} from '@tensorflow/tfjs-node';
import * as fs from 'fs';
let counter = 0;
let model: tf.LayersModel;
async function loadModel() {
  return await tf.loadLayersModel(
      `file://./nsfwjs/example/nsfw_demo/public/model/model.json`);
}
async function inference() {
  let imageData = tf.zeros([1, 299, 299, 3]);

  let model_checking = tf.tidy(() => {
    let normalized = tf.scalar(255);
    let img = imageData.toFloat().div(normalized) as Tensor3D;
    let RB = tf.image.resizeBilinear(img, [299, 299], true);
    let batched = RB.reshape([1, 299, 299, 3]);
    return model.predictOnBatch(batched) as Tensor;
  });

  const classes = ['Cat', 'Dog'];  // leetcode
  let values = model_checking.dataSync();

  // const topK = Math.min(classes.length, values.length);

  // for (let i = 0; i < values.length; i++) value_index.push({ value:
  // values[i], index: i });

  // value_index.sort((a, b) => b.value - a.value);

  // const topk = new Float32Array(topK), topkI = new Int32Array(topK);

  // for (let i = 0; i < topK; i++) {
  //   topk[i] = value_index[i].value;
  //   topkI[i] = value_index[i].index;
  // };

  // console.log({ className: classes[topkI[0]], probability: topk[0] });

  imageData.dispose();
  tf.dispose(model_checking);
  console.log(tf.memory());
  console.log(process.memoryUsage());
}

async function main() {
  model = await loadModel();    
  for (let i = 0; i < 1000; i++) {
    await inference();
  }
  model.dispose();

  console.log(tf.memory());
}

main();

google-ml-butler[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you.

google-ml-butler[bot] commented 2 years ago

Closing as stale. Please @mention us if this needs more attention.
