tensorflow / tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.
https://js.tensorflow.org
Apache License 2.0

Init function for tf #8141

Closed. borodadada closed this issue 10 months ago

borodadada commented 10 months ago

Node.js version, CPU. Hi! I train in a loop, and the first result is almost always (~90%) better than the following iterations. Also, after ~100 or more iterations tf goes down. I think this could be fixed with an init function for tf. On top of that, to fix the memory leak I have to use:

        //fix memory leak
        tf.engine().startScope()

        //fit
        await model.fit(data, item)

        //fix memory leak 
        tf.engine().endScope()

How can I get rid of all these problems? I need tf to be in a clean state for every iteration! This is very annoying! Thx!

gaikwadrahul8 commented 10 months ago

Hi, @borodadada

Thank you for bringing this issue to our attention. To help us fully understand the problem and provide the most effective assistance, please share the following information:

  1. Code Snippet or GitHub Repository: Please share a relevant code snippet that clearly demonstrates the issue you're encountering.

  2. Detailed Explanation:

    • The exact steps that lead to the problem.
    • Any error messages or unexpected behaviors you've observed.
    • The expected behavior of the code.

  3. System Information:

    • Node.js version you're using.
    • tfjs-node version.
    • Operating system platform (e.g., Windows, macOS, Linux).

By providing this comprehensive information, you'll enable us to replicate the issue accurately on our end.

Thank you in advance for your cooperation and patience.

borodadada commented 10 months ago

Node.js v19.9.0, Intel 13700, 32 GB RAM, Windows 11 x64. I set the Windows CPU affinity for node: 4 cores per running copy.

    "@tensorflow/tfjs": "^4.16.0",
    "@tensorflow/tfjs-node": "^4.16.0",

I created an example:

const tf = require('@tensorflow/tfjs-node');

const size = 10
const units = 100
let count = 0

const letsgo = async function(){

    const model = tf.sequential();
    model.add( tf.layers.dense({ inputShape: [units], units, activation: 'linear', useBias: true }));
    model.add( tf.layers.dense({ units, activation: 'linear', useBias: true }));
    model.add( tf.layers.dense({ units, activation: 'linear', useBias: true }));
    model.compile({ optimizer: tf.train.adam(0.005, 0.9, 0.999), loss: tf.losses.absoluteDifference });

    let a = []
    let b = []
    for (let i = 0; i < size; i++) {
        let aa = []
        let bb = []
        for (let ii = 0; ii < units; ii++) {
            aa.push( Math.random() )
            bb.push( Math.random() )
        }
        a.push(aa)
        b.push(bb)
    }

    let xs = tf.tensor2d( a );
    let ys = tf.tensor2d( b );

    await model.fit(xs, ys, {
        epochs: 100000,
        shuffle: false,
        verbose: 0,
        callbacks:{
            onTrainBegin: ()=>{
                console.log('start: ', count)
            },
            onTrainEnd: ()=>{
                console.log('done: ', count)
                count++
            },
            onEpochEnd: async (epoch, logs)=>{
                if( epoch % 100 === 0 )
                    console.log(epoch, logs.loss)
            }
        }
    })
}

const loop = async function(){
    for (let i = 0; i < 500; i++) {
        await letsgo()
        //now need init for tf, before new iteration
    }
}

loop()

After the first iteration completes, I need to bring tf back to its original state so that there are no memory leaks and no leftover changes inside. What should I do?

Right now, after 6 or more hours, the program may fail with an error and exit (something about a problem with CPU0).
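One way to confirm where the leak comes from is to compare the live tensor count reported by tf.memory() before and after a single iteration. The helper below is only a sketch: the name countLeakedTensors is hypothetical, and tf is passed in as an argument so the same code works with either @tensorflow/tfjs or @tensorflow/tfjs-node.

```javascript
// Hypothetical helper (not part of TensorFlow.js): reports how many tensors
// an async step leaves allocated after it finishes.
// `tf` is the TensorFlow.js module, e.g. require('@tensorflow/tfjs-node').
async function countLeakedTensors(tf, step) {
  // tf.memory().numTensors is the number of tensors currently alive.
  const before = tf.memory().numTensors;
  await step();
  return tf.memory().numTensors - before;
}

// Usage, assuming letsgo() from the example above:
//   const tf = require('@tensorflow/tfjs-node');
//   const leaked = await countLeakedTensors(tf, letsgo);
//   console.log('tensors left behind by one iteration:', leaked);
```

A result that stays above zero on every call suggests that the iteration allocates tensors (model weights, optimizer state, input data) that are never disposed.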

gaikwadrahul8 commented 10 months ago

Hi, @borodadada

As per my current understanding, you'll have to do something like the example below. Please also refer to the official documentation for tf.tidy: "Using this method helps avoid memory leaks. In general, wrap calls to operations in tf.tidy() for automatic memory cleanup."

NOTE: Variables do not get cleaned up when inside a tidy(). If you want to dispose variables, please use tf.disposeVariables() or call tf.dispose directly on variables.

If I have missed something please let me know. Thank you for your cooperation and patience.

const tf = require('@tensorflow/tfjs-node');

const testDemo = async function() {
  const iterations = 5;

  for (let i = 0; i < iterations; i++) {
    // Start a new TensorFlow.js scope for each iteration
    tf.engine().startScope();

    // Execute operations within the tf.tidy() function to avoid memory leaks
    // (tf.tidy() is synchronous, so no await is needed)
    tf.tidy(() => {
      const tensor1 = tf.tensor([1, 2, 3]);
      const tensor2 = tf.tensor([4, 5, 6]);

      // Perform computations using the tensors
      const result = tensor1.add(tensor2);

      // Print the result
      console.log(`Iteration ${i + 1} - Result:`, result.dataSync());

      // Dispose tensors when they are no longer needed
      tf.dispose([tensor1, tensor2, result]);
    });

    // End the TensorFlow.js scope to release any memory associated with it
    tf.engine().endScope();
  }
};

testDemo();

borodadada commented 10 months ago

Does model.fit need to run inside tf.tidy? I don't understand. I added tf.disposeVariables, tf.engine().startScope() and tf.dispose; the only one I have not added is tf.tidy.
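On the tf.tidy question: tf.tidy() only accepts synchronous functions, while model.fit() returns a Promise, so wrapping fit() in tidy() is not an option. A possible alternative is to dispose everything an iteration created explicitly once training finishes. The sketch below assumes a hypothetical helper runIsolated and a caller-supplied build callback; neither is part of the TensorFlow.js API. The optimizer is disposed separately on the assumption that an optimizer instance created with tf.train.adam(...) and passed to compile() is not released by model.dispose().

```javascript
// Hypothetical helper: runs one training iteration and releases what it created.
// `tf` is the TensorFlow.js module, e.g. require('@tensorflow/tfjs-node').
// `build` is a caller-supplied function returning { model, optimizer, xs, ys }.
async function runIsolated(tf, build, fitArgs) {
  const { model, optimizer, xs, ys } = build();
  try {
    // model.fit() is async, so it is awaited here instead of wrapped in tf.tidy().
    const history = await model.fit(xs, ys, fitArgs);
    return history.history.loss; // plain array of per-epoch loss values
  } finally {
    tf.dispose([xs, ys]);  // input tensors
    model.dispose();       // layer weights
    optimizer.dispose();   // optimizer state (assumed not owned by the model)
  }
}

// Usage, adapting the model from the example above:
//   const build = () => {
//     const optimizer = tf.train.adam(0.005, 0.9, 0.999);
//     const model = tf.sequential();
//     model.add(tf.layers.dense({ inputShape: [100], units: 100 }));
//     model.compile({ optimizer, loss: tf.losses.absoluteDifference });
//     return { model, optimizer, xs: tf.randomUniform([10, 100]), ys: tf.randomUniform([10, 100]) };
//   };
//   const loss = await runIsolated(tf, build, { epochs: 100, verbose: 0 });
```

The try/finally block keeps the cleanup in place even when fit() throws. tf.disposeVariables() would also release the weights and optimizer state, but it disposes every variable registered with the engine, so it is only safe when no other model is alive.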