tensorflow / tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.
https://js.tensorflow.org
Apache License 2.0

Different Model Outputs on Android vs. iOS with tfjs #8342

Open ertan95 opened 1 month ago

ertan95 commented 1 month ago

I am experiencing a significant discrepancy in the logits and probabilities output when running the same TensorFlow.js model on Android and iOS. The model, a self-trained MobileNetV3 (large) from PyTorch, performs as expected on iOS and in a PyTorch Jupyter notebook but produces different results on Android. I converted the model from PyTorch to ONNX and then to TensorFlow.js.

To troubleshoot, I saved the preprocessed tensor from Android and used it on iOS, where it worked correctly. Conversely, the iOS tensor failed on Android. This rules out preprocessing issues, suggesting either improper weight handling on Android or an issue with the predict function.
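To make the discrepancy concrete, the saved prediction files can be diffed numerically. A minimal sketch of my own (the helpers below are not part of the app; they assume the `logits` arrays from the attached prediction JSON files have been loaded as plain number arrays of equal length):

```typescript
// Largest absolute per-element difference between two logit vectors.
const maxAbsDiff = (a: number[], b: number[]): number =>
  a.reduce((max, v, i) => Math.max(max, Math.abs(v - b[i])), 0);

// Mean absolute difference, to distinguish a single outlier from a global shift.
const meanAbsDiff = (a: number[], b: number[]): number =>
  a.reduce((sum, v, i) => sum + Math.abs(v - b[i]), 0) / a.length;
```

Logging these for `prediction_ios.json` vs `prediction_android.json` would show whether the divergence is a small numerical drift or a wholesale difference.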

System information

Snippet from my package.json:

"@tensorflow/tfjs": "^4.15.0",
"@tensorflow/tfjs-core": "^4.15.0",
"@tensorflow/tfjs-react-native": "^1.0.0",

Describe the current behavior The model outputs consistent and expected results on iOS and in the Jupyter Notebook but produces different and incorrect results on Android.

Describe the expected behavior The model should produce consistent logits and probabilities across all platforms, including Android, as it does on iOS and in the PyTorch Jupyter Notebook.

Standalone code to reproduce the issue

const savePredictions = async (logits, probabilities, fileName, variants, processedImage) => {
  try {
    const logitsData = await logits.array();
    const probabilitiesData = await probabilities.array();
    const processedImageData = await processedImage.array();

    const predictionsJSON = {
      variants: variants,
      processedImage: processedImageData,
      logits: logitsData,
      probabilities: probabilitiesData,
    };
    const tensorJSON = JSON.stringify(predictionsJSON);
    await FileSystem.writeAsStringAsync(fileName, tensorJSON);
    console.log('Predictions saved:', fileName, 'in', FileSystem.documentDirectory);
  } catch (error) {
    console.error('Error:', error);
  }
};

const processImage = async (uri: string): Promise<tf.Tensor> => {
  try {
    // rescale picture to model trained picture size
    const resizedImg = await manipulateAsync(
      uri,
      [{ resize: { width: trainingSizes.img_width, height: trainingSizes.img_height } }],
      { compress: 0.6, format: SaveFormat.JPEG, base64: true }
    );

    const imageTensor = tf.tidy(() => {
      // resizedImg.base64 is already the raw base64 payload, so no data-URI prefix/split round trip is needed
      const uint8array = tf.util.encodeString(resizedImg.base64, 'base64').buffer;
      let tensor = decodeJpeg(new Uint8Array(uint8array));
      tensor = tf.image.resizeBilinear(tensor, [
        trainingSizesEN.img_height,
        trainingSizesEN.img_width,
      ]);
      tensor = tensor.div(255.0);
      tensor = tensor
        .sub(tf.tensor1d([0.485, 0.456, 0.406]))
        .div(tf.tensor1d([0.229, 0.224, 0.225]));
      tensor = tensor.transpose([2, 0, 1]).expandDims(0);
      return tensor;
    });

    //console.log('processImage memory:', tf.memory());
    return imageTensor;
  } catch (error) {
    console.error('Error on preprocessing image:', error);
    throw error;
  }
};
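For spot-checking single pixels of the saved `processedImage` arrays from both platforms, the normalization above (divide by 255, subtract the ImageNet channel means, divide by the channel stds) can be reimplemented in plain TypeScript. This is my own diagnostic helper, not part of the app:

```typescript
// ImageNet normalization constants, matching the tensors used in processImage.
const MEAN = [0.485, 0.456, 0.406];
const STD = [0.229, 0.224, 0.225];

// Maps one RGB pixel (values 0-255) to its normalized form, channel by channel.
const normalizePixel = (rgb: [number, number, number]): number[] =>
  rgb.map((p, c) => (p / 255 - MEAN[c]) / STD[c]);
```

If the saved Android tensor already differs from the iOS tensor at this stage, the problem would be in decoding/resizing rather than in `predict` (the issue report indicates it does not, which points at inference).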

const predictImage = async (
  model: tf.GraphModel | null,
  processedImage: tf.Tensor,
  variants: string[],
): Promise<string[]> => {
  try {
    if (!model) {
      throw new Error('Model not loaded');
    }
    //Overwrite processedImage with test data
    /*
    const testTensorData: number[][][][] = predictionJSON_Android[
      'processedImage'
    ] as number[][][][];
    const testTensor = tf.tensor4d(testTensorData);
    processedImage = testTensor;
    */

    // Mask non relevant classes
    const maskArray = Object.values(classLabels).map((label) =>
      variants.includes(label) ? 1 : 0
    );
    const maskTensor = tf.tensor(maskArray, [1, maskArray.length]);
    const modelInput = { input: processedImage, mask: maskTensor };
    const tidyResult = tf.tidy(() => {
      const logits = model.predict(modelInput) as tf.Tensor;
      const probabilities = tf.softmax(logits);
      return { logits, probabilities };
    });

    await savePredictions(
      tidyResult.logits,
      tidyResult.probabilities,
      FileSystem.documentDirectory + 'prediction.json',
      variants,
      processedImage
    );

    tidyResult.logits.dispose();
    maskTensor.dispose();
    tf.dispose(processedImage);

    const predictionArrayBuffer = await tidyResult.probabilities.data();
    tidyResult.probabilities.dispose();

    const predictionArray = Array.from(predictionArrayBuffer);
    const classLabelsArray = Object.values(classLabels);

    const variantPredictions = predictionArray
      .map((probability, index) => ({ label: classLabelsArray[index], probability }))
      .filter((prediction) => cardVariants.includes(prediction.label))
      .sort((a, b) => b.probability - a.probability);

    variantPredictions.forEach((variant) => {
      console.log(`Probability for ${variant.label}: ${variant.probability}`);
    });

    const sortedLabels = variantPredictions.map((prediction) => prediction.label);
    return sortedLabels;
  } catch (error) {
    console.error('Error on prediction:', error);
    throw error;
  }
};
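Since both the logits and the probabilities are saved, the softmax step can also be verified offline: if a plain reimplementation applied to the saved Android logits reproduces the saved Android probabilities, the divergence happens inside `model.predict` rather than afterwards. A sketch (numerically stable via max subtraction):

```typescript
// Plain-TS softmax for cross-checking the saved logits/probabilities pairs.
const softmax = (logits: number[]): number[] => {
  const max = Math.max(...logits); // subtract the max for numerical stability
  const exps = logits.map((v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
};
```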

....
//Loading model
const loadModel = async () => {
  try {
    const ioHandler = bundleResourceIO(modelJson as tf.io.ModelJSON, [
      modelWeights1,
      modelWeights2,
      modelWeights3,
      modelWeights4,
    ]);
    const model = await tf.loadGraphModel(ioHandler);
    return model;
  } catch (error) {
    console.error('Error on loading model:', error);
    return null;
  }
};
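To test the "improper weight handling on Android" hypothesis directly, the raw weight shards could be checksummed on both platforms; identical bytes should yield identical sums. A hypothetical diagnostic (assuming the shards can be resolved to `ArrayBuffer`s, which `bundleResourceIO` does not expose directly, so this would need to run before handing them to the IO handler):

```typescript
// Simple byte-wise rolling hash over a buffer, kept in uint32 range.
// Log the result on iOS and Android after loading and compare the values;
// a mismatch would mean the shards themselves arrive corrupted on Android.
const checksum = (buf: ArrayBuffer): number => {
  const bytes = new Uint8Array(buf);
  let sum = 0;
  for (let i = 0; i < bytes.length; i++) {
    sum = (sum * 31 + bytes[i]) >>> 0;
  }
  return sum;
};
```

If the checksums match but outputs still diverge, the weights are fine and the problem lies in kernel execution on the Android GPU.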

Other info / logs

Attached prediction dumps:
prediction_ios.json
prediction_ios_with_android_tensor.json
prediction_android.json
prediction_android_with_ios_tensor.json

oleksandr-ravin commented 1 month ago

I have the same problem on all devices based on Samsung Exynos chipsets, and I can reproduce it on a POCO C40 as well; on Snapdragon or iOS devices it works as expected. The project runs on Angular with converted models on tfjs 4.17.0.

try {
  this.model = await tfconv.loadGraphModel(path, {
    requestInit,
    onProgress: (value) => {
      console.log('[MODEL] loading: ' + getModelNameFromPath(path) + ': ' + (value * 100) + '%');
      this.updateLoadingProgress(value);
    },
  });
} catch (e) {
  console.error('[MODEL] error loading model', e);
}

const res = await modelLoad.executeAsync({[config.inputsName]: inputTensor}, config.outputs);

oleksandr-ravin commented 1 month ago

I downgraded the libraries to version 3.3.0 and it works! On version 3.11.0 the problem still occurs. I have not yet checked the versions between 3.3.0 and 3.11.0.

gaikwadrahul8 commented 1 month ago

Hi, @ertan95, @oleksandr-ravin

I apologize for the delayed response, and thank you for bringing this issue to our attention with valuable analysis and insights. If possible, could you please share your GitHub repo along with comprehensive steps to reproduce the same behavior, so that we can investigate it further?

Thank you for your cooperation and patience.

oleksandr-ravin commented 1 month ago

Hi @gaikwadrahul8, for my models the outputs are:

"outputs": [
  "StatefulPartitionedCall/model_1/zoomin_type/Softmax",
  "StatefulPartitionedCall/model_1/sectors_quality/Sigmoid",
  "StatefulPartitionedCall/model_1/body_type/Softmax",
  "StatefulPartitionedCall/model_1/out_of_distribution/Sigmoid",
  "StatefulPartitionedCall/model_1/spheric_sectors_onehot_encoded/Softmax"
],

I finished testing all versions, and 3.3.0 is the latest one that gives me correct values; we need to find what changed from 3.3.0 to 3.4.0. If I understand correctly, it is a problem with hardware-specific translation and calculation, since other phones do not have this trouble.
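One plausible mechanism behind the hardware-dependence: WebGL backends fall back to 16-bit float textures on GPUs without renderable float32 support, and a float16 round-trip visibly perturbs typical activation values. A plain-TS emulation of that round-trip (my own sketch; it truncates rather than rounds and flushes subnormals to zero, so it is illustrative, not bit-exact):

```typescript
// Encode a JS number (float64 -> float32 -> float16 bit pattern).
const toFloat16Bits = (v: number): number => {
  const bits = new Uint32Array(new Float32Array([v]).buffer)[0];
  const sign = (bits >>> 16) & 0x8000;
  const exp = ((bits >>> 23) & 0xff) - 127 + 15; // rebias exponent
  const frac = (bits >>> 13) & 0x3ff;            // keep top 10 fraction bits
  if (exp <= 0) return sign;                     // flush subnormals to signed zero
  if (exp >= 31) return sign | 0x7c00;           // overflow to infinity
  return sign | (exp << 10) | frac;
};

// Decode a float16 bit pattern back to a JS number.
const fromFloat16Bits = (h: number): number => {
  const sign = h & 0x8000 ? -1 : 1;
  const exp = (h >>> 10) & 0x1f;
  const frac = h & 0x3ff;
  if (exp === 0) return sign * 0;
  if (exp === 31) return frac ? NaN : sign * Infinity;
  return sign * Math.pow(2, exp - 15) * (1 + frac / 1024);
};

const roundTrip16 = (v: number): number => fromFloat16Bits(toFloat16Bits(v));
```

Errors of this size in every intermediate tensor accumulate layer by layer, which is consistent with correct-on-Snapdragon/iOS but wrong-on-Exynos behavior.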

ertan95 commented 3 weeks ago

> Hi @gaikwadrahul8, for my models the outputs are: "outputs": [ "StatefulPartitionedCall/model_1/zoomin_type/Softmax", "StatefulPartitionedCall/model_1/sectors_quality/Sigmoid", "StatefulPartitionedCall/model_1/body_type/Softmax", "StatefulPartitionedCall/model_1/out_of_distribution/Sigmoid", "StatefulPartitionedCall/model_1/spheric_sectors_onehot_encoded/Softmax" ],
>
> I finished testing all versions, and 3.3.0 is the latest one that gives me correct values; we need to find what changed from 3.3.0 to 3.4.0. If I understand correctly, it is a problem with hardware-specific translation and calculation, since other phones do not have this trouble.

I will have a look at this one. A downgrade is not the best option for me due to other dependencies, but I will give it a try. Thanks for the workaround!

ertan95 commented 3 weeks ago

> Hi, @ertan95, @oleksandr-ravin
>
> I apologize for the delayed response, and thank you for bringing this issue to our attention with valuable analysis and insights. If possible, could you please share your GitHub repo along with comprehensive steps to reproduce the same behavior, so that we can investigate it further?
>
> Thank you for your cooperation and patience.

Well, I've described the steps to reproduce in my initial post: basically, train a MobileNetV3 and test it on iOS and Android with the same image; you will get different outputs.