tensorflow / tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.
https://js.tensorflow.org
Apache License 2.0

Some bugs related to tensor data lost during execution in the browser #8338

Open DeepLearningMOSA opened 1 month ago

DeepLearningMOSA commented 1 month ago

Some tensor data are lost during execution.

System information
Have I written custom code (as opposed to using a stock example script provided in TensorFlow.js): Yes
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
TensorFlow.js installed from (npm or script link): script link
TensorFlow.js version: 3.19.0
Browser version: Microsoft Edge 125.0.2535.67 (64-bit), Google Chrome 125.0.6422.112 (64-bit)
TensorFlow.js Converter version: -

Exception Report
{
  "model_inf": "from:0 to:1 operator:identity from:1 to:2 operator:identity from:1 to:8 operator:max_pooling2D from:2 to:3 operator:identity from:3 to:4 operator:identity from:4 to:5 operator:identity from:5 to:6 operator:identity from:6 to:7 operator:identity from:7 to:8 operator:identity from:8 to:9 operator:reducemean ",
  "error_message": "Error: Based on the provided shape, [2], the tensor should have 2 values but has 1\n at gv (https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js:17:116089)\n at iy (https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js:17:170406)\n at ay (https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js:17:169961)\n at yw (https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js:17:228810)\n at t. (https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js:17:699318)\n at u (https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js:17:104674)\n at Generator._invoke (https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js:17:104427)\n at forEach.t. [as next] (https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js:17:105031)\n at Wm (https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js:17:109975)\n at o (https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js:17:110179)"
}

Bug Description
Based on the provided shape, [2], the tensor should have 2 values but has 1. However, the same model runs without error in TensorFlow.
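The error message states a simple invariant: a tensor's shape fixes how many values it must hold (the product of its dimensions), and a shape of [2] backed by only one value fails that check. The sketch below mirrors that kind of check in plain JavaScript. `sizeFromShape` and `checkValuesMatchShape` are hypothetical helpers written for illustration, not the TensorFlow.js source.

```javascript
// Hypothetical helper: number of values a tensor of this shape must hold.
// The product of an empty shape (a scalar) is 1.
function sizeFromShape(shape) {
  return shape.reduce((acc, dim) => acc * dim, 1);
}

// Hypothetical helper: throws an error worded like the one in the report when
// the number of values does not match the shape.
function checkValuesMatchShape(shape, values) {
  const expected = sizeFromShape(shape);
  const actual = values.length;
  if (expected !== actual) {
    throw new Error(
      `Based on the provided shape, [${shape}], the tensor should have ` +
        `${expected} values but has ${actual}`
    );
  }
  return true;
}

// A shape-[2] tensor backed by a single value reproduces the reported wording.
try {
  checkValuesMatchShape([2], [1]);
} catch (e) {
  console.log(e.message);
  // Based on the provided shape, [2], the tensor should have 2 values but has 1
}
```

Read this way, the message says that some 2-element tensor reached tensor creation with one value missing; it does not by itself say which node produced it.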
Is some tensor data being lost during TensorFlow.js execution or conversion? Could there be a fault in the implementation of the ReduceMean operator? Similar faults are also reproduced with 6 other operators (e.g., Convolution, Padding). The data loss is triggered only occasionally and depends on the state of the browser cache. Could there be a fault in the browser's cache reuse mechanism (e.g., latent refreshing)?
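To check whether ReduceMean itself returns wrong values (as opposed to failing on a malformed tensor), its output can be compared against a small reference computed on the CPU in plain JavaScript. The sketch below assumes the reducemean node averages over the height and width axes [1, 2] of an NHWC input. The report does not state the reduction axes, so that is an assumption. `reduceMeanHW` is a hypothetical helper.

```javascript
// Reference mean over axes [1, 2] (height, width) of an NHWC tensor stored as a
// flat, row-major array. Returns a Float64Array holding the [N, C] result.
// Assumption: the reducemean node reduces over H and W; adjust if it does not.
function reduceMeanHW(values, shape) {
  const [n, h, w, c] = shape;
  if (values.length !== n * h * w * c) {
    throw new Error(`expected ${n * h * w * c} values, got ${values.length}`);
  }
  const out = new Float64Array(n * c); // zero-initialized accumulators
  for (let b = 0; b < n; b++) {
    for (let y = 0; y < h; y++) {
      for (let x = 0; x < w; x++) {
        for (let ch = 0; ch < c; ch++) {
          out[b * c + ch] += values[((b * h + y) * w + x) * c + ch];
        }
      }
    }
  }
  for (let i = 0; i < out.length; i++) out[i] /= h * w;
  return out;
}

// Tiny example: one 2x2 image with 2 channels.
// Channel 0 holds 1, 3, 5, 7 (mean 4); channel 1 holds 2, 4, 6, 8 (mean 5).
const ref = reduceMeanHW([1, 2, 3, 4, 5, 6, 7, 8], [1, 2, 2, 2]);
console.log(Array.from(ref)); // [ 4, 5 ]
```

In the page, the flat input values can be read with `await input.data()` and passed to this function. The result can then be compared with the model's output within a small floating-point tolerance.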

gaikwadrahul8 commented 1 month ago

Hi, @DeepLearningMOSA

Thank you for bringing this issue to our attention. To help us investigate further, could you share your GitHub repository or a code snippet containing the model (in zipped format)? If possible, please also outline the steps to reproduce the behavior on our end. This information would greatly speed up our troubleshooting.

Thank you for your cooperation and patience.

DeepLearningMOSA commented 1 month ago

The model structure was already given above, and the source code that runs it in TensorFlow.js is in the attached HTML file. A sample of the model is also included in the zip. We use tfjs-converter to convert the model from TensorFlow to TensorFlow.js, and the converted model runs in the browser. These issues occur frequently when the browser cache is enabled and occasionally when it is disabled. Our preliminary analysis suggests that the bugs are related to the browser's cache reuse mechanism: reusing the cache disturbs the browser environment, which leads to these errors. We hope the developers can provide the necessary explanation and analysis. bug report.zip
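One way to test the cache hypothesis in isolation is to request the model files with HTTP caching turned off. The sketch below assumes the HTML page loads the model with `tf.loadGraphModel`, whose `LoadOptions` accept a `requestInit` object that is passed on to `fetch`. `noCacheLoadOptions` is a hypothetical helper, and `MODEL_URL` is a placeholder for wherever model.json is served.

```javascript
// Hypothetical helper: builds LoadOptions asking the browser not to store or
// reuse cached HTTP responses when fetching model.json and the weight files.
function noCacheLoadOptions(extraInit = {}) {
  return {
    requestInit: { ...extraInit, cache: "no-store" },
  };
}

// In the page (assumes tf is loaded from the script link; MODEL_URL is a
// placeholder):
//   const model = await tf.loadGraphModel(MODEL_URL, noCacheLoadOptions());
//   const output = model.predict(input);

const options = noCacheLoadOptions({ credentials: "same-origin" });
console.log(JSON.stringify(options));
// {"requestInit":{"credentials":"same-origin","cache":"no-store"}}
```

`cache: "no-store"` only affects the HTTP cache. If the error still appears with it set (the report says the error occasionally occurs with the cache disabled), the HTTP cache alone would not explain the failure.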

DeepLearningMOSA commented 1 month ago

Any further confirmation on this?

gaikwadrahul8 commented 1 month ago

Hi, @DeepLearningMOSA

I apologize for the delay in my response. I checked the .zip file you provided, which contains a model in TensorFlow SavedModel format, and converted that model to TensorFlow.js format (here is the gist file for reference). After conversion, the model.json looks like this:

{
  "format": "graph-model",
  "generatedBy": "2.9.0",
  "convertedBy": "TensorFlow.js Converter v4.20.0",
  "signature": {
    "inputs": {
      "input_1": {"name": "input_1:0", "dtype": "DT_FLOAT", "tensorShape": {"dim": [{"size": "-1"}, {"size": "48"}, {"size": "48"}, {"size": "3"}]}}
    },
    "outputs": {
      "output_1": {"name": "Identity:0", "dtype": "DT_FLOAT", "tensorShape": {"dim": [{"size": "-1"}, {"size": "48"}, {"size": "48"}, {"size": "3"}]}}
    }
  },
  "modelTopology": {
    "node": [
      {"name": "input_1", "op": "Placeholder", "attr": {"shape": {"shape": {"dim": [{"size": "-1"}, {"size": "48"}, {"size": "48"}, {"size": "3"}]}}, "dtype": {"type": "DT_FLOAT"}}},
      {"name": "Identity", "op": "Identity", "input": ["input_1"], "attr": {"T": {"type": "DT_FLOAT"}}}
    ],
    "library": {},
    "versions": {"producer": 1766}
  },
  "weightsManifest": [{"paths": [], "weights": []}]
}
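The converted graph above contains only a Placeholder node and an Identity node, and its weights manifest is empty. When comparing a converted model against the intended structure, a quick check is to read model.json directly and list which ops survived conversion. The sketch below assumes the model.json text is already available as a string, for example from `fetch`. `summarizeGraphModel` is a hypothetical helper, not part of TensorFlow.js.

```javascript
// Hypothetical helper: summarizes a graph-model model.json by listing its
// nodes as "name:op" and counting the weight tensors in the manifest.
function summarizeGraphModel(modelJsonText) {
  const model = JSON.parse(modelJsonText);
  const nodes = (model.modelTopology && model.modelTopology.node) || [];
  const ops = nodes.map((node) => `${node.name}:${node.op}`);
  const weightCount = (model.weightsManifest || []).reduce(
    (sum, group) => sum + group.weights.length,
    0
  );
  return { format: model.format, ops, weightCount };
}

// Abridged from the converted model.json quoted above (signature and attrs omitted).
const modelJsonText = JSON.stringify({
  format: "graph-model",
  modelTopology: {
    node: [
      { name: "input_1", op: "Placeholder" },
      { name: "Identity", op: "Identity", input: ["input_1"] },
    ],
  },
  weightsManifest: [{ paths: [], weights: [] }],
});

console.log(JSON.stringify(summarizeGraphModel(modelJsonText)));
// {"format":"graph-model","ops":["input_1:Placeholder","Identity:Identity"],"weightCount":0}
```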

Could you please share the complete steps to reproduce this behavior on our end so we can investigate the issue further?

Thank you for your cooperation and patience.

DeepLearningMOSA commented 1 month ago

conda create -n DLMOSA python=3.9
source activate DLMOSA
pip install tensorflow==2.9.0
pip install torch==1.12.0
pip install keras==2.6.0
pip install sqlalchemy==1.4.32
pip install mysql-connector-python
pip install flask==2.2.2
pip install flask-cors==3.0.10
pip install gevent==22.10.2
pip install tensorflow-estimator==2.9.0
pip install tensorflow-hub==0.12.0
pip install tensorflowjs==3.19.0
pip install Werkzeug==2.2.2

Create a TensorFlow model according to the given model structure above and convert it to a TensorFlow.js model. Use the attached HTML file to execute the converted TensorFlow.js model.
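Before rebuilding the model, it can help to turn the `model_inf` string from the exception report into an explicit edge list. The sketch below assumes each edge is written as `from:<id> to:<id> operator:<name>`. `parseModelInf` and `inputsOf` are hypothetical helpers. The parsed list shows that node 8 has two incoming edges: a max_pooling2D branch from node 1 and the identity chain from node 7. The report does not say how those two inputs are combined at node 8.

```javascript
// Hypothetical helper: turns the "model_inf" string into a list of edges.
// Assumes each edge has the form "from:<id> to:<id> operator:<name>".
function parseModelInf(modelInf) {
  const edgePattern = /from:(\d+)\s+to:(\d+)\s+operator:(\S+)/g;
  const edges = [];
  for (const match of modelInf.matchAll(edgePattern)) {
    edges.push({ from: Number(match[1]), to: Number(match[2]), op: match[3] });
  }
  return edges;
}

// Hypothetical helper: returns the edges that feed into a given node id.
function inputsOf(edges, nodeId) {
  return edges.filter((edge) => edge.to === nodeId);
}

// The model_inf string copied from the exception report.
const modelInf =
  "from:0 to:1 operator:identity from:1 to:2 operator:identity " +
  "from:1 to:8 operator:max_pooling2D from:2 to:3 operator:identity " +
  "from:3 to:4 operator:identity from:4 to:5 operator:identity " +
  "from:5 to:6 operator:identity from:6 to:7 operator:identity " +
  "from:7 to:8 operator:identity from:8 to:9 operator:reducemean ";

const edges = parseModelInf(modelInf);
console.log(edges.length); // 10
console.log(inputsOf(edges, 8).map((e) => `${e.from} (${e.op})`).join(", "));
// 1 (max_pooling2D), 7 (identity)
```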