tensorflow / tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.
https://js.tensorflow.org
Apache License 2.0

Tensor shape mismatch during execution in the browser #8339

Closed DeepLearningMOSA closed 1 month ago

DeepLearningMOSA commented 3 months ago

Some tensor shapes are mismatched during execution.

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow.js): yes
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
TensorFlow.js installed from (npm or script link): script link
TensorFlow.js version (use command below): 3.19.0
Browser version: Microsoft Edge 125.0.2535.67 (64 bit), Google Chrome 125.0.6422.112 (64 bit)
Tensorflow.js Converter Version: -

Exception Report

{ "model_inf": "from:0 to:1 operator:identity from:0 to:4 operator:identity from:0 to:8 operator:identity from:0 to:9 operator:identity from:1 to:2 operator:identity from:1 to:7 operator:ReLU from:1 to:9 operator:identity from:2 to:3 operator:ELU from:2 to:4 operator:reducemean from:3 to:4 operator:ELU from:4 to:5 operator:identity from:5 to:6 operator:identity from:5 to:7 operator:PReLU from:6 to:7 operator:identity from:7 to:8 operator:identity from:7 to:9 operator:identity from:8 to:9 operator:identity ", "error_message": "Sizes of tensors must match except in dimension 1. Expected size 48 but got size 1 for tensor number 1 in the list." }

Bug Description

The error is: Expected size 48 but got size 1 for tensor number 1 in the list. However, the same model executes correctly in TensorFlow. Is tensor data lost during TensorFlow.js execution or conversion? Could there be a fault in the implementation of the ReduceMean or PReLU operator? Similar faults are also reproduced with 3 other operators (e.g., ReLU, ELU, etc.). This tensor data loss is triggered only occasionally and depends on the state of the browser cache. Could there be a fault in the browser's cache reuse mechanism (e.g., latent refreshing)?
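The wording of the error message reads like a concatenation-style shape check: every dimension except the concatenation axis must agree across the inputs. The sketch below illustrates that rule in plain JavaScript. `checkConcatShapes` is a hypothetical helper, not a TensorFlow.js function, and the example shapes are assumptions chosen to reproduce the reported numbers (48 versus 1), not values taken from the failing run.

```javascript
// Hypothetical helper, not part of TensorFlow.js: returns null when tensors
// with the given shapes can be concatenated along `axis`, otherwise a message
// describing the first mismatch.
function checkConcatShapes(shapes, axis) {
  const [first, ...rest] = shapes;
  for (let i = 0; i < rest.length; i++) {
    const shape = rest[i];
    if (shape.length !== first.length) {
      return `Tensor number ${i + 1} has rank ${shape.length}, expected ${first.length}.`;
    }
    for (let d = 0; d < first.length; d++) {
      // Only the concatenation axis is allowed to differ.
      if (d !== axis && shape[d] !== first[d]) {
        return `Sizes of tensors must match except in dimension ${axis}. ` +
          `Expected size ${first[d]} but got size ${shape[d]} ` +
          `for tensor number ${i + 1} in the list.`;
      }
    }
  }
  return null;
}

// Assumed shapes: a [1, 48, 48, 3] tensor and a second tensor whose third
// dimension was collapsed to 1 (for example by a mean that keeps dimensions).
const mismatch = checkConcatShapes([[1, 48, 48, 3], [1, 48, 1, 3]], 1);
console.log(mismatch);
```

If the failing graph does contain such a step, logging the shapes that feed it on each run would show whether the mismatch is deterministic or depends on the run.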

gaikwadrahul8 commented 3 months ago

Hi, @DeepLearningMOSA

To help us investigate further, if it's convenient, would you be able to share your GitHub repository or a code snippet containing the model (zipped format)? Additionally, if possible, could you please outline the steps to reproduce the behavior on our end? This information would greatly expedite our troubleshooting process.

Thank you for your cooperation and patience.

DeepLearningMOSA commented 3 months ago

bug report.zip

The model structure was given above, and the source code that runs on TensorFlow.js is in the attached HTML file. A sample of the model format is also in the zip. We use the tfjs-converter to convert the model from TensorFlow to TensorFlow.js, and the converted model runs in the browser. These issues occur frequently when the browser cache is enabled and occasionally when it is disabled. Our preliminary analysis suggests that the bugs are related to the browser's cache reuse mechanism: reusing the cache disturbs the browser environment, which leads to these bugs. We hope the developers can provide an explanation and analysis.
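One way to test the cache hypothesis is to load the model with the HTTP cache bypassed and compare the failure rate against a normal load. A minimal sketch follows, assuming the page loads the model with `tf.loadGraphModel`, whose load options accept a `requestInit` object that is passed to `fetch`. `noCacheLoadOptions` and `modelUrl` are hypothetical names introduced for illustration.

```javascript
// Hypothetical helper: builds load options that ask fetch() to skip the
// browser's HTTP cache when downloading model.json and the weight files.
function noCacheLoadOptions() {
  return { requestInit: { cache: 'no-store' } };
}

const options = noCacheLoadOptions();
console.log(JSON.stringify(options));

// Usage in the page (assumes `tf` is loaded from the script link and
// `modelUrl` points at the converted model.json):
//   const model = await tf.loadGraphModel(modelUrl, options);
```

This only controls how the model files are fetched. It does not affect other browser-level caching, so a remaining failure with these options would point away from stale model files.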

DeepLearningMOSA commented 2 months ago

Is there any further update or confirmation on this?

gaikwadrahul8 commented 2 months ago

Hi, @DeepLearningMOSA

I apologize for the delay in my response. I checked the TensorFlow SavedModel in your .zip file and converted it to TensorFlow.js format; here is a gist-file for reference. After the conversion, model.json looks like this:

{"format": "graph-model", "generatedBy": "2.9.0", "convertedBy": "TensorFlow.js Converter v4.20.0", "signature": {"inputs": {"input_1": {"name": "input_1:0", "dtype": "DT_FLOAT", "tensorShape": {"dim": [{"size": "-1"}, {"size": "48"}, {"size": "48"}, {"size": "3"}]}}}, "outputs": {"output_1": {"name": "Identity:0", "dtype": "DT_FLOAT", "tensorShape": {"dim": [{"size": "-1"}, {"size": "48"}, {"size": "48"}, {"size": "3"}]}}}}, "modelTopology": {"node": [{"name": "input_1", "op": "Placeholder", "attr": {"shape": {"shape": {"dim": [{"size": "-1"}, {"size": "48"}, {"size": "48"}, {"size": "3"}]}}, "dtype": {"type": "DT_FLOAT"}}}, {"name": "Identity", "op": "Identity", "input": ["input_1"], "attr": {"T": {"type": "DT_FLOAT"}}}], "library": {}, "versions": {"producer": 1766}}, "weightsManifest": [{"paths": [], "weights": []}]}
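The expected input shape can be read directly from the `signature.inputs` section of a model.json like the one above. The sketch below is a minimal illustration in plain JavaScript. `inputShapes` is a hypothetical helper, and the object literal reproduces only the signature fields shown in the converted file.

```javascript
// Hypothetical helper: maps each input name in a parsed model.json to its
// shape as an array of numbers (-1 marks a dimension of unknown size).
function inputShapes(modelJson) {
  const shapes = {};
  for (const [name, info] of Object.entries(modelJson.signature.inputs)) {
    shapes[name] = info.tensorShape.dim.map((d) => Number(d.size));
  }
  return shapes;
}

// Signature fields copied from the converted model.json in this thread.
const modelJson = {
  signature: {
    inputs: {
      input_1: {
        name: 'input_1:0',
        dtype: 'DT_FLOAT',
        tensorShape: {
          dim: [{ size: '-1' }, { size: '48' }, { size: '48' }, { size: '3' }],
        },
      },
    },
  },
};

const shapes = inputShapes(modelJson);
console.log(shapes);
```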

Could you please guide me through the complete steps to replicate this behavior on our end, so that we can investigate the issue further?

Thank you for your cooperation and patience.

DeepLearningMOSA commented 2 months ago

conda create -n DLMOSA python=3.9
source activate DLMOSA
pip install tensorflow==2.9.0
pip install torch==1.12.0
pip install keras==2.6.0
pip install sqlalchemy==1.4.32
pip install mysql-connector-python
pip install flask==2.2.2
pip install flask-cors==3.0.10
pip install gevent==22.10.2
pip install tensorflow-estimator==2.9.0
pip install tensorflow-hub==0.12.0
pip install tensorflowjs==3.19.0
pip install Werkzeug==2.2.2

Create a TensorFlow model according to the given model structure above and convert it to a TensorFlow.js model. Use the attached HTML file to execute the converted TensorFlow.js model.

shmishra99 commented 1 month ago

Hi @DeepLearningMOSA ,

Sorry for the late response. I have been trying to replicate your code. I converted the model file to a TensorFlow.js model.json and set model_url to the path of the converted model. However, I'm not sure what input shape is returned from const get_url = "http://127.0.0.1:5500/getInput". Instead, I'm using the following code to generate random input values, convert them to a 4D tensor, and predict the output:

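// Fill the input buffer with random values in [0, 1).
const values = new Float32Array(batchSize * height * width * channels).map(() => Math.random());
const shape = [batchSize, height, width, channels];
// Wrap the flat buffer in a 4D tensor of the requested shape.
const xs = tf.tensor4d(values, shape);
// Run the converted model on the random input.
execute_model(Model_Url, xs).then(() => {
    console.log("ok");
});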

Output

{"content":{"0":0.34755128622055054,"1":0.5571280717849731,"2":0.1893390268087387,"3":0.08901461213827133,"4":0.4620395004749298,"5":0.18775992095470428,"6":0.37909820675849915,"7":0.2286771982908249,"8":0.9785163998603821,"9":0.25268274545669556,"10":0.7539312243461609,"11":0.6315923929214478,"12":0.0570327565073967,"13":0.772369921207428,"14":0.3050304055213928,"15":0.9482877850532532,"16":0.2986413240432739,"17":0.847169816493988,"18":0.9971678853034973,"19":0.8913778066635132,"20":0.5713071227073669,"21":0.6471124887466431,"22":0.974897027015686,"23":0.7384252548217773,"24":0.6985971927642822,"25":0.4109019637107849,"26":0.2080228179693222,"27":0.8166868090629578,"28":0.47893065214157104,"29":0.14292412996292114,"30":0.06956664472818375,"31":0.7647786140441895,"32":0.009139157831668854,"33":0.2962498068809509,"34":0.9590641856193542,"35":0.9082624316215515,"36":0.4696250557899475,"37":0.6361216306686401,"38":0.9207829236984253,"39":0.5989010334014893,"40":0.3135385811328888,"41":0.9649796485900879,"42":0.2746776342391968,"43":0.6981503963470459,"44":0.9323804378509521,"45":0.3070690631866455,"46":0.6723133325576782,"47":0.24599681794643402,"48":0.39941298961639404,"49":0.006291550118476152,"50":0.5446687936782837,"51":0.8880775570869446,"52":0.8123767971992493,"53":0.2992550730705261,"54":0.7882701754570007,"55":0.594056248664856,"56":0.8694619536399841,"57":0.23709231615066528,"58":0.9566893577575684,"59":0.21463310718536377,"60":0.31489282846450806,"61":0.18474267423152924,"62":0.4207715094089508,........

The model predicts the output tensors on every run with random input tensors, without any issues.

I suspect the issue is with the provided input shape. Could you please check if you are providing the correct input shape in each run and share the input tensor values for the instance where it's throwing an error?
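A shape check of this kind can be done before each prediction. The sketch below is a minimal illustration in plain JavaScript: `shapeMatches` is a hypothetical helper that treats -1 in the expected shape as "any size", the expected shape [-1, 48, 48, 3] is taken from the converted model.json in this thread, and the batch size of 1 is an assumption.

```javascript
// Hypothetical helper: true when `actual` has the same rank as `expected`
// and every fixed dimension agrees (-1 in `expected` matches any size).
function shapeMatches(expected, actual) {
  if (expected.length !== actual.length) return false;
  return expected.every((dim, i) => dim === -1 || dim === actual[i]);
}

// Expected shape from the model signature; batch size 1 is an assumption.
const expectedShape = [-1, 48, 48, 3];
const inputShape = [1, 48, 48, 3];

// Random input with exactly as many values as the shape requires.
const size = inputShape.reduce((a, b) => a * b, 1);
const inputValues = Float32Array.from({ length: size }, () => Math.random());

if (!shapeMatches(expectedShape, inputShape)) {
  console.error('Input shape', inputShape, 'does not match', expectedShape);
}
```

Logging the shape returned by the input endpoint on a failing run and passing it through the same check would show whether the input, rather than the model, carries the unexpected size of 1.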

Please advise if my current approach is correct or if I have missed anything.

Thank You!!

github-actions[bot] commented 1 month ago

This issue has been marked stale because it has had no activity for 7 days. It will be closed if no further activity occurs. Thank you.

github-actions[bot] commented 1 month ago

This issue was closed due to lack of activity after being marked stale for the past 7 days.
