tensorflow / tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.
https://js.tensorflow.org
Apache License 2.0

TF python vs TFJS models generating significantly different results #8025

Open danielgoldelman opened 10 months ago

danielgoldelman commented 10 months ago

System information

Describe the current behavior: There is a significant drop in performance between the TensorFlow (Python) TFBertForSequenceClassification model and the corresponding TensorFlow.js graph model created with tensorflowjs_converter.

Describe the expected behavior: The TF Python model and the TF.js model should produce similar or identical results. Please let us know if the conversion was done incorrectly.

Standalone code to reproduce the issue: Provide a reproducible test case that is the bare minimum necessary to generate the problem. If possible, please share a link to Colab/CodePen/any notebook.

Converter scripts:

! pip3 install transformers
! pip3 install tensorflowjs  # the tensorflowjs_converter CLI ships with the tensorflowjs package
import tensorflow as tf
from transformers import TFAutoModel

MODEL_NAME = './MultitaskModel'
model = TFAutoModel.from_pretrained(MODEL_NAME, from_pt=True)

# Trace model.call with int32 inputs of shape [None, 384] so the SavedModel
# exposes a serving signature with named inputs.
callable = tf.function(model.call)
concrete_function = callable.get_concrete_function([
    tf.TensorSpec([None, 384], tf.int32, name="input_ids"),
    tf.TensorSpec([None, 384], tf.int32, name="attention_mask"),
])

tf.saved_model.save(model, 'multitaskModelForJS', signatures=concrete_function)
!saved_model_cli show --dir multitaskModelForJS --tag_set serve --signature_def serving_default

! tensorflowjs_converter \
    --input_format=tf_saved_model \
    --output_format=tfjs_graph_model \
    --signature_name=serving_default \
    --saved_model_tags=serve \
    ./multitaskModelForJS/ \
    ./multitaskModelForJSWeb/
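
For what it's worth, here is a minimal sanity-check sketch (not part of the original repro; it assumes the notebook's model is still in scope and that the signature exposes keyword inputs named input_ids and attention_mask, per the TensorSpec names above) to confirm the SavedModel itself matches the in-memory model before conversion:

import numpy as np
import tensorflow as tf

# Reload the SavedModel that was just exported and grab the serving signature.
loaded = tf.saved_model.load('multitaskModelForJS')
infer = loaded.signatures['serving_default']

# Dummy int32 inputs matching the [None, 384] TensorSpecs used for tracing.
input_ids = tf.constant(np.random.randint(0, 30000, size=(1, 384)), dtype=tf.int32)
attention_mask = tf.ones((1, 384), dtype=tf.int32)

saved_out = infer(input_ids=input_ids, attention_mask=attention_mask)
keras_out = model(input_ids=input_ids, attention_mask=attention_mask, training=False)

# The signature returns a dict of flattened output tensors; comparing them against
# keras_out (e.g. with np.testing.assert_allclose) confirms the export is faithful.
print({name: tuple(t.shape) for name, t in saved_out.items()})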

Model location: https://github.com/danielgoldelman/modelrepo/tree/main/MultitaskModel

Results for the Python model:

[Screenshot 2023-10-19 at 8:16:15 PM: Python model results]

Results for the JS model (tested both in Node and in the browser):

[Screenshot 2023-10-19 at 8:16:03 PM: JS model results]
gaikwadrahul8 commented 10 months ago

Hi, @danielgoldelman

Thank you for bringing this issue to our attention. I am trying to replicate the same issue on my end. If possible, could you please share the code snippet/example you are using to produce the above results for the Python model and the TensorFlow.js model, so I can try to replicate the issue as well? Thank you!

danielgoldelman commented 10 months ago

Hello @gaikwadrahul8, here is a link to a repo that reproduces these results. The results shared in the screenshots above can be generated via the test.ipynb Jupyter notebook. Please let me know if there is any other information you would like me to supply.

https://github.com/danielgoldelman/tfjs_conv_issue

mattsoulanille commented 10 months ago

I compared the CPU execution vs WebGL (you can reproduce this by running npx http-server --cors="Access-Control-Allow-Origin: *, Access-Control-Allow-Private-Network: true" in the tfjs model directory and clicking the run button). CPU and WebGL results appear identical, so this probably isn't a GPU rounding issue. Maybe it's a model conversion issue, but your call to tfjs-converter looks correct.

You could try converting the model without grappler graph optimization. This shouldn't be the problem, but it's possible something went wrong during optimization. In any case, I would probably do this before trying to compare intermediate tensors between the TF version and TFJS version. Unfortunately, we don't have a flag for this yet, but if you comment out these lines and replace them with return graph_def, you should be able to run the modified converter with npx bazel run //tfjs-converter/python/tensorflowjs/converters:converter -- --help.
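
As a rough sketch of that comparison (the file name reference_io.json and the random dummy input here are placeholders I made up, not an established workflow), you could dump one fixed input plus the Python model's reference outputs, then feed the same input to the TFJS graph model in Node or the browser and diff the results:

import json
import numpy as np
import tensorflow as tf

loaded = tf.saved_model.load('multitaskModelForJS')
infer = loaded.signatures['serving_default']

# One fixed input so both runtimes see exactly the same data.
input_ids = np.random.randint(0, 30000, size=(1, 384)).astype(np.int32)
attention_mask = np.ones((1, 384), dtype=np.int32)

outputs = infer(input_ids=tf.constant(input_ids),
                attention_mask=tf.constant(attention_mask))

# Dump the inputs and reference outputs; loading this JSON on the TFJS side and
# comparing element-wise against the graph model's output shows where the
# numbers start to diverge.
with open('reference_io.json', 'w') as f:
    json.dump({
        'input_ids': input_ids.tolist(),
        'attention_mask': attention_mask.tolist(),
        'outputs': {k: v.numpy().tolist() for k, v in outputs.items()},
    }, f)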

danielgoldelman commented 10 months ago

@mattsoulanille Sorry, but I would like some help understanding where to place the BUILD file in the tfjs file structure. I am getting this error:

ERROR: Skipping '//tfjs-converter/python/tensorflowjs/converters:converter': no such package 'tfjs-converter/python/tensorflowjs/converters': BUILD file not found in any of the following directories. Add a BUILD file to a directory to mark it as a package.
 - /Users/danielgoldelman/Desktop/tfjs_conv_issue/tfjs-converter/python/tensorflowjs/converters
WARNING: Target pattern parsing failed.
ERROR: no such package 'tfjs-converter/python/tensorflowjs/converters': BUILD file not found in any of the following directories. Add a BUILD file to a directory to mark it as a package.
 - /Users/danielgoldelman/Desktop/tfjs_conv_issue/tfjs-converter/python/tensorflowjs/converters
INFO: Elapsed time: 0.056s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
ERROR: Build failed. Not running target

Could you help me get on track?

dadak-dom commented 3 weeks ago

Hello, I have been working on the same task as @danielgoldelman (same model, setup, etc.), and I seem to be running into a similar issue. I've tried @mattsoulanille's suggestion to disable optimization, but that didn't seem to have any impact on the performance. Here's the command that I used, in case I did anything wrong:

npx bazel run //tfjs-converter/python/tensorflowjs/converters:converter -- \
    --input_format=tf_saved_model \
    --output_format=tfjs_graph_model \
    --signature_name=serving_default \
    --saved_model_tags=serve \
    /home/dominik/ptl/tfjs/multitaskModelForJS \
    /home/dominik/ptl/tfjs/multitaskModelForJSWeb

I'm not really sure where to go from here. Does anybody have any suggestions?