tensorflow / tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.
https://js.tensorflow.org
Apache License 2.0
18.35k stars 1.92k forks

[tfjs-tflite] Loading second TFLite model hangs in _emscripten_futex_wait #6094

Open reuben opened 2 years ago

reuben commented 2 years ago

System information

<!-- Import @tensorflow/tfjs-core -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-core"></script>
<!-- Adds the CPU backend -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-cpu"></script>
<!--
Import @tensorflow/tfjs-tflite
Note that we need to explicitly load dist/tf-tflite.min.js so that it can
locate WASM module files from their default location (dist/).
-->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-tflite/dist/tf-tflite.min.js"></script>

Describe the current behavior

Trying to load two TFLite models causes the browser to hang on the second tflite.loadTFLiteModel call.

Describe the expected behavior

Being able to load more than a single model at a time.

Standalone code to reproduce the issue

Provide a reproducible test case that is the bare minimum necessary to generate the problem. If possible, please share a link to Colab/CodePen/any notebook.

I don't have an easy way to share the model files but here's the entire script: https://gist.github.com/reuben/4d1a77251ad73629f54429705ab022a9

Copy two TFLite model files to a folder, name one of them featurizer.tflite and the other output_graph.tflite, then open that page, click the model picker, and select both files. The script loops forever inside _emscripten_futex_wait on the second tflite.loadTFLiteModel call.

Other info / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

josephrocca commented 1 year ago

I'm seeing this same issue - I can only load one tflite model. The hang happens at model init, and only if COOP/COEP headers are set. Here's a minimal reproduction - you just need to serve it with COOP/COEP headers:

<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@3.20.0/dist/tf.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-tflite@0.0.1-alpha.9/dist/tf-tflite.min.js"></script>

<script type="module">
  console.log("Attempting to load models...");
  let model1 = await tflite.loadTFLiteModel("https://huggingface.co/rocca/lyra-v2-soundstream/resolve/main/tflite/soundstream_encoder.tflite");
  let model2 = await tflite.loadTFLiteModel("https://huggingface.co/rocca/lyra-v2-soundstream/resolve/main/tflite/lyragan.tflite");
  console.log("Loaded models.")
</script>

It loads fine if you comment out one of the loadTFLiteModel calls.

I used this tfjs-tflite build by @jinjingforever and paused the execution in DevTools to see the stack trace:

[Screenshot: DevTools stack trace looping within _emscripten_futex_wait]

Note: If the page isn't served with COOP/COEP headers, then both models seem to load without errors: https://jsbin.com/pahaxaromo/edit?html,output but these models use threads during inference, and so an error will be thrown when trying to use the models for prediction.

shmishra99 commented 1 year ago

Hi @reuben, apologies for the late response. I've replicated this issue with the latest version, @tensorflow/tfjs@4.4.0, and it seems to be working as expected. For your reference, I've added the code snippet below. Code:

<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@3.20.0/dist/tf.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-tflite@0.0.1-alpha.9/dist/tf-tflite.min.js"></script>

<script type="module">
  async function x() {
    console.log("Attempting to load models...");
    let model1 = await tflite.loadTFLiteModel('https://tfhub.dev/tensorflow/lite-model/mobilenet_v2_1.0_224/1/metadata/1');
    console.log("Loaded model1.");
    console.log("Input shape(s):", model1.inputs.map(input => input.shape));
    console.log("Output shape(s):", model1.outputs.map(output => output.shape));
    let model2 = await tflite.loadTFLiteModel("https://tfhub.dev/google/lite-model/imagenet/mobilenet_v3_small_075_224/classification/5/metadata/1");
    console.log("Loaded model2.");
    console.log("Input shape(s):", model2.inputs.map(input => input.shape));
    console.log("Output shape(s):", model2.outputs.map(output => output.shape));
  }
  x();
</script>

Output:

[Screenshot: console output showing both models loaded with their input/output shapes]

I have loaded two TFLite models in the same code flow and they have loaded perfectly. Let me know if I have missed something here. Thank you!

reuben commented 1 year ago

I'm no longer working on this project, so I can't verify with the original models.

shmishra99 commented 1 year ago

Alright, thank you for the confirmation. It seems this issue has been resolved in the latest version, @tensorflow/tfjs@4.4.0. Do you need any further help with this issue, or could you confirm whether it has been resolved for you? Please feel free to close the issue if it is resolved. Thank you!

google-ml-butler[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you.

josephrocca commented 1 year ago

Hi @shmishra99 I can still replicate this problem in tfjs v4.5.0:

https://josephrocca.github.io/lyra-v2-soundstream-web/tflite-simple.html

Important to note that threads must be enabled to replicate this bug. That means you must serve the file with COOP/COEP headers. In the above tflite-simple.html I use a ServiceWorker hack to do this, since Github Pages doesn't allow setting headers the proper way.
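The ServiceWorker hack mentioned above boils down to re-wrapping every fetched response with the isolation headers. A sketch of that header rewrite is below; `withIsolationHeaders` is a hypothetical helper name, and the fetch-handler wiring is shown in comments because it only runs inside a service worker:

```javascript
// Rewrap a response's headers with COOP/COEP so the page becomes
// cross-origin isolated even when the host (e.g. GitHub Pages) cannot set
// response headers itself.
function withIsolationHeaders(headers) {
  const out = new Headers(headers);
  out.set("Cross-Origin-Opener-Policy", "same-origin");
  out.set("Cross-Origin-Embedder-Policy", "require-corp");
  return out;
}

// Inside sw.js this would be wired up roughly as:
// self.addEventListener("fetch", (event) => {
//   event.respondWith(fetch(event.request).then((r) =>
//     new Response(r.body, {
//       status: r.status,
//       statusText: r.statusText,
//       headers: withIsolationHeaders(r.headers),
//     })
//   ));
// });
```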

There is no problem with the single-threaded tfjs-tflite runtime, which I'm guessing is what you used in your recent tests.

@google-ml-butler Please remove stale tag.

reuben commented 1 year ago

Commenting here as issue opener to make sure the last comment doesn't fall through the automated cracks.

google-ml-butler[bot] commented 1 year ago

Closing as stale. Please @mention us if this needs more attention.

google-ml-butler[bot] commented 1 year ago

Are you satisfied with the resolution of your issue? Yes No

reuben commented 1 year ago

@google-ml-butler not stale, steps to repro in https://github.com/tensorflow/tfjs/issues/6094#issuecomment-1547544217

reuben commented 1 year ago

@shmishra99 FYI someone provided reproduction steps with TFJS v4.5.0

arcusfelis commented 7 months ago

Still there: in Electron, the second load freezes everything.

await tflite.loadTFLiteModel(model1);
// will give you a model

tflite.loadTFLiteModel(model2);
// will freeze the JS console and the whole process with 100% CPU usage

The same behaviour occurs when simply loading the same model twice:

await tflite.loadTFLiteModel(model1);
await tflite.loadTFLiteModel(model1);

Version "@tensorflow/tfjs@^4.16.0"
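Since even loading the same model twice freezes, one hypothetical mitigation for that specific case is to memoize the load promise per URL so a second call reuses the first load. This is a workaround sketch, not a fix for the underlying futex hang with two distinct models (`loadTFLiteModelOnce` is an assumed helper name):

```javascript
// Cache the load promise per URL + options so repeated calls reuse a single
// load instead of re-entering the WASM runtime a second time.
const modelCache = new Map();

function loadTFLiteModelOnce(url, options) {
  const key = url + JSON.stringify(options ?? {});
  if (!modelCache.has(key)) {
    modelCache.set(key, tflite.loadTFLiteModel(url, options));
  }
  return modelCache.get(key);
}
```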

The example from the comment above would also freeze after printing the first "Output shape(s)":

<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@3.20.0/dist/tf.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-tflite@0.0.1-alpha.9/dist/tf-tflite.min.js"></script>

<script type="module">
  async function x() {
    console.log("Attempting to load models...");
    let model1 = await tflite.loadTFLiteModel('https://tfhub.dev/tensorflow/lite-model/mobilenet_v2_1.0_224/1/metadata/1');
    console.log("Loaded model1.");
    console.log("Input shape(s):", model1.inputs.map(input => input.shape));
    console.log("Output shape(s):", model1.outputs.map(output => output.shape));
    let model2 = await tflite.loadTFLiteModel("https://tfhub.dev/google/lite-model/imagenet/mobilenet_v3_small_075_224/classification/5/metadata/1");
    console.log("Loaded model2.");
    console.log("Input shape(s):", model2.inputs.map(input => input.shape));
    console.log("Output shape(s):", model2.outputs.map(output => output.shape));
  }
  x();
</script>

Jove125 commented 7 months ago

Hi, I have the same issue.

I noticed that it depends not so much on the number of initialized models as on the number of threads specified at initialization. Even one model can hang if a large number of threads is specified as a parameter.

For example, one of the models hangs with this initialization:

var promise = tflite.loadTFLiteModel(modelPath, {numThreads: 4});
var promise2 = tflite.loadTFLiteModel(modelPath, {numThreads: 4});
var promise3 = tflite.loadTFLiteModel(modelPath, {numThreads: 4});

But this initialization works stably:

var promise = tflite.loadTFLiteModel(modelPath, {numThreads: 4});
var promise2 = tflite.loadTFLiteModel(modelPath, {numThreads: 4});
var promise3 = tflite.loadTFLiteModel(modelPath, {numThreads: 3});

P.S. Of course, it only hangs if the COOP/COEP headers are set on the server (i.e. multithreading is enabled).
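The observation above suggests the hang depends on the combined numThreads across all loads (4+4+4 hangs, 4+4+3 does not). A hypothetical helper that splits a fixed thread budget across models is sketched below; the function name and the assumption that staying under a total budget avoids the hang are guesses, not a confirmed fix:

```javascript
// Split an overall thread budget as evenly as possible across modelCount
// models, so the combined numThreads never exceeds the budget.
function splitThreadBudget(totalThreads, modelCount) {
  const base = Math.floor(totalThreads / modelCount);
  const extra = totalThreads % modelCount;
  // The first `extra` models get one additional thread each.
  return Array.from({ length: modelCount }, (_, i) => base + (i < extra ? 1 : 0));
}

// e.g. a budget of 11 threads across 3 models -> [4, 4, 3], matching the
// stable configuration shown above.
```

Each element would then be passed as `{numThreads}` to the corresponding `tflite.loadTFLiteModel` call.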