tensorflow / tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.
https://js.tensorflow.org
Apache License 2.0

Bug: Unable to create WebGLTexture #8423

Open nickls opened 1 month ago

nickls commented 1 month ago

System information

Describe the current behavior

We're running a TF.js model in production that is a fine-tuned MobileNetV1. The model works perfectly for all of our users except one. We are unable to reproduce the issue locally, and we cannot detect it before it occurs so that we could switch to the CPU backend. The issue started about a month ago, during which time we had not updated any of our TF code or components.
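
One way the "detect it early and switch to CPU" idea could be sketched is a small probe that tries to allocate a texture on a throwaway canvas before TF.js is initialized. This is only an illustration: `canCreateWebGLTexture` and `chooseBackend` are hypothetical helper names, `tf` is assumed to be the TensorFlow.js namespace, and `createCanvas` is an assumed factory such as `() => document.createElement("canvas")`. A passing probe does not guarantee that the larger textures the model needs can be created.

```javascript
// Hypothetical probe: returns true if a WebGL2 (or WebGL1) context can be
// obtained and a texture object can be created on it without an error.
function canCreateWebGLTexture(createCanvas) {
  try {
    const canvas = createCanvas();
    const gl = canvas.getContext("webgl2") || canvas.getContext("webgl");
    if (!gl || gl.isContextLost()) return false;
    const texture = gl.createTexture(); // returns null on failure
    if (!texture) return false;
    gl.deleteTexture(texture);
    return gl.getError() === gl.NO_ERROR;
  } catch (err) {
    // Any exception while probing is treated as "WebGL not usable".
    return false;
  }
}

// Hypothetical wrapper: selects "webgl" when the probe passes, else "cpu".
async function chooseBackend(tf, createCanvas) {
  const backend = canCreateWebGLTexture(createCanvas) ? "webgl" : "cpu";
  await tf.setBackend(backend);
  await tf.ready();
  return backend;
}
```

In a browser this might be called as `await chooseBackend(tf, () => document.createElement("canvas"))` before loading the model.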

Problem:

You can see the stack trace from when the system goes into a loop (also attached). [image]

Describe the expected behavior

Here is what the stack trace looks like when the model successfully loads and warms up. [image]

Standalone code to reproduce the issue

We cannot reproduce the issue on local systems, but we are open to any ideas on how to reproduce it.

Other info / logs

tf.ENV.features:

{
  "IS_BROWSER": true,
  "IS_NODE": false,
  "DEBUG": false,
  "CPU_HANDOFF_SIZE_THRESHOLD": 128,
  "CANVAS2D_WILL_READ_FREQUENTLY_FOR_GPU": false,
  "IS_SAFARI": false,
  "IS_TEST": false,
  "SOFTWARE_WEBGL_ENABLED": false,
  "WEBGL_VERSION": 2,
  "HAS_WEBGL": true,
  "WEBGL_CHECK_NUMERICAL_PROBLEMS": false,
  "WEBGL_CPU_FORWARD": true,
  "WEBGL_PACK": true,
  "WEBGL_PACK_UNARY_OPERATIONS": true,
  "WEBGL_USE_SHAPES_UNIFORMS": false,
  "WEBGL_FORCE_F16_TEXTURES": false,
  "WEBGL_RENDER_FLOAT32_CAPABLE": true,
  "WEBGL_RENDER_FLOAT32_ENABLED": true,
  "WEBGL_SIZE_UPLOAD_UNIFORM": 4,
  "WEBGL_MAX_TEXTURE_SIZE": 16384,
  "WEBGL_MAX_SIZE_FOR_NARROW_TEXTURE": null,
  "WEBGL_AUTO_SQUARIFY_NARROW_TEXTURE_SHAPE": false,
  "WEBGL_ISNAN_CUSTOM": true,
  "ENGINE_COMPILE_ONLY_ON_DEMAND": true,
  "WEBGL_FLUSH_THRESHOLD": -1,
  "WEBGL_LAZILY_UNPACK": true,
  "WEBGL_BUFFER_SUPPORTED": true,
  "WEBGL_FENCE_API_ENABLED": true,
  "WEBGL_DELETE_TEXTURE_THRESHOLD": -1,
  "USE_SETTIMEOUTCUSTOM": false
}
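
The dump above shows `WEBGL_DELETE_TEXTURE_THRESHOLD` at `-1`, which means the WebGL backend keeps freed textures cached for reuse instead of deleting them. As a hedged experiment (it is an assumption that texture memory pressure is involved here), the flag could be set to `0` so textures are deleted as soon as they are released. `enableEagerTextureDeletion` is a hypothetical helper name and `tf` is assumed to be the TensorFlow.js namespace.

```javascript
// Hypothetical helper: ask the WebGL backend to delete textures on release
// rather than caching them. Returns the previous flag value so the caller
// can restore it later.
function enableEagerTextureDeletion(tf) {
  const flag = "WEBGL_DELETE_TEXTURE_THRESHOLD";
  const previous = tf.env().get(flag);
  tf.env().set(flag, 0);
  return previous;
}
```

This trades some speed for lower GPU memory use, so it is probably best limited to affected clients.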

We wrote a TF testing page to help isolate the issue; screenshots are below. These tests all pass for our dev and QA teams, but running the model fails for our user.

screenshot_1 screenshot_2

Trace-20240930T110703.json.zip

nickls commented 1 month ago

cc: @kevinwoolfolk97

nickls commented 1 month ago

Our loading and warming code:

model = await loadGraphModel(this.customModelPath);
...
tf.tidy(() => {
  const results = this.model.predict(
    tf
      .zeros([Detector.IMG_SIZE, Detector.IMG_SIZE, 3], "float32")
      .expandDims(0)
  );

  results.data().then(() => {
    this.setModelState("warmed");
  });
});
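
A variant of the warm-up above that reports failure instead of leaving the model state unset might look like the following. It is a sketch, not the code we run: `warmModel` is a hypothetical name, `tf` and `model` are passed in explicitly, and it assumes `predict` returns a single tensor as in the snippet above. The output is returned from `tf.tidy` so it outlives the tidy scope, then disposed once the read completes.

```javascript
// Hypothetical helper: runs one warm-up prediction and resolves to true on
// success or false if texture creation, execution, or readback throws.
async function warmModel(tf, model, imgSize) {
  let output = null;
  try {
    // Intermediates are freed by tidy; the returned tensor is kept.
    output = tf.tidy(() =>
      model.predict(tf.zeros([1, imgSize, imgSize, 3], "float32"))
    );
    await output.data(); // wait for the GPU work and readback to finish
    return true;
  } catch (err) {
    console.warn("Model warm-up failed:", err);
    return false;
  } finally {
    if (output) output.dispose();
  }
}
```

A caller could use the `false` result to switch backends with `tf.setBackend("cpu")` and warm up again; whether the model also needs to be reloaded after a WebGL failure is something we have not verified.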

nickls commented 3 weeks ago

@shmishra99 -- Any ideas on this issue, or anything we can do to help debug it?