tensorflow / tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.
https://js.tensorflow.org
Apache License 2.0
18.38k stars · 1.92k forks

tfjs-automl/demo/object_detection gives no predictions #3858

Closed. Download closed this issue 1 year ago.

Download commented 4 years ago

TensorFlow.js version

Not sure... is that the same as the version of tfjs-core that is used? If I check the tfjs-automl package.json, it looks like it is using tfjs-core 1.2.8, but that seems strange, as tfjs is at 2.x, right? Then again, if I check the commits, it looks like tfjs-automl was never upgraded to 2.x?

Browser version

Does not seem relevant. Latest Chrome.

Describe the problem or feature request

Predictions always return an empty array.

The reason I cloned this repo and tried the object_detection demo is that this is exactly what I was seeing with my own model: no predictions are returned, just an empty array. So I replaced my model with a 'known good' one from Google itself, but got the same result. Then, thinking I might have an error in my code somewhere, I decided to try the official example in this repo, which gave the same result. An empty array...

Code to reproduce the bug / link to feature request

Clone this repo (I am assuming you cloned to C:\ws\tfjs)
In a shell window, browse to C:\ws\tfjs\tfjs-automl\demo\object_detection
yarn
C:\ws\tfjs\tfjs-automl\demo\object_detection>yarn
yarn install v1.12.3
[1/4] Resolving packages...
[2/4] Fetching packages...
info fsevents@1.2.9: The platform "win32" is incompatible with this module.
info "fsevents@1.2.9" is an optional dependency and failed compatibility check. Excluding it from installation.
[3/4] Linking dependencies...
[4/4] Building fresh packages...
Done in 34.04s.
yarn watch
C:\ws\tfjs\tfjs-automl\demo\object_detection>yarn watch
yarn run v1.12.3
$ cross-env NODE_ENV=development parcel index.html --no-hmr --open
Server running at http://localhost:1234
| Building index.html...Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`

WARNING: We noticed you're using the `useBuiltIns` option without declaring a core-js version. Currently, we assume version 2.x when no version is passed. Since this default version will likely change in future versions of Babel, we recommend explicitly setting the core-js version you are using via the `corejs` option.

You should also be sure that the version you pass to the `corejs` option matches the version specified in your `package.json`'s `dependencies` section. If it doesn't, you need to run one of the following commands:

  npm install --save core-js@2    npm install --save core-js@3
  yarn add core-js@2              yarn add core-js@3

/ Building index.js...Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`

WARNING: We noticed you're using the `useBuiltIns` option without declaring a core-js version. Currently, we assume version 2.x when no version is passed. Since this default version will likely change in future versions of Babel, we recommend explicitly setting the core-js version you are using via the `corejs` option.

You should also be sure that the version you pass to the `corejs` option matches the version specified in your `package.json`'s `dependencies` section. If it doesn't, you need to run one of the following commands:

  npm install --save core-js@2    npm install --save core-js@3
  yarn add core-js@2              yarn add core-js@3

/ Building index.js...Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`

WARNING: We noticed you're using the `useBuiltIns` option without declaring a core-js version. Currently, we assume version 2.x when no version is passed. Since this default version will likely change in future versions of Babel, we recommend explicitly setting the core-js version you are using via the `corejs` option.

You should also be sure that the version you pass to the `corejs` option matches the version specified in your `package.json`'s `dependencies` section. If it doesn't, you need to run one of the following commands:

  npm install --save core-js@2    npm install --save core-js@3
  yarn add core-js@2              yarn add core-js@3

√  Built in 13.89s.
Open your browser

Open your browser at http://localhost:1234

Empty array is returned

After you open the page, it takes a few seconds in which inference is running. After that, it should print a JSON with the returned results below the image and draw some boxes on top of the image containing the detected objects, but instead it only prints [] and draws no boxes at all.
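For reference, this is the result shape the demo should print. Below is a minimal pure-JavaScript sketch with hypothetical data, showing the {box, label, score} objects and how the demo's default score threshold of 0.5 filters them. (The threshold is just one benign way to end up with an empty array; as the rest of this thread shows, the actual bug here turned out to be elsewhere.)

```javascript
// Hypothetical prediction objects, only to illustrate the expected output
// shape of model.detect(); the real demo gets these from the AutoML model.
const predictions = [
  {box: {left: 10, top: 20, width: 300, height: 250}, label: 'Salad', score: 0.96},
  {box: {left: 105, top: 26, width: 73, height: 52}, label: 'Tomato', score: 0.86},
  {box: {left: 5, top: 5, width: 40, height: 40}, label: 'Cheese', score: 0.31},
];

// The demo's default options include {score: 0.5}; predictions below the
// threshold are dropped, so an over-strict threshold alone would yield [].
const byScore = (preds, minScore) => preds.filter(p => p.score >= minScore);

console.log(byScore(predictions, 0.5).length); // 2: Salad and Tomato survive
console.log(byScore(predictions, 1.0)); // []: what the reporter sees for every input
```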

tafsiri commented 4 years ago

@rthadur I wasn't able to reproduce this (I get the expected array and boxes on the screen). Are you able to reproduce? The demo depends on already-published versions of tfjs and tfjs-automl, so the behaviour described is surprising.

Download commented 4 years ago

@tafsiri On what OS do you test? I ran the example on Windows. And what steps did you follow? The same as me, just yarn and yarn watch?

Download commented 4 years ago

Are there any ways for me to debug what is happening? Can I enable logging somehow? Or insert some code to check stuff?

Download commented 4 years ago

Just spun up an Ubuntu 18 VM. Same result.

stijn@DESKTOP-3I7O7CL:~$ git clone git@github.com:tensorflow/tfjs.git
Cloning into 'tfjs'...
Enter passphrase for key '/home/stijn/.ssh/id_rsa':
remote: Enumerating objects: 268, done.
remote: Counting objects: 100% (268/268), done.
remote: Compressing objects: 100% (156/156), done.
remote: Total 49031 (delta 140), reused 194 (delta 106), pack-reused 48763
Receiving objects: 100% (49031/49031), 49.53 MiB | 3.54 MiB/s, done.
Resolving deltas: 100% (38881/38881), done.
Checking out files: 100% (2354/2354), done.
stijn@DESKTOP-3I7O7CL:~$ cd tfjs
stijn@DESKTOP-3I7O7CL:~/tfjs$ cd tfjs-automl/
stijn@DESKTOP-3I7O7CL:~/tfjs/tfjs-automl$ cd demo/
stijn@DESKTOP-3I7O7CL:~/tfjs/tfjs-automl/demo$ cd object_detection/
stijn@DESKTOP-3I7O7CL:~/tfjs/tfjs-automl/demo/object_detection$ yarn
yarn install v1.22.4
[1/4] Resolving packages...
[2/4] Fetching packages...
info fsevents@1.2.9: The platform "linux" is incompatible with this module.
info "fsevents@1.2.9" is an optional dependency and failed compatibility check. Excluding it from installation.
[3/4] Linking dependencies...
[4/4] Building fresh packages...
warning Your current version of Yarn is out of date. The latest version is "1.22.5", while you're on "1.22.4".
info To upgrade, run the following command:
$ curl --compressed -o- -L https://yarnpkg.com/install.sh | bash
Done in 28.26s.
stijn@DESKTOP-3I7O7CL:~/tfjs/tfjs-automl/demo/object_detection$ yarn watch
yarn run v1.22.4
$ cross-env NODE_ENV=development parcel index.html --no-hmr --open
Server running at http://localhost:1234
⠸ Building index.html...Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`

WARNING: We noticed you're using the `useBuiltIns` option without declaring a core-js version. Currently, we assume version 2.x when no version is passed. Since this default version will likely change in future versions of Babel, we recommend explicitly setting the core-js version you are using via the `corejs` option.

You should also be sure that the version you pass to the `corejs` option matches the version specified in your `package.json`'s `dependencies` section. If it doesn't, you need to run one of the following commands:

  npm install --save core-js@2    npm install --save core-js@3
  yarn add core-js@2              yarn add core-js@3

⠦ Building tf-automl.esm.js...Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`

WARNING: We noticed you're using the `useBuiltIns` option without declaring a core-js version. Currently, we assume version 2.x when no version is passed. Since this default version will likely change in future versions of Babel, we recommend explicitly setting the core-js version you are using via the `corejs` option.

You should also be sure that the version you pass to the `corejs` option matches the version specified in your `package.json`'s `dependencies` section. If it doesn't, you need to run one of the following commands:

  npm install --save core-js@2    npm install --save core-js@3
  yarn add core-js@2              yarn add core-js@3

Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`

WARNING: We noticed you're using the `useBuiltIns` option without declaring a core-js version. Currently, we assume version 2.x when no version is passed. Since this default version will likely change in future versions of Babel, we recommend explicitly setting the core-js version you are using via the `corejs` option.

You should also be sure that the version you pass to the `corejs` option matches the version specified in your `package.json`'s `dependencies` section. If it doesn't, you need to run one of the following commands:

  npm install --save core-js@2    npm install --save core-js@3
  yarn add core-js@2              yarn add core-js@3

✨  Built in 13.60s.

Results:

On the webpage, below the image, after a few seconds, this appears:

[]
tafsiri commented 4 years ago

@Download I'm on mac os. Debugging apis you could use include https://js.tensorflow.org/api/latest/#enableDebugMode and https://js.tensorflow.org/api/latest/#profile, these would let you see what kernels get run and how many tensors are created.

I'd personally also look into debugging the input into the model to make sure it is as expected. I'd also try things like passing in a random tensor using the advanced API.

cc @dsmilkov for other debugging thoughts.

Download commented 4 years ago

@tafsiri To enable debug mode, would I add

tf.enableDebugMode()

to tfjs-automl/demo/object_detection/index.js?

And am I correct that I would need to add an import statement to get tf?

Sorry if this sounds dumb, but could you spell out what code I should add where?

tafsiri commented 4 years ago

Yes on both counts (the import and the function call). You would also need to npm install @tensorflow/tfjs.

Though if we can reproduce it, we'll be in a better position to take a look. Adding a few other folks to this.

Download commented 4 years ago

@tafsiri Thanks. I will try this and let you know the output. Also, I am going to try it on a different machine (a Windows laptop) and see what happens there.

One question still: did you do the exact same steps as I did?

Because I cannot understand how this can give different results.

Download commented 4 years ago

Attempting to reproduce on my Windows laptop

C:\ws>git clone https://github.com/tensorflow/tfjs.git
Cloning into 'tfjs'...
remote: Enumerating objects: 284, done.
remote: Counting objects: 100% (284/284), done.
remote: Compressing objects: 100% (166/166), done.
remote: Total 49047 (delta 147), reused 203 (delta 111), pack-reused 48763
Receiving objects: 100% (49047/49047), 49.54 MiB | 3.54 MiB/s, done.
Resolving deltas: 100% (38888/38888), done.

C:\ws>cd tfjs\tfjs-automl\demo\object_detection

C:\ws\tfjs\tfjs-automl\demo\object_detection>yarn
yarn install v1.22.5
[1/4] Resolving packages...
[2/4] Fetching packages...
info fsevents@1.2.9: The platform "win32" is incompatible with this module.
info "fsevents@1.2.9" is an optional dependency and failed compatibility check. Excluding it from installation.
[3/4] Linking dependencies...
[4/4] Building fresh packages...
Done in 74.76s.

C:\ws\tfjs\tfjs-automl\demo\object_detection>yarn watch
yarn run v1.22.5
$ cross-env NODE_ENV=development parcel index.html --no-hmr --open
Server running at http://localhost:1234
| Building index.html...Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`

WARNING: We noticed you're using the `useBuiltIns` option without declaring a core-js version. Currently, we assume version 2.x when no version is passed. Since this default version will likely change in future versions of Babel, we recommend explicitly setting the core-js version you are using via the `corejs` option.

You should also be sure that the version you pass to the `corejs` option matches the version specified in your `package.json`'s `dependencies` section. If it doesn't, you need to run one of the following commands:

  npm install --save core-js@2    npm install --save core-js@3
  yarn add core-js@2              yarn add core-js@3

- Building index.js...Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`

WARNING: We noticed you're using the `useBuiltIns` option without declaring a core-js version. Currently, we assume version 2.x when no version is passed. Since this default version will likely change in future versions of Babel, we recommend explicitly setting the core-js version you are using via the `corejs` option.

You should also be sure that the version you pass to the `corejs` option matches the version specified in your `package.json`'s `dependencies` section. If it doesn't, you need to run one of the following commands:

  npm install --save core-js@2    npm install --save core-js@3
  yarn add core-js@2              yarn add core-js@3

/ Building index.js...Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`

WARNING: We noticed you're using the `useBuiltIns` option without declaring a core-js version. Currently, we assume version 2.x when no version is passed. Since this default version will likely change in future versions of Babel, we recommend explicitly setting the core-js version you are using via the `corejs` option.

You should also be sure that the version you pass to the `corejs` option matches the version specified in your `package.json`'s `dependencies` section. If it doesn't, you need to run one of the following commands:

  npm install --save core-js@2    npm install --save core-js@3
  yarn add core-js@2              yarn add core-js@3

√  Built in 20.58s.

Results

Also here it returns an empty array...

(screenshot: automl-object-detection-results)

Conclusion

I have tried this now on 3 different (virtual) machines:

Also, I have now tried with 2 different versions of Chrome (the one on my laptop was outdated and I had not yet updated it) and with Firefox. All consistently give the same result: an empty array.

Also, I noticed that other people are reporting issues about getting an empty array as the result (see #3861), so my conclusion is that something really is broken here.

@tafsiri I am really wondering: how do you reproduce this? Can you tell me the exact steps you followed? Maybe I can try to do it your way. Or did you do exactly the same?

I find it very hard to believe that this would give different results. The only thing is, I never tried on a Mac... Do you have access to another (non-Mac) machine, to eliminate the chance that it is related to the OS?

I will add debug info now as per your instructions and see what comes up...

Download commented 4 years ago

Running in debug mode

I tried a few different ways of running the demo with debug mode enabled.

With @tensorflow/tfjs latest

First, I installed @tensorflow/tfjs in the object_detection demo.

C:\ws\tfjs\tfjs-automl\demo\object_detection>yarn add @tensorflow/tfjs
yarn add v1.12.3
[1/4] Resolving packages...
[2/4] Fetching packages...
info fsevents@1.2.9: The platform "win32" is incompatible with this module.
info "fsevents@1.2.9" is an optional dependency and failed compatibility check. Excluding it from installation.
[3/4] Linking dependencies...
warning "@tensorflow/tfjs > @tensorflow/tfjs-data@2.3.0" has unmet peer dependency "seedrandom@~2.4.3".
[4/4] Building fresh packages...
success Saved lockfile.
warning Your current version of Yarn is out of date. The latest version is "1.22.5", while you're on "1.12.3".
info To upgrade, run the following command:
$ curl --compressed -o- -L https://yarnpkg.com/install.sh | bash
success Saved 10 new dependencies.
info Direct dependencies
└─ @tensorflow/tfjs@2.3.0
info All dependencies
├─ @tensorflow/tfjs@2.3.0
├─ @types/color-name@1.1.1
├─ @types/node-fetch@2.5.7
├─ @types/node@14.6.1
├─ asynckit@0.4.0
├─ combined-stream@1.0.8
├─ delayed-stream@1.0.0
├─ form-data@3.0.0
├─ mime-db@1.44.0
└─ mime-types@2.1.27
Done in 104.31s.

Next, I added an import statement and the line to enable debug mode to tfjs-automl\demo\object_detection\index.js:

import * as tf from '@tensorflow/tfjs';          // <-- added this on (empty) line 17
import * as automl from '@tensorflow/tfjs-automl';
tf.enableDebugMode();                            // <-- added this on (empty) line 19
const MODEL_URL =
    'https://storage.googleapis.com/tfjs-testing/tfjs-automl/object_detection/model.json';

Then, I ran yarn watch again:

C:\ws\tfjs\tfjs-automl\demo\object_detection>yarn watch
yarn run v1.12.3
$ cross-env NODE_ENV=development parcel index.html --no-hmr --open
Server running at http://localhost:1234
√  Built in 4.67s.

This time, no results are given. The app seems to crash. There is some output in the developer tools console:

engine.ts:229 webgl backend was already registered. Reusing existing backend factory.
registerBackend @ engine.ts:229
engine.ts:229 cpu backend was already registered. Reusing existing backend factory.
registerBackend @ engine.ts:229
environment.ts:55 Platform browser has already been set. Overwriting the platform with [object Object].
setPlatform @ environment.ts:55
flags.ts:27 Debugging mode is ON. The output of every math call will be downloaded to CPU and checked for NaNs. This significantly impacts performance.
(anonymous) @ flags.ts:27
tensor.ts:464 Uncaught (in promise) TypeError: ut(...).registerTensor is not a function
    at new t (tensor.ts:464)
    at Function.t.make (tensor.ts:483)
    at wn (tensor_ops.ts:112)
    at bn (tensor_ops.ts:58)
    at o (io_utils.ts:175)
    at Object.eh [as decodeWeights] (io_utils.ts:116)
    at e.<anonymous> (graph_model.ts:143)
    at tensor.ts:397
    at Object.next (tensor.ts:397)
    at o (tensor.ts:397)

This seems strange... But now I realize that maybe I installed the wrong version of @tensorflow/tfjs?

With @tensorflow/tfjs 1.2.8

So I tried again, this time with @tensorflow/tfjs@1.2.8 (the same version as the @tensorflow/tfjs-core that the demo was already using), to see if that helps.

I left the code changes described above the same, but uninstalled the latest version of tfjs and installed 1.2.8:

C:\ws\tfjs\tfjs-automl\demo\object_detection>yarn remove @tensorflow/tfjs
yarn remove v1.12.3
[1/2] Removing module @tensorflow/tfjs...
[2/2] Regenerating lockfile and installing missing dependencies...
info fsevents@1.2.9: The platform "win32" is incompatible with this module.
info "fsevents@1.2.9" is an optional dependency and failed compatibility check. Excluding it from installation.
success Uninstalled packages.
Done in 11.21s.

C:\ws\tfjs\tfjs-automl\demo\object_detection>yarn add @tensorflow/tfjs@1.2.8
yarn add v1.12.3
[1/4] Resolving packages...
[2/4] Fetching packages...
info fsevents@1.2.9: The platform "win32" is incompatible with this module.
info "fsevents@1.2.9" is an optional dependency and failed compatibility check. Excluding it from installation.
[3/4] Linking dependencies...
warning "@tensorflow/tfjs > @tensorflow/tfjs-data@1.2.8" has unmet peer dependency "seedrandom@~2.4.3".
[4/4] Building fresh packages...

success Saved lockfile.
success Saved 11 new dependencies.
info Direct dependencies
└─ @tensorflow/tfjs@1.2.8
info All dependencies
├─ @tensorflow/tfjs-data@1.2.8
├─ @tensorflow/tfjs-layers@1.2.8
├─ @tensorflow/tfjs@1.2.8
├─ @types/node-fetch@2.5.7
├─ @types/node@14.6.1
├─ asynckit@0.4.0
├─ combined-stream@1.0.8
├─ delayed-stream@1.0.0
├─ form-data@3.0.0
├─ mime-db@1.44.0
└─ mime-types@2.1.27
Done in 17.11s.

C:\ws\tfjs\tfjs-automl\demo\object_detection>yarn watch
yarn run v1.12.3
$ cross-env NODE_ENV=development parcel index.html --no-hmr --open
Server running at http://localhost:1234
\ Building index.html...Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`

WARNING: We noticed you're using the `useBuiltIns` option without declaring a core-js version. Currently, we assume version 2.x when no version is passed. Since this default version will likely change in future versions of Babel, we recommend explicitly setting the core-js version you are using via the `corejs` option.

You should also be sure that the version you pass to the `corejs` option matches the version specified in your `package.json`'s `dependencies` section. If it doesn't, you need to run one of the following commands:

  npm install --save core-js@2    npm install --save core-js@3
  yarn add core-js@2              yarn add core-js@3

- Building tf-data.esm.js...Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`

WARNING: We noticed you're using the `useBuiltIns` option without declaring a core-js version. Currently, we assume version 2.x when no version is passed. Since this default version will likely change in future versions of Babel, we recommend explicitly setting the core-js version you are using via the `corejs` option.

You should also be sure that the version you pass to the `corejs` option matches the version specified in your `package.json`'s `dependencies` section. If it doesn't, you need to run one of the following commands:

  npm install --save core-js@2    npm install --save core-js@3
  yarn add core-js@2              yarn add core-js@3

Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`

WARNING: We noticed you're using the `useBuiltIns` option without declaring a core-js version. Currently, we assume version 2.x when no version is passed. Since this default version will likely change in future versions of Babel, we recommend explicitly setting the core-js version you are using via the `corejs` option.

You should also be sure that the version you pass to the `corejs` option matches the version specified in your `package.json`'s `dependencies` section. If it doesn't, you need to run one of the following commands:

  npm install --save core-js@2    npm install --save core-js@3
  yarn add core-js@2              yarn add core-js@3

√  Built in 14.78s.

Again, the app crashes. The logging printed to the developer console is slightly different:

flags.ts:27 Debugging mode is ON. The output of every math call will be downloaded to CPU and checked for NaNs. This significantly impacts performance.
(anonymous) @ flags.ts:27
t.set @ environment.ts:104
Fe @ globals.ts:49
parcelRequire.index.js.@tensorflow/tfjs @ index.js:19
newRequire @ object_detection.e31bb0bc.js:49
(anonymous) @ object_detection.e31bb0bc.js:81
(anonymous) @ object_detection.e31bb0bc.js:107
util.ts:107 Uncaught (in promise) Error: Element arr[0] should be a primitive, but is an array of 0 elements
    at f (util.ts:107)
    at t (tensor_util_env.ts:56)
    at t (tensor_util_env.ts:66)
    at en (tensor_util_env.ts:46)
    at bn (tensor_ops.ts:57)
    at o (io_utils.ts:175)
    at Object.eh [as decodeWeights] (io_utils.ts:116)
    at e.<anonymous> (graph_model.ts:143)
    at callbacks.ts:256
    at Object.next (callbacks.ts:256)
f @ util.ts:107
t @ tensor_util_env.ts:56
t @ tensor_util_env.ts:66
en @ tensor_util_env.ts:46
bn @ tensor_ops.ts:57
o @ io_utils.ts:175
eh @ io_utils.ts:116
(anonymous) @ graph_model.ts:143
(anonymous) @ callbacks.ts:256
(anonymous) @ callbacks.ts:256
o @ callbacks.ts:256
async function (async)
run @ index.js:24
parcelRequire.index.js.@tensorflow/tfjs @ index.js:71
newRequire @ object_detection.e31bb0bc.js:49
(anonymous) @ object_detection.e31bb0bc.js:81
(anonymous) @ object_detection.e31bb0bc.js:107

So I am running out of ideas here.

@tafsiri Is there something else I can try? And can you elaborate on how you tried to reproduce? The same steps? Am I maybe doing something wrong in my attempts to debug? Should I import tfjs after I import automl instead of before?

EDIT

I figured out that I actually don't have to install @tensorflow/tfjs after all, because enableDebugMode is exported from @tensorflow/tfjs-core, which was already installed. This way I can leave package.json unchanged. So I have now tried with the original dependencies:

tfjs-automl/demo/object_detection/package.json

{
  "dependencies": {
    "@tensorflow/tfjs-automl": "^1.0.0",
    "@tensorflow/tfjs-converter": "^1.2.8",
    "@tensorflow/tfjs-core": "^1.2.8"
  }
}

I added the import for tfjs-core and called enableDebugMode. I also added some logging:

tfjs-automl/demo/object_detection/index.js

import { enableDebugMode } from '@tensorflow/tfjs-core';  // <-- import
enableDebugMode();                                        // <-- call
import * as automl from '@tensorflow/tfjs-automl';
const MODEL_URL =
    'https://storage.googleapis.com/tfjs-testing/tfjs-automl/object_detection/model.json';

async function run() {
  console.info('loading model');                          // <-- logging
  const model = await automl.loadObjectDetection(MODEL_URL);
  const image = document.getElementById('salad');
  // These are the default options.
  const options = {score: 0.5, iou: 0.5, topk: 20};
  console.info('running predictions');                    // <-- logging
  const predictions = await model.detect(image, options);
  console.info('predictions', predictions);               // <-- logging
  // ...rest of the demo (drawing the boxes) is unchanged
}

run();

Results

Nothing is returned; the program crashes with these messages in the console:

Debugging mode is ON. The output of every math call will be downloaded to CPU and checked for NaNs. This significantly impacts performance. flags.ts:27:12
loading model index.js:24:10
Uncaught (in promise) Error: Element arr[0] should be a primitive, but is an array of 0 elements
    f util.ts:107
    t tensor_util_env.ts:57
    t tensor_util_env.ts:66
    en tensor_util_env.ts:40
    bn tensor_ops.ts:57
    o io_utils.ts:175
    eh io_utils.ts:116
    load graph_model.ts:143
    p object_detection.e31bb0bc.js:19410
    p object_detection.e31bb0bc.js:19422
    o object_detection.e31bb0bc.js:19311

If I comment out the call to enableDebugMode, I get the empty array again, with these messages in the console:

loading model index.js:24:10
running predictions index.js:29:10
predictions 
Array []
index.js:31:10
Download commented 4 years ago

Codepen demonstrating the issue

I took the sample code from this documentation page about object detection and put it in this codepen. Same result: an empty array.

Oh, one thing: I replaced the URL of the model to load with the one from the automl/demo/object_detection example, because the script on that documentation page tries to load from a local URL, which does not work on codepen.

Download commented 4 years ago

This zip file also demonstrates the issue on my machine. Here I have downloaded the model files to a folder with an index.html with the script from the documentation page.

object_detection.zip

Start an http-server on that folder as per the instructions on the page:

C:\ws\object_detection>http-server -p 8000
Starting up http-server, serving ./
Available on:
  http://192.168.2.12:8000
  http://127.0.0.1:8000
Hit CTRL-C to stop the server
tafsiri commented 4 years ago

@Download thanks for the codepen link. I tried it and still get working results.

(screenshot: Screen Shot 2020-08-29 at 8.51.30 PM)

I did modify your codepen a bit to switch the backend to CPU: https://codepen.io/tafsiri/pen/zYqzaKr

Could you try that and let us know if it works (it might take a bit longer to return an answer)? If so, then it is probably a WebGL bug of some sort.
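The codepen change is presumably a one-line backend switch before the model loads; something like the fragment below (the exact placement is my assumption, as the codepen source is not reproduced in this thread):

```javascript
// Force the CPU backend before loading the model. tf.setBackend returns a
// promise, so await it. (Hypothetical placement; see tafsiri's codepen.)
await tf.setBackend('cpu');
const model = await automl.loadObjectDetection(MODEL_URL);
```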

Download commented 4 years ago

Yes! It works!

@tafsiri Thank you! I have now seen working predictions on my machine for the first time. Finally I have a way forward. You really made my day, buddy! I am going to use the CPU backend for now. Later on I might add some code that attempts to run predictions on the GPU and, if that succeeds, switches the backend back to GPU on those devices where it works.

I now have a consistently reproducing scenario for this issue with the GPU backend, so if you want me to try out some stuff to narrow down the issue, just let me know. You can reach me at stijndewitt AT gmail DOT com.
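The GPU-with-CPU-fallback idea described above can be sketched generically. This is my own sketch, not tfjs API: the backend setter and the probe are injected so it stays testable; in the real app you would pass tf.setBackend as the setter, and a probe that runs model.detect on a known-good image that must yield at least one box.

```javascript
// Try backends in order and keep the first one whose probe returns a usable
// (non-empty) result. Generic helper, not part of tfjs or tfjs-automl.
async function pickWorkingBackend(backends, setBackend, probe) {
  for (const name of backends) {
    await setBackend(name);
    try {
      const result = await probe();
      // Treat a non-empty prediction array as "this backend works".
      if (Array.isArray(result) && result.length > 0) return name;
    } catch (e) {
      // A crash on this backend also means: fall through to the next one.
    }
  }
  return null; // no backend produced predictions
}
```

With the symptoms in this thread, the probe would return [] on 'webgl' and real boxes on 'cpu', so the helper would settle on 'cpu' on the affected machines and on 'webgl' everywhere else.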

tafsiri commented 4 years ago

Okay, that narrows it down a bit. Could you screenshot what you see here https://js.tensorflow.org/debug/ and add it to this issue?

Also, are you able to get us info on what graphics card/chipset you are running? I notice you mention you are using virtual machines; if your setup prevents the VM from accessing the graphics card, you will not be able to use the WebGL backend. Have you tried this outside of a VM?

Download commented 4 years ago

@tafsiri

Could you screenshot what you see here https://js.tensorflow.org/debug/ and add it to this issue.

(screenshot attached)

Also are you able to get us info on what graphics card/chipset you are running?

According to Windows Device Manager, it is "Intel(R) HD Graphics 4600". That is my desktop. I can check my laptop as well if it is useful. And maybe you have/know some WebGL test page that I can screenshot for more details?

Have you tried this outside of a VM?

Yes, I tried on:

tafsiri commented 4 years ago

@rthadur Would you be able to try and reproduce this on windows (using the codepen link)?

@annxingyuan Any ideas of other things to check that would explain getting no results on WebGL but getting results on CPU?

annxingyuan commented 4 years ago

@tafsiri Hmm, running the app in debug mode would be the best way to check, but it looks like it gets stuck on checking for shape consistency. I'm wondering whether the same error occurs in debug mode on the CPU backend. @Download, any chance you still have things set up to run in debug mode and could easily check whether you get the same shape-consistency error on the CPU backend?

tafsiri commented 4 years ago

@annxingyuan I believe there is no error on CPU.

tafsiri commented 4 years ago

Never mind, I misunderstood what you meant. Will chat with you offline.

rthadur commented 4 years ago

@tafsiri I tried it on Windows on a loaner laptop; it works well with both the CPU and WebGL backends. I used this codepen example: https://codepen.io/tafsiri/pen/zYqzaKr

Download commented 4 years ago

@annxingyuan If by the 'shape consistency error' you mean Element arr[0] should be a primitive, but is an array of 0 elements: I get that on the GPU backend as well as on the CPU backend. I updated this codepen so it shows that:

https://codepen.io/StijnDeWitt/pen/poywgJV

tf.setBackend('cpu')
tf.enableDebugMode()

Result

Uncaught (in promise) Error: Element arr[0] should be a primitive, but is an array of 0 elements
    at gv (tfjs:17)
    at t (tfjs:17)
    at t (tfjs:17)
    at Vg (tfjs:17)
    at Gy (tfjs:17)
    at mN (tfjs:17)
    at t.e.loadSync (tfjs:17)
    at t.<anonymous> (tfjs:17)
    at u (tfjs:17)
    at Generator._invoke (tfjs:17)
gv @ tfjs:17
t @ tfjs:17
t @ tfjs:17
Vg @ tfjs:17
Gy @ tfjs:17
mN @ tfjs:17
e.loadSync @ tfjs:17
(anonymous) @ tfjs:17
u @ tfjs:17
(anonymous) @ tfjs:17
forEach.t.<computed> @ tfjs:17
Wm @ tfjs:17
o @ tfjs:17
async function (async)
run @ index.html?key=iFrameKey-15e063db-a94a-c56c-9648-913b5caf6c6b:28
(anonymous) @ index.html?key=iFrameKey-15e063db-a94a-c56c-9648-913b5caf6c6b:38

I think the fact that enabling debug mode gives this error is an indication that something is not working completely correctly.

@rthadur When you add tf.enableDebugMode() to your codepen, does it still work?

rthadur commented 4 years ago

@Download yes it worked!

Download commented 4 years ago

@rthadur Heh, I'm not sure what you mean by 'it worked'... As in, you get no error? Or as in, yes, you can now reproduce it?

I have two machines here exhibiting the problem. Admittedly, they are both old machines with integrated graphics... Maybe you could add some extra debug log statements to the library just before the point where the error is thrown in my stack trace above, and expose that test version as a codepen or something? Then I can run it on one of those machines and let you know the results.

Or maybe you have some other idea to try to narrow this down?

tafsiri commented 4 years ago

We narrowed it down a little; there may be two bugs. To get around the first one, could you move the call to tf.enableDebugMode() to after the model has loaded? This will avoid the shape-consistency check issue. I've done this in my codepen so you can try running that.

We suspect that somewhere in the pipeline NaNs (or lots of zeroes) are being produced; debug mode will check operations for NaNs and report them on the console. Let us know what you see printed out. Thanks.

tafsiri commented 4 years ago

Also if for some reason you still get the shape check errors after moving enableDebugMode below the model load code, you can try adding tf.env().set('TENSORLIKE_CHECK_SHAPE_CONSISTENCY', false) as the first line of the program.
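Putting both suggestions together, a minimal sketch of the workaround (assuming the demo's tfjs-automl `loadObjectDetection` usage; the model URL and image element id are placeholders, not from the demo):

```javascript
import * as tf from '@tensorflow/tfjs';
import * as automl from '@tensorflow/tfjs-automl';

// Fallback workaround: skip the shape-consistency check entirely
// (must be the first line of the program).
tf.env().set('TENSORLIKE_CHECK_SHAPE_CONSISTENCY', false);

async function run() {
  // Hypothetical model URL; substitute your own exported model.json.
  const model = await automl.loadObjectDetection('model.json');

  // Enable debug mode only AFTER the model has loaded, to avoid
  // hitting the shape-consistency error during model loading.
  tf.enableDebugMode();

  const img = document.getElementById('img');
  const predictions = await model.detect(img, {score: 0.5});
  console.log(predictions);
}
run();
```

This is browser setup code for the demo, not a definitive fix; debug mode slows execution considerably, so remove it once the logs have been captured.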

Download commented 4 years ago

@tafsiri I ran your codepen. It does not crash with the shape consistency error but still gives an empty array. It printed a looot of logging to the console, attached below:

tensorflow-issue-3858-logging.txt

tafsiri commented 4 years ago

cc @annxingyuan see the linked profile above; no evidence of NaNs.

@Download one more suggestion. Could you add tf.ENV.set('WEBGL_PACK', false) to the top of the program (or run my codepen again as I've added that there) and also upload the logs from that.

Download commented 4 years ago

Hi @tafsiri Ran your codepen again. This time it gives predictions:

[
  {
    "box": {
      "left": -2.6272237300872803,
      "top": 7.801450788974762,
      "width": 309.0912103652954,
      "height": 276.35952830314636
    },
    "label": "Salad",
    "score": 0.9568929672241211
  },
  {
    "box": {
      "left": 104.73532229661942,
      "top": 25.768655352294445,
      "width": 73.24516028165817,
      "height": 52.02250275760889
    },
    "label": "Tomato",
    "score": 0.85658860206604
  }
]

I guess that's good news right?

tensorflow-issue-3858-logging-webgl-pack-false.txt

Nothing to do with the issue I guess, but I notice a negative value for the left field of the first box. Is that normal?
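For what it's worth, object detectors can legitimately emit boxes that extend past the image edge when an object looks cut off at the border. If that matters downstream, a small helper (hypothetical, not part of tfjs-automl) can clamp a predicted box to the image bounds:

```javascript
// Clamp a prediction box ({left, top, width, height}) to the image bounds.
// Coordinates outside [0, imgWidth] x [0, imgHeight] are pulled back in,
// and the width/height are shrunk to match.
function clampBox(box, imgWidth, imgHeight) {
  const left = Math.max(0, box.left);
  const top = Math.max(0, box.top);
  const right = Math.min(imgWidth, box.left + box.width);
  const bottom = Math.min(imgHeight, box.top + box.height);
  return {
    left,
    top,
    width: Math.max(0, right - left),
    height: Math.max(0, bottom - top),
  };
}
```

Applied to the Salad box above, the negative left becomes 0 and the width shrinks accordingly.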

tafsiri commented 4 years ago

It is good news, and I think it does confirm that your hardware probably doesn't support WebGL as well as would be needed to execute this model under our default settings. Workarounds for those kinds of issues are quite difficult unless we can reproduce locally. Feel free to leave tf.env().set('WEBGL_PACK', false) in your code as you test with your actual use case (there may still be accuracy issues).

We did want to suggest trying out the WASM backend: it is generally much faster than the standard CPU backend (sometimes as fast as WebGL), and may be more consistent if you anticipate deploying to older hardware.
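Switching to the WASM backend is a one-time setup step; a sketch, assuming the extra @tensorflow/tfjs-backend-wasm package has been installed alongside @tensorflow/tfjs:

```javascript
import * as tf from '@tensorflow/tfjs';
// Importing the package registers the 'wasm' backend with tf.
import '@tensorflow/tfjs-backend-wasm';

async function initBackend() {
  await tf.setBackend('wasm');
  await tf.ready();
  console.log('Active backend:', tf.getBackend());
}
```

Run this before loading the model; the rest of the demo code stays the same.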

Thanks for your patience and sending along debug info.

google-ml-butler[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you.

google-ml-butler[bot] commented 3 years ago

Closing as stale. Please @mention us if this needs more attention.

Download commented 3 years ago

@tafsiri I don't think this issue has actually been resolved, right? Are you guys planning on improving the default WebGL-based backend? My project ended, so I haven't been working on it anymore and did not test the WASM backend. There are multiple reports of object detection giving no results, and it eats up developer time when it happens. It took me 3 full days of investigation before I could resolve it. That's almost $2,500 of my client's money wasted.

I guess it is actually worse if it does work on the development machine... because then you will deploy to production, a certain percentage of users will just get no results, and the devs will be scratching their heads with no error message, no other leads to what is happening, and no way to reproduce. In its current state I would never use the WebGL engine in production for that reason. It should at least print an error message saying that the hardware is not up to par.

I have invested a lot of time in this issue: running reproduction scenarios, providing debug logs, and what not. To then see the issue closed as stale, even though the problem still exists, is a bit painful. I understand you need to get the issue off of your work list, but the only real way of doing that is investigating and solving it.

Oh, one more thing. I know this issue is hardware related, but both machines that I own have this issue, and they use standard Intel on-board graphics. I am betting that a significant percentage of users out there use similar hardware. Since this stuff runs on the client, it will fail on the client machine, without any error message. How comfortable would you be deploying a solution that will fail on x percent of users' machines without error messages or anything? Will those users end up calling your support desk? How much time will you spend answering calls and investigating before you eventually realize that the only real fix is to switch to the CPU backend? If you, as TensorFlow developers, feel it is too hard to fix this issue, imagine how much harder it is for developers who know nothing about TensorFlow. It is nigh on impossible.

In my mind, this bug makes the WebGL backend worthless, because I cannot use it in production. So it seems to me that, for that reason alone, it is worthwhile to fix.

annxingyuan commented 3 years ago

@Download Hello, yes you are right - this issue should not be closed. Our issues get automatically closed after a few days so thank you for the nudge. I also sincerely apologize for the experience you had. We will do our best to resolve this issue.

gaikwadrahul8 commented 1 year ago

Hi, @Download

Apologies for the delayed response. We're revisiting our older issues and checking whether they have been resolved. May I ask whether you are still looking for a solution, or has your issue been resolved?

I tried to replicate the issue on my end with the latest version, @tensorflow/tfjs@4.6.0, and as far as I can tell it works as expected. Here is the codepen link; could you please try it on your end and see whether the issue still persists? If I have missed something, please let me know.

Here is the output of the codepen:


[
  {
    "box": {
      "left": 105.10694980621338,
      "top": 22.127436473965645,
      "width": 70.60015201568604,
      "height": 55.69593608379364
    },
    "label": "Tomato",
    "score": 0.9720228910446167
  },
  {
    "box": {
      "left": 257.72780179977417,
      "top": 90.34746140241623,
      "width": 52.33585834503174,
      "height": 60.38908660411835
    },
    "label": "Tomato",
    "score": 0.934944212436676
  },
  {
    "box": {
      "left": -12.947022914886475,
      "top": -0.08298084139823914,
      "width": 487.4346852302551,
      "height": 242.75248870253563
    },
    "label": "Salad",
    "score": 0.9079410433769226
  }
]

If the issue still persists, please let us know with the error log so we can dig deeper into it. Could you please confirm whether this issue is resolved for you? Feel free to close the issue if it is. Thank you!

Download commented 1 year ago

I have no idea, sorry. I no longer have access to the machine I used back then. The codepen works on my Macbook, for what it's worth.

gaikwadrahul8 commented 1 year ago

Hi, @Download

Thank you for the confirmation. Could you please confirm whether this issue is resolved for you? Feel free to close the issue if it is. Thank you!

github-actions[bot] commented 1 year ago

This issue has been marked stale because it has had no recent activity in the last 7 days. It will be closed if no further activity occurs. Thank you.

github-actions[bot] commented 1 year ago

This issue was closed due to lack of activity after being marked stale for the past 7 days.

google-ml-butler[bot] commented 1 year ago

Are you satisfied with the resolution of your issue?