tensorflow / tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.
https://js.tensorflow.org
Apache License 2.0

tf.isNaN misbehaves in browsers #7270

Open harangp opened 1 year ago

harangp commented 1 year ago

System information

Describe the current behavior

I've noticed that the isNaN example (at least in the 4.2.0 API description) doesn't work as expected in the browser. If you press Run on the TensorFlow.js API page, the first element of the returned tensor should be 'true', but instead every element is 'false':

const x = tf.tensor1d([NaN, Infinity, -Infinity, 0, 1]);
x.isNaN().print();  // or tf.isNaN(x)
Tensor
    [false, false, false, false, false]

It behaves the same way in my other installed browsers (see the system information above). However, if I run it in Node.js (v16.14.0), the results are the expected ones:

Tensor
    [true, false, false, false, false]

To test something similar: .isInf() works fine in both the browser and Node, so I suppose something must be wrong with isNaN().
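For comparison, the expected results can be computed in plain JavaScript without involving any backend. This is only a minimal sketch: `referenceIsNaN`, `referenceIsInf` and `firstMismatch` are hypothetical helper names, not part of TensorFlow.js.

```javascript
// Reference values computed on the host, independent of any tfjs backend.
const values = [NaN, Infinity, -Infinity, 0, 1];

// Hypothetical helper (not a tfjs API): element-wise NaN check.
function referenceIsNaN(xs) {
  return Array.from(xs, (v) => Number.isNaN(v));
}

// Hypothetical helper (not a tfjs API): element-wise infinity check.
function referenceIsInf(xs) {
  return Array.from(xs, (v) => v === Infinity || v === -Infinity);
}

// Returns the index of the first element where `got` (booleans or 0/1
// values) differs from `expected`, or -1 if they agree everywhere.
function firstMismatch(expected, got) {
  for (let i = 0; i < expected.length; i++) {
    if (Boolean(got[i]) !== expected[i]) return i;
  }
  return -1;
}

console.log(referenceIsNaN(values));
console.log(referenceIsInf(values));
```

The values read back from `x.isNaN()` (for example via `dataSync()`) could be passed as `got`. For the all-false browser output reported above, `firstMismatch` would point at index 0.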

Describe the expected behavior

I would expect tf.isNaN() to work in the browser the same way it does in Node, returning the correct values:

Tensor
    [true, false, false, false, false]

Standalone code to reproduce the issue

It is the standard example in the API documentation: https://js.tensorflow.org/api/latest/#isNaN

Other info / logs

The forum thread for this issue, which nobody had answered as of 2023.01.13, is here: https://discuss.tensorflow.org/t/tf-isnan-misbehaves-in-browsers/14059

shmishra99 commented 1 year ago

Hi @harangp ,

Thank you for providing us with the code and output screenshots. We appreciate your help in determining the issue.
I checked the latest version (v4.4.0) as well as older versions (4.2.0 and below) of TensorFlow.js, and they give the expected output. Please find the attached screenshots.

image

Output:

image

Document

image

Let me know if it helps.

harangp commented 1 year ago

Hi @shmishra99

I double-checked your example, because it bugs me to no end. I re-tested on my computers and got the same bad results (Screenshot 2023-04-27 205953). I could trace the problem back to the webgl backend. With the cpu backend, everything is fine. After switching to the webgl backend (which the API description page uses by default), the error is present.

After that, I found the exact same problem here: https://github.com/tensorflow/tfjs/issues/5800 which matches my case as well: Intel HD Graphics 620, Shader version: 5.1, OpenGL version: 4.6, OpenCL version: 3.0, Vulkan version: 1.2.162
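The backend comparison described above could be scripted roughly as follows. This is a sketch under assumptions: `backendsDisagreeing` and `isNaNPerBackend` are hypothetical helper names, and `tf` is passed in as a parameter so that nothing is assumed about how the library was loaded. It relies on `tf.setBackend`, `tf.ready`, `tf.tensor1d`, `tf.isNaN`, `data()` and `dispose()`.

```javascript
// Pure helper: given per-backend results, list the backends whose output
// disagrees with a host-side reference computed via Number.isNaN.
function backendsDisagreeing(values, resultsByBackend) {
  const expected = values.map((v) => Number.isNaN(v));
  return Object.keys(resultsByBackend).filter((name) => {
    const got = Array.from(resultsByBackend[name], Boolean);
    return got.length !== expected.length ||
        got.some((g, i) => g !== expected[i]);
  });
}

// Runs tf.isNaN on each requested backend and collects the raw results.
// Backends that cannot be activated are skipped.
async function isNaNPerBackend(tf, backendNames, values) {
  const results = {};
  for (const name of backendNames) {
    // setBackend resolves to a boolean indicating success.
    const ok = await tf.setBackend(name);
    if (!ok) continue;
    await tf.ready();
    const x = tf.tensor1d(values);
    const y = tf.isNaN(x);
    results[name] = await y.data();
    x.dispose();
    y.dispose();
  }
  return results;
}
```

In a page where tfjs is loaded, something like `isNaNPerBackend(tf, ['cpu', 'webgl'], values).then((r) => console.log(backendsDisagreeing(values, r)))` would be expected to print `['webgl']` on an affected machine and an empty array elsewhere.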

The problem is that even though the bug was marked as fixed and the fix was merged, it does not seem to work. There is a property, WEBGL2_ISNAN_CUSTOM = true, which seemingly alters the code executed for isNaN in WebGL cases (I found it here: https://github.com/tensorflow/tfjs/blob/master/tfjs-backend-webgl/src/glsl_version.ts ). Could you confirm that this is the case here? (I couldn't get it to work with that setting either.) Thanks.
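One way to test for NaN without relying on a built-in is to inspect the IEEE 754 bit pattern directly. The sketch below does this in plain JavaScript for float32 values. It only illustrates the rule and is not the shader code from glsl_version.ts; that WEBGL2_ISNAN_CUSTOM selects a check of this kind is an assumption here. `isNaNByBits` is a hypothetical name.

```javascript
// IEEE 754 single precision: a value is NaN when all exponent bits are 1
// and the fraction is non-zero. With the sign bit masked off, that is any
// bit pattern strictly greater than 0x7f800000 (the pattern of +Infinity).
function isNaNByBits(value) {
  const f32 = new Float32Array(1);
  const u32 = new Uint32Array(f32.buffer);  // same bytes, viewed as uint32
  f32[0] = value;
  return (u32[0] & 0x7fffffff) > 0x7f800000;
}

const samples = [NaN, Infinity, -Infinity, 0, 1];
console.log(samples.map(isNaNByBits));
```

If the flag is registered in the loaded tfjs version, it could presumably be set with `tf.env().set('WEBGL2_ISNAN_CUSTOM', true)` before the first op runs. As noted above, that setting did not change the result on the affected machine.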

gaikwadrahul8 commented 1 year ago

Hi, @harangp

Apologies for the delayed response. I tried to replicate the issue on my end with the latest version, @tensorflow/tfjs@4.10.0, and I'm getting the output below. It seems to be working as expected, so could you please try it on your end with @tensorflow/tfjs@4.10.0?

I'm using the Google Chrome browser, version 116.0.5845.96, on my M1 Mac system.

For your reference, I have added an output screenshot below:

image

github-actions[bot] commented 1 year ago

This issue has been marked stale because it has no recent activity since 7 days. It will be closed if no further activity occurs. Thank you.

harangp commented 1 year ago

Hi @gaikwadrahul8, the latest version, tfjs@4.10.0, seemingly works, but only because it has webgpu enabled by default. The webgl backend shows the same issue. Please note that you won't be able to reproduce this error on Mac machines, as it seemingly affects machines with Intel HD graphics. I generated a new set of tests, as you can see below:

Screenshot 2023-09-01 082839

I confirmed these results on two other machines using Intel HD graphics; the results are the same everywhere. I really don't know whether it's an error in the Intel graphics drivers or in the webgl code of tfjs. For me, as long as the webgpu backend is working fine, I'll be using that one.

alvinleung1996 commented 1 year ago

I have tested it on my Windows desktop and can reproduce the bug with version 4.11.0.

image

My computer specs:

I have tried to produce NaN using different methods and isNaN behaves inconsistently on the webgl backend: image image
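Since the screenshots are not reproduced here, the following plain JavaScript sketch lists a few common ways of producing NaN, together with a host-side check. The specific producers are illustrative and are not necessarily the ones used in the screenshots; `nanProducers` and `isNaNOnHost` are hypothetical names.

```javascript
// Several ways of producing NaN in plain JavaScript.
const nanProducers = {
  literal: () => NaN,
  zeroDivZero: () => 0 / 0,
  infMinusInf: () => Infinity - Infinity,
  zeroTimesInf: () => 0 * Infinity,
  sqrtNegative: () => Math.sqrt(-1),
  logNegative: () => Math.log(-1),
};

// Hypothetical workaround: check for NaN on the host after the values
// have been read back, instead of relying on the backend's isNaN kernel.
function isNaNOnHost(data) {
  return Array.from(data, (v) => Number.isNaN(v));
}

// Store the produced values as float32, as a tensor readback would.
const produced = Float32Array.from(Object.values(nanProducers), (f) => f());
console.log(isNaNOnHost(produced));
```

With tfjs, the equivalent would be `isNaNOnHost(x.dataSync())`. This assumes the readback preserves NaN values, which has not been verified on the affected GPUs, and it forces a synchronous transfer from the GPU.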