Closed vladmandic closed 3 years ago
@vladmandic I think this might be related to the input data. Can you verify that the inputs are the same for node and browser? I suspect fromPixels and decodeJpeg might produce different pixel values.
@pyu10055 that's the first thing i thought of as well :)
and yes, decodeJpeg and fromPixels do produce different results - specifically, RGB values in fromPixels are offset by +1
i've also double-checked the behavior of alignCorners and similar options when performing resizeBilinear
but i've handled that, and that's why i'm now printing the checksum of the input (after normalization) - to confirm the input is 100% identical
(if there were any differences, i'd have implemented something like canvas.js decoding, which is uniform on both platforms)
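A minimal sketch of how a per-channel offset like the +1 mentioned above could be detected. The helper name comparePixels and the toy arrays are hypothetical and not taken from the thread; in practice the two arrays might come from calling dataSync() on the tensors produced by each decoder.

```javascript
// Hypothetical helper: compare two decoded pixel buffers element by element.
// Returns the largest absolute difference and the mean signed difference (a - b).
function comparePixels(a, b) {
  if (a.length !== b.length) {
    throw new Error(`length mismatch: ${a.length} vs ${b.length}`);
  }
  let maxAbsDiff = 0;
  let sumDiff = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sumDiff += d;
    maxAbsDiff = Math.max(maxAbsDiff, Math.abs(d));
  }
  return { maxAbsDiff, meanDiff: a.length ? sumDiff / a.length : 0 };
}

// Toy data standing in for decoded RGB values (not real decoder output).
const fromNode = Uint8Array.from([10, 20, 30, 40, 50, 60]);
const fromBrowser = Uint8Array.from([11, 21, 31, 41, 51, 61]);

// For this toy data, maxAbsDiff is 1 and meanDiff is 1.
console.log(comparePixels(fromBrowser, fromNode));
```

A mean difference close to the maximum difference suggests a uniform offset rather than scattered decoding noise.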
@vladmandic WebGL has precision loss when values are stored in textures, but it is usually rather small. The negative input sum seems weird; is it overflowing already?
@pyu10055
"WebGL has precision loss when values are stored in textures, but it is usually rather small"
The thing is, WebGL and WASM produce results identical up to the 5th decimal place (after that it's down to WebGL precision loss)
But tfjs-node produces results which are ~2-5% different from either WebGL or WASM, which is not small
"The negative input sum seems weird; is it overflowing already?"
Input here is just an image resized to 1x512x512x3 and normalized
Sum is just a (very) cheap way to do a hash to make sure the inputs are the same, but given the size of the array, no wonder it's overflowing.
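If a plain sum proves too coarse as a fingerprint, one alternative is to hash the raw bytes of the input. The sketch below uses 32-bit FNV-1a; this is a swapped-in technique, not what was used in the thread, and the name fnv1a is hypothetical. It assumes both platforms hold the data in the same typed-array type and byte order, and it only matches when the inputs are bit-identical.

```javascript
// Hypothetical helper: 32-bit FNV-1a hash over the raw bytes of a typed array.
// Unlike a sum, it is sensitive to element order and to small value changes.
function fnv1a(values) {
  const bytes = new Uint8Array(values.buffer, values.byteOffset, values.byteLength);
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i];
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  return hash.toString(16).padStart(8, '0');
}

// Toy normalized inputs (not real image data): same values, different order.
const inputA = Float32Array.from([-1, -0.5, 0, 0.5, 1]);
const inputB = Float32Array.from([1, 0.5, 0, -0.5, -1]);

// Both arrays sum to 0, yet their hashes differ.
console.log(fnv1a(inputA), fnv1a(inputB));
```

The hash could be printed on both platforms next to the existing sum, so a mismatch in either is visible.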
But...
I've just tried with TFJS 3.5.0, where tfjs-node ships with TF2, and the difference is almost gone
(sum of conv2d values now shows divergence of ~0.15% - that is at least 25x improvement)
(and more importantly, model predictions actually match)
So I guess the bug was in the TF1 implementation of conv2d - and finally updating TFJS to use TF2 resolved this issue as well
Feel free to close the issue
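For reference, a divergence figure like the percentages quoted above could be computed roughly as follows. This is only a sketch: the function name relativeDivergence and the toy values are hypothetical, and the exact formula used in the thread is not shown.

```javascript
// Hypothetical helper: relative difference between the sums of two output arrays,
// taking the first array as the reference.
function relativeDivergence(reference, candidate) {
  let refSum = 0;
  let candSum = 0;
  for (let i = 0; i < reference.length; i++) refSum += reference[i];
  for (let i = 0; i < candidate.length; i++) candSum += candidate[i];
  if (refSum === 0) throw new Error('reference sum is zero; relative divergence is undefined');
  return Math.abs(candSum - refSum) / Math.abs(refSum);
}

// Toy values standing in for conv2d outputs from two backends (not real results).
const browserOut = [2, 4, 6, 8];
const nodeOut = [2.01, 4.02, 5.99, 8.01];

// Prints "0.15%" for this toy data.
console.log((relativeDivergence(browserOut, nodeOut) * 100).toFixed(2) + '%');
```

Because positive and negative element errors can cancel in a sum, a small value here does not guarantee that individual elements agree.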
that is great to know, thanks!
i've been chasing down why the same image model (object detection) has slightly different results in browser and node environments, and it comes down to the results of tf.conv2d being slightly different for exactly the same inputs.
in browser, the cpu, webgl and wasm backends produce identical results (and WEBGL_CONV_IM2COL has no effect), but tfjs-node using the tensorflow backend produces a different result.
example code:
browser output:
node output:
you can see that the value of just the first entry is already different and that a simple checksum is off by ~1%
environment: tfjs 3.3.0 on chrome 89 and ubuntu 20.10