tensorflow / tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.
https://js.tensorflow.org
Apache License 2.0

The output of the operator tf.cos on backend WebGL is inconsistent with the backend CPU and Wasm #8389

Open liliquan0118 opened 1 month ago

liliquan0118 commented 1 month ago

Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template

System information

Describe the current behavior: For the input below, the output of the operator tf.cos on the WebGL backend is inconsistent with the output on the CPU and Wasm backends. The input is: [[222031363.04483604,-644006247.6421678,-605587169.2042153],[1454271400.9804254,62586397.87448788,-68460449.96912003],[947971713.1915293,-1557450302.7682097,-177600034.40499234],[580296681.4137621,-1029945044.7648776,1242489427.454761],[765821514.0764089,-1050250673.9083493,-807593657.384702],[-899719029.7993956,1994507237.8503175,643740213.4526525],[1739034989.3838263,854594021.672925,2080780625.6686144],[-1487391531.946444,-649097467.5578747,915379088.7619557],[-699072652.5136976,580871812.2644863,-2054123548.3292303],[-1695726211.436941,-1668680183.31515,-1232482205.5715837],[-271449972.2313881,-1794459494.529479,485735372.1116624],[-1498322592.7383604,81534077.20091677,-171731024.55517578],[-1734544940.168774,-1537308013.2696624,-813783320.1463277],[1877502216.6916676,738706968.7915564,-151212949.78009176],[-497439374.4975374,-1492868504.0418532,-1147644161.922762],[-381496084.0865674,-1273512753.80469,-1342920474.1095667],[1305493931.7932196,1111735360.2880988,-758057797.123117],[-1530258070.5911965,-460829762.73904085,925297487.5536752],[1011907807.5187173,-947508069.6779656,1415089246.4553628]]

The output on the WebGL backend is:

Tensor
    [[0, 0, 0],
     [0, 0, 0],
     [0, 0, 0],
     [0, 0, 0],
     [0, 0, 0],
     [0, 0, 0],
     [0, 0, 0],
     [0, 0, 0],
     [0, 0, 0],
     [0, 0, 0],
     [0, 0, 0],
     [0, 0, 0],
     [0, 0, 0],
     [0, 0, 0],
     [0, 0, 0],
     [0, 0, 0],
     [0, 0, 0],
     [0, 0, 0],
     [0, 0, 0]]

The output on the CPU backend is:

Tensor
    [[-0.9764838, -0.9021782, -0.6038327],
     [-0.9670443, -0.2711793, -0.2760595],
     [0.7119179 , -0.400943 , 0.996177  ],
     [0.8146462 , -0.4115099, 0.2581772 ],
     [0.761098  , -0.9927804, -0.6599192],
     [-0.7709348, 0.8152787 , -0.2078054],
     [-0.1062753, 0.8878115 , -0.9271529],
     [-0.6936102, 0.204306  , 0.7637746 ],
     [0.0859669 , -0.9586536, 0.9030097 ],
     [0.6675315 , 0.9997346 , -0.8668975],
     [0.0365342 , 0.2261674 , -0.8807448],
     [0.1259749 , 0.6492093 , -0.5970631],
     [0.2290593 , -0.5758974, 0.5143586 ],
     [0.9070938 , -0.6664433, -0.9998314],
     [0.9792625 , 0.9905342 , 0.9872381 ],
     [-0.9484215, 0.9099125 , -0.9907898],
     [-0.9538459, 0.6375976 , -0.0243272],
     [-0.9596693, 0.1387126 , -0.9999884],
     [-0.9997545, 0.8163752 , 0.4726494 ]]

The output on the WASM backend is:

Tensor
    [[-0.9764838, -0.9021782, -0.6038326],
     [-0.9670443, -0.2711794, -0.2760595],
     [0.7119178 , -0.4009432, 0.996177  ],
     [0.8146462 , -0.4115099, 0.2581772 ],
     [0.7610979 , -0.9927804, -0.6599191],
     [-0.7709348, 0.8152786 , -0.2078057],
     [-0.1062753, 0.8878114 , -0.9271529],
     [-0.6936101, 0.204306  , 0.7637745 ],
     [0.0859666 , -0.9586536, 0.9030097 ],
     [0.6675314 , 0.9997346 , -0.8668973],
     [0.0365342 , 0.2261675 , -0.8807447],
     [0.1259749 , 0.6492093 , -0.5970629],
     [0.2290591 , -0.5758972, 0.5143584 ],
     [0.9070938 , -0.6664432, -0.9998314],
     [0.9792625 , 0.9905342 , 0.9872381 ],
     [-0.9484215, 0.9099126 , -0.9907898],
     [-0.9538459, 0.6375976 , -0.0243271],
     [-0.9596693, 0.1387123 , -0.9999884],
     [-0.9997545, 0.8163751 , 0.4726494 ]]
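
To put a number on the discrepancy, the printed tensors can be compared element by element. The sketch below is only an illustration: `maxAbsDiff` is a hypothetical helper (not part of tfjs), and the three arrays are the first rows copied from the outputs above.

```javascript
// Hypothetical helper: largest absolute element-wise difference
// between two equally shaped (possibly nested) numeric arrays.
function maxAbsDiff(a, b) {
  const flatA = a.flat(Infinity);
  const flatB = b.flat(Infinity);
  if (flatA.length !== flatB.length) {
    throw new Error("arrays must contain the same number of elements");
  }
  let worst = 0;
  for (let i = 0; i < flatA.length; i++) {
    worst = Math.max(worst, Math.abs(flatA[i] - flatB[i]));
  }
  return worst;
}

// First row of each backend's output, copied from the report above.
const webglRow = [[0, 0, 0]];
const cpuRow = [[-0.9764838, -0.9021782, -0.6038327]];
const wasmRow = [[-0.9764838, -0.9021782, -0.6038326]];

console.log("cpu vs wasm :", maxAbsDiff(cpuRow, wasmRow));
console.log("cpu vs webgl:", maxAbsDiff(cpuRow, webglRow));
```

On these rows, CPU and WASM differ only in the last printed digit, while WebGL differs from both by nearly the full magnitude of the values.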

Describe the expected behavior: tf.cos should produce consistent output on all backends.
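
One factor that may be worth checking, offered as an assumption rather than a confirmed diagnosis: tfjs numeric tensors default to float32, and at these magnitudes adjacent float32 values are 16 or more apart, which is larger than the period 2π of cos. The sketch below uses plain JavaScript (no tfjs) to show that rounding, together with a hypothetical helper `reduceAngle` that reduces the argument in float64 before it would be handed to `tf.cos`.

```javascript
// Two values taken from the input in this report.
const first = 222031363.04483604;
const largest = 2080780625.6686144;

// Math.fround rounds a float64 to the nearest float32.
// In [2^27, 2^28) float32 values are 16 apart; in [2^30, 2^31) they are 128 apart.
const firstAsFloat32 = Math.fround(first);
const largestAsFloat32 = Math.fround(largest);
console.log(firstAsFloat32 % 16, largestAsFloat32 % 128);

// Hypothetical helper: map x into [-pi, pi] using float64 arithmetic,
// so the value that is later stored as float32 stays small.
function reduceAngle(x) {
  const TWO_PI = 2 * Math.PI;
  const r = x % TWO_PI; // remainder keeps the sign of x
  if (r > Math.PI) return r - TWO_PI;
  if (r < -Math.PI) return r + TWO_PI;
  return r;
}

console.log(Math.cos(first), Math.cos(reduceAngle(first)));
```

`reduceAngle` could be applied to each element before the tensor is created, for example `input.map(row => row.map(reduceAngle))`. Whether that removes the discrepancy on a given GPU has not been verified here.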

Standalone code to reproduce the issue Provide a reproducible test case that is the bare minimum necessary to generate the problem. If possible, please share a link to Colab/CodePen/any notebook.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>bug00</title>
</head>
<body>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs/dist/tf.js"> </script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-wasm/dist/tf-backend-wasm.js"></script>
<script>
    // Declared async so that the `await cos(...)` calls in test() run one backend at a time.
    async function cos(backend){
        // tf.setBackend returns a promise; wait for the switch before running the op.
        await tf.setBackend(backend);
        await tf.ready();
        const input=[[222031363.04483604,-644006247.6421678,-605587169.2042153],[1454271400.9804254,62586397.87448788,-68460449.96912003],[947971713.1915293,-1557450302.7682097,-177600034.40499234],[580296681.4137621,-1029945044.7648776,1242489427.454761],[765821514.0764089,-1050250673.9083493,-807593657.384702],[-899719029.7993956,1994507237.8503175,643740213.4526525],[1739034989.3838263,854594021.672925,2080780625.6686144],[-1487391531.946444,-649097467.5578747,915379088.7619557],[-699072652.5136976,580871812.2644863,-2054123548.3292303],[-1695726211.436941,-1668680183.31515,-1232482205.5715837],[-271449972.2313881,-1794459494.529479,485735372.1116624],[-1498322592.7383604,81534077.20091677,-171731024.55517578],[-1734544940.168774,-1537308013.2696624,-813783320.1463277],[1877502216.6916676,738706968.7915564,-151212949.78009176],[-497439374.4975374,-1492868504.0418532,-1147644161.922762],[-381496084.0865674,-1273512753.80469,-1342920474.1095667],[1305493931.7932196,1111735360.2880988,-758057797.123117],[-1530258070.5911965,-460829762.73904085,925297487.5536752],[1011907807.5187173,-947508069.6779656,1415089246.4553628]];
        const result = tf.cos(input);
        console.log("the result of ", backend, "is:\n" );
        result.print();
    }
    async function test() {
        await cos("webgl");
        await cos("cpu");
        await cos("wasm");
    }

    test();
</script>
</body>
</html>

Other info / logs Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

shmishra99 commented 1 month ago

Hi @liliquan0118 ,

I ran the code snippet you provided and got the same, expected output on all backends (CPU, WASM, WebGL). I'm running the snippet on macOS 14.7, and my Chrome version is 129.0.66

Output:

image

Thank You!!

liliquan0118 commented 1 month ago

@shmishra99 I get inconsistent results on macOS 13.3 (22E252) with Chrome 128.0.6613.86 (Official Build) (arm64). I also reproduced the inconsistency in Firefox and Safari.

Output: image