waylonflinn / weblas

GPU Powered BLAS for Browsers :gem:
MIT License

sgemm inaccuracies (some elements scaled by 4) in VMware Workstation 12 Pro (Ubuntu 18.04/Windows 10) #47

Open MattRibeiro opened 6 years ago

MattRibeiro commented 6 years ago

I wanted to report what may be an incompatibility specific to VMware. The same code works correctly on physical machines, but produces the same calculation errors on multiple guest operating systems running in the same VMware Workstation Pro environment. I'm not expecting a resolution, but I'd be happy to help debug it if there is anything you think I could try.

The issue occurs consistently when using sgemm: certain elements of the result matrix come back scaled by 4. Here is some code that exhibits the behavior:

var M = 5;
var N = 5;
var K = 5;
var alpha = 1.0;
var beta = 0.0;

var A = Float32Array.from({length: M * K}, (v,i) => i); // [ 0, 1, 2, ..., M*K - 1 ]
var B = Float32Array.from({length: K * N}, (v,i) => i % (K + 1) ? 0 : 1); // identity
var C = null;

var result = weblas.sgemm(M, N, K, alpha, A, B, beta, C);
var asInt = Float32Array.from({length: M * N}, (v,i) => Math.round(result[i]));

Here are the contents of A, B, and result (rounded, as asInt). Each V marks the start of a run of elements that came back scaled by 4:

//            V         V                                             V
// asInt: [0, 4, 2, 3, 16, 20, 24, 28, 8, 9, 10, 11, 12, 13, 14, 15, 64, 68, 72, 76, 80, 84, 88, 92, 96]
//     A: [0, 1, 2, 3,  4,  5,  6,  7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
//     B: [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]
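
For anyone trying to reproduce this, here is a minimal sketch of how the affected elements could be located automatically. It assumes the row-major layout used in the snippet above; `referenceSgemm` and `findMismatches` are hypothetical helper names, not part of weblas, and the `reported` array is simply the asInt output listed above.

```javascript
// CPU baseline for C = alpha * A * B, assuming row-major storage.
// Hypothetical helper, not part of weblas.
function referenceSgemm(M, N, K, alpha, A, B) {
  const C = new Float32Array(M * N);
  for (let i = 0; i < M; i++) {
    for (let j = 0; j < N; j++) {
      let sum = 0;
      for (let k = 0; k < K; k++) {
        sum += A[i * K + k] * B[k * N + j];
      }
      C[i * N + j] = alpha * sum;
    }
  }
  return C;
}

// Hypothetical helper: collect every index where `actual` differs from
// `expected` by more than `tolerance`, along with actual / expected.
function findMismatches(actual, expected, tolerance = 1e-3) {
  const mismatches = [];
  for (let i = 0; i < expected.length; i++) {
    if (Math.abs(actual[i] - expected[i]) > tolerance) {
      mismatches.push({
        index: i,
        expected: expected[i],
        actual: actual[i],
        ratio: actual[i] / expected[i],
      });
    }
  }
  return mismatches;
}

const M = 5, N = 5, K = 5;
const A = Float32Array.from({ length: M * K }, (v, i) => i);
const B = Float32Array.from({ length: K * N }, (v, i) => (i % (K + 1) ? 0 : 1));
const expected = referenceSgemm(M, N, K, 1.0, A, B);

// The asInt values reported from the VMware guest.
const reported = [0, 4, 2, 3, 16, 20, 24, 28, 8, 9, 10, 11, 12,
                  13, 14, 15, 64, 68, 72, 76, 80, 84, 88, 92, 96];

const mismatches = findMismatches(reported, expected);
console.log(mismatches.map(m => m.index).join(", "));
// logs: 1, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23, 24
console.log(mismatches.every(m => m.ratio === 4)); // logs: true
```

On an affected machine, `reported` could be swapped for the live output of `weblas.sgemm` to check whether the same indices are hit on every run.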

I have tried both enabling and disabling GPU acceleration for the VM through VMware's configuration menu, but I get the same results either way. I'm using VMware Workstation Pro version 12.5.9 build-7535481. I see there is a newer release (version 14) and will try upgrading when I get the opportunity. result and asInt hold essentially the same values; asInt is just easier to read than the raw floating-point output.
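
Since toggling acceleration made no difference, it may help to record which renderer the browser actually ends up using inside the VM. The sketch below is one way to do that, assuming the browser exposes the standard `WEBGL_debug_renderer_info` extension (some browsers restrict or omit it, in which case the generic `VENDOR` and `RENDERER` strings are used instead). `describeRenderer` is a hypothetical helper name.

```javascript
// Hypothetical helper: report the vendor and renderer strings for a WebGL context.
// Uses the unmasked strings when WEBGL_debug_renderer_info is available,
// otherwise falls back to the generic gl.VENDOR / gl.RENDERER values.
function describeRenderer(gl) {
  const info = {
    vendor: gl.getParameter(gl.VENDOR),
    renderer: gl.getParameter(gl.RENDERER),
  };
  const ext = gl.getExtension("WEBGL_debug_renderer_info");
  if (ext) {
    info.vendor = gl.getParameter(ext.UNMASKED_VENDOR_WEBGL);
    info.renderer = gl.getParameter(ext.UNMASKED_RENDERER_WEBGL);
  }
  return info;
}

// Only runs in a browser; skipped where there is no DOM.
if (typeof document !== "undefined") {
  const canvas = document.createElement("canvas");
  const gl = canvas.getContext("webgl") || canvas.getContext("experimental-webgl");
  if (gl) {
    console.log(describeRenderer(gl));
  } else {
    console.log("WebGL is not available in this browser");
  }
}
```

Comparing the logged strings between the VM and a physical machine, and between the accelerated and non-accelerated VM settings, would show whether the setting changes the rendering path at all.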

waylonflinn commented 6 years ago

Very interesting!

Thanks for reporting this issue. I have a hunch about the source, but may need some time to come up with a proper test case. :thinking:

Do keep me updated on further results.