waylonflinn / weblas

GPU Powered BLAS for Browsers :gem:
MIT License

Use GLSL `dot` in Row by Column Multiply #13

Closed: waylonflinn closed this 8 years ago

waylonflinn commented 8 years ago

This PR merges the performance improvements for low-end graphics cards.

These improvements center on using the GLSL `dot` function in the fragment shader. A single `dot` call on two `vec4` operands multiplies and sums four pairs of elements at once. This is similar to loop unrolling, though more effective, because `dot` can take advantage of hardware optimizations such as fused multiply-add (FMA).
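To make the arithmetic concrete, here is a CPU-side Python sketch, not the shader from this PR. `dot4` is a hypothetical helper that emulates what GLSL `dot(vec4, vec4)` computes, and `inner_product_vec4` walks a row and a column four elements at a time, the way a shader loop over RGBA texels would. It shows only the grouping of the work (2 iterations instead of 8 here); hardware effects such as FMA cannot be modeled in Python.

```python
def dot4(a, b):
    """Emulate GLSL dot(vec4, vec4): the sum of four element-wise products."""
    assert len(a) == 4 and len(b) == 4
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]


def inner_product_scalar(row, col):
    """Baseline: one multiply-add per loop iteration."""
    total = 0.0
    for x, y in zip(row, col):
        total += x * y
    return total


def inner_product_vec4(row, col):
    """Consume four elements per iteration via dot4.

    Assumes both inputs have the same length, a multiple of 4."""
    assert len(row) == len(col) and len(row) % 4 == 0
    total = 0.0
    for k in range(0, len(row), 4):
        total += dot4(row[k:k + 4], col[k:k + 4])
    return total


row = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
col = [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
print(inner_product_scalar(row, col), inner_product_vec4(row, col))  # 120.0 120.0
```

Both functions return the same value; the vectorized version simply issues a quarter as many loop iterations.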

To use this method, special data packing was employed. First, the transpose of B (the right-hand matrix in the multiply) is loaded into its input texture. The elements of a row-by-column dot product must be packed contiguously into RGBA texels, and the access methods use row-major ordering. A row of A is therefore already contiguous, but a column of B is not; storing the transpose of B makes each column of B a contiguous row.

Second, each input matrix now gets its own texture. In addition to facilitating better computational methods (such as the `dot`-based multiply), this uses texture space more efficiently than the previous method. This should yield performance gains (from reduced memory transfers), even on high-performance hardware.
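The packing scheme can be sketched on the CPU as follows. This is a minimal Python illustration under stated assumptions, not weblas's code: "textures" are modeled as nested lists of 4-tuples (one tuple per RGBA texel), the names `transpose`, `pack_rgba`, `dot4` and `matmul_packed` are hypothetical, and zero-padding of rows whose length is not a multiple of 4 is an assumption (the PR does not say how partial texels are handled).

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def pack_rgba(m):
    """Pack each row of a row-major matrix into RGBA texels (groups of 4).

    The last texel of a row is zero-padded when the row length is not a
    multiple of 4 (an assumption for this sketch)."""
    packed = []
    for row in m:
        padded = list(row) + [0.0] * (-len(row) % 4)
        packed.append([tuple(padded[k:k + 4]) for k in range(0, len(padded), 4)])
    return packed


def dot4(a, b):
    """Emulate GLSL dot(vec4, vec4)."""
    return sum(x * y for x, y in zip(a, b))


def matmul_packed(a, b):
    """C = A * B using one packed texture for A and a separate one for B^T.

    Row j of B^T is column j of B, so row i of A and row j of B^T line up
    texel by texel, and C[i][j] is a sum of dot4 calls over matching texels."""
    a_tex = pack_rgba(a)
    bt_tex = pack_rgba(transpose(b))
    return [[sum(dot4(ta, tb) for ta, tb in zip(a_row, bt_row))
             for bt_row in bt_tex]
            for a_row in a_tex]


A = [[1.0, 2.0, 3.0, 4.0, 5.0],
     [6.0, 7.0, 8.0, 9.0, 10.0]]
B = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 0.0],
     [0.0, 1.0],
     [1.0, 0.0]]
print(matmul_packed(A, B))  # [[9.0, 6.0], [24.0, 16.0]]
```

Without the transpose, the four values a texel holds from B would come from one row of B, i.e. four different columns, so they could not be fed to a single `dot` against a texel of A.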

waylonflinn commented 8 years ago

Tests have passed on all test hardware configurations. Benchmarks also improved in every configuration.