huppertt / nirs-toolbox

Toolbox for fNIRS analysis
Add single precision support for CPU (1.5x speed increase), GPU (16x speed increase), Fix improper missing transpose in GPU ar_irls & glm + lower memory requirements for GPU #22

AdrianCurtin closed this 1 year ago

AdrianCurtin commented 1 year ago

Also fix two variable renaming behaviors in nirs_viewer

AdrianCurtin commented 1 year ago

@huppertt Please merge and test this PR when you can. A transpose was missing in the GPU version, and this PR corrects that error. Additionally, single precision support for the DFE calculations shows substantial performance improvements. My testing shows only slight variations in the 5th significant digit between CPU/GPU and single/double precision calculations, but single precision on GPU is ~45x faster than double precision on CPU.

On a test dataset:

- CPU (M1 Max), double precision: ~97.7 s per channel (dfe result 2.1183e4)
- CPU (M1 Max), single precision: ~54.3 s per channel (dfe result 2.1182e4)
- GPU (RTX3080), double precision: ~31.2 s per channel (dfe result 2.1183e4)
- GPU (RTX3080), single precision: ~1.8 s per channel (dfe result 2.1184e4)
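
For anyone who wants to check the ratios, here is a minimal Python sketch that recomputes the speedups and the relative dfe differences from the timings listed above. It is only arithmetic on the posted numbers, not code from the toolbox. The dictionary keys and the helper names `speedup` and `relative_dfe_difference` are labels made up for this sketch.

```python
# Timings (seconds per channel) and dfe results copied from the list above.
# The keys are hypothetical labels chosen for this sketch only.
results = {
    "cpu_double": {"seconds": 97.7, "dfe": 2.1183e4},
    "cpu_single": {"seconds": 54.3, "dfe": 2.1182e4},
    "gpu_double": {"seconds": 31.2, "dfe": 2.1183e4},
    "gpu_single": {"seconds": 1.8, "dfe": 2.1184e4},
}


def speedup(baseline, candidate):
    """Return how many times faster `candidate` ran than `baseline`."""
    return results[baseline]["seconds"] / results[candidate]["seconds"]


def relative_dfe_difference(reference, candidate):
    """Return |dfe(candidate) - dfe(reference)| / dfe(reference)."""
    ref = results[reference]["dfe"]
    return abs(results[candidate]["dfe"] - ref) / ref


# Pairs to compare: (baseline, candidate).
comparisons = [
    ("cpu_double", "cpu_single"),
    ("gpu_double", "gpu_single"),
    ("cpu_double", "gpu_single"),
]

for baseline, candidate in comparisons:
    print(
        f"{candidate} vs {baseline}: "
        f"{speedup(baseline, candidate):.1f}x faster, "
        f"relative dfe difference {relative_dfe_difference(baseline, candidate):.1e}"
    )
```

With the numbers as posted, this gives roughly 1.8x for single versus double precision on the CPU, 17.3x for single versus double precision on the GPU, and 54.3x for GPU single precision versus CPU double precision. Every dfe result stays within 0.01% of the CPU double precision value.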