Closed AdrianCurtin closed 1 year ago
@huppertt Please merge and test this PR when you can. The transpose was missing in the GPU version, and this PR corrects that error. Additionally, single-precision support for the DFE calculations shows substantial performance improvements. In my testing, results differ only slightly (in the 5th significant digit) between CPU/GPU and single/double precision, while single precision on the GPU is ~45x faster than double precision on the CPU.
On a test dataset:

- CPU (M1 Max), double precision: ~97.7 s per channel (DFE result 2.1183e4)
- CPU (M1 Max), single precision: ~54.3 s per channel (DFE result 2.1182e4)
- GPU (RTX3080), double precision: ~31.2 s per channel (DFE result 2.1183e4)
- GPU (RTX3080), single precision: ~1.8 s per channel (DFE result 2.1184e4)
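
For anyone who wants to check the numbers, here is a small sketch that derives the speedups and the relative DFE differences from the rounded figures listed above. It is plain Python used purely for the arithmetic and is not part of the toolbox. The names `runs` and `summarize` are made up for illustration, and CPU double precision is assumed as the baseline. With these rounded timings, GPU single precision works out to roughly 54x over CPU double precision, so the ~45x quoted in the description presumably comes from separate or unrounded measurements.

```python
# Timings (seconds per channel) and DFE results copied from the benchmark list.
# "runs" and "summarize" are hypothetical names used only for this sketch.
runs = {
    "CPU double": (97.7, 2.1183e4),
    "CPU single": (54.3, 2.1182e4),
    "GPU double": (31.2, 2.1183e4),
    "GPU single": (1.8, 2.1184e4),
}

# Assumption: CPU double precision is the reference for both speed and accuracy.
base_time, base_dfe = runs["CPU double"]


def summarize(runs, base_time, base_dfe):
    """Return {name: (speedup vs. baseline, relative DFE difference)}."""
    rows = {}
    for name, (seconds, dfe) in runs.items():
        speedup = base_time / seconds
        rel_diff = abs(dfe - base_dfe) / base_dfe
        rows[name] = (speedup, rel_diff)
    return rows


summary = summarize(runs, base_time, base_dfe)
for name, (speedup, rel_diff) in summary.items():
    print(f"{name}: {speedup:5.1f}x, relative DFE difference {rel_diff:.1e}")
```

The relative differences all stay below 1e-4, which matches the observation that the results only move in the 5th significant digit.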
Also fixes two variable-renaming behaviors in nirs_viewer.