jaidonlybbert / MixedPrecisionBlockQR

CUDA implementation of mixed-precision block QR decomposition
MIT License

Device WY optimizations, and working mixed-precision BQR #5

Closed jaidonlybbert closed 1 year ago

jaidonlybbert commented 1 year ago

Nasty merge attempt. Reject all changes in the main branch that are out of sync with optimize_wy; the current commit in optimize_wy is the most up to date.