Open iqddd opened 2 months ago
The issue doesn't seem to be related to switching to float32 or to disabling mixed precision. The problem is more likely the algorithm itself or an unoptimized implementation: BOFT training is approximately 1.8 times slower than Diag-OFT and more than twice as slow as regular LoRA. Can we expect any improvements here?
Judging by the training speed, the sample-generation speed, and the final file size compared to Diag-OFT, it looks like BOFT training has fallen back to float32. Would it be possible to support mixed precision?
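One way to check the suspicion above is to compare the dtype of the stored parameters with the dtype the forward pass actually computes in. This is a generic PyTorch sketch (a plain `nn.Linear` standing in for a BOFT/Diag-OFT layer, not the actual implementation): under `torch.autocast`, master weights stay float32 while eligible ops run in the lower-precision dtype; if the output comes back float32 even inside autocast, the layer is not benefiting from mixed precision.

```python
import torch

# Stand-in for a BOFT/Diag-OFT layer; the real layer may behave differently.
layer = torch.nn.Linear(8, 8)
x = torch.randn(2, 8)

# Master weights are stored in float32...
assert all(p.dtype == torch.float32 for p in layer.parameters())

# ...but under autocast, the matmul itself should run in reduced precision.
# (bfloat16 on CPU here; float16 is typical under CUDA autocast.)
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = layer(x)

print(y.dtype)  # a mixed-precision layer yields a bfloat16 output here
```

If a layer internally upcasts its inputs to float32 (e.g. for numerical stability of the orthogonal parametrization), the output dtype check exposes that, which would be consistent with the slowdown and larger files reported above.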