Closed CFAndy closed 9 years ago
The libm-based powf is expensive on the Tegra K1 (when cuBLAS is used), so this replaces it with a NEON-based implementation from math_neon. Processing time now drops to 600 ms on a Tegra at 1.7 GHz, with the GPU at 400 MHz and EMC at DDR3-1600.
Also fix a bug in Buffer.transpose: the free(_data) call is corrected.