m4rs-mt / ILGPU

ILGPU JIT Compiler for high-performance .Net GPU programs
http://www.ilgpu.net

Optimized PTX IntrinsicMath implementation to use LibDevice. #1189

Closed: MoFtZ closed this 5 months ago

MoFtZ commented 5 months ago

Extracted from #1148.

Depends on #1187 and #1185.

PTXMath in ILGPU.Algorithms provides a number of math functions using Cordic implementations. However, these are considerably slower than their LibDevice counterparts.
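For readers unfamiliar with Cordic, the sketch below is a generic textbook rotation-mode CORDIC for sine and cosine. It is not the code in PTXMath; the iteration count and the names ITERATIONS and cordic_sin_cos are assumptions chosen for illustration. It shows the likely source of the cost: every call runs a fixed number of dependent shift-and-add iterations, each contributing roughly one bit of precision.

```python
import math

# Assumed iteration count; each iteration adds roughly one bit of precision.
ITERATIONS = 32

# Precomputed rotation angles atan(2^-i).
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]

# Aggregate gain correction: product of 1 / sqrt(1 + 2^-2i) over all iterations.
K = 1.0
for i in range(ITERATIONS):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic_sin_cos(theta):
    """Return (sin(theta), cos(theta)) for theta in [-pi/2, pi/2] via rotation-mode CORDIC."""
    if not -math.pi / 2 <= theta <= math.pi / 2:
        raise ValueError("theta must lie in [-pi/2, pi/2]")
    # Start from the pre-scaled unit vector (K, 0) and the full residual angle.
    x, y, z = K, 0.0, theta
    for i in range(ITERATIONS):
        # Rotate toward the remaining angle by +/- atan(2^-i).
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    # After all rotations, y approximates sin(theta) and x approximates cos(theta).
    return y, x


s, c = cordic_sin_cos(0.5)
```

On a GPU the same loop must run per thread, which is one plausible reason why a single LibDevice call tends to be faster.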

Currently, LibDevice methods must be called explicitly.

This PR automatically tries to initialize LibDevice when the Context is being configured. If LibDevice is found, all Math intrinsics are redirected to it. The Cordic implementation in ILGPU.Algorithms activates only when LibDevice is not available.
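The selection logic described above can be sketched as a configure-time choice between two intrinsic tables. This is a language-neutral illustration, not ILGPU's API: try_load_libdevice, cordic_fallback, configure_math and MathConfig are hypothetical names, and both tables use Python's math module as stand-ins for the real LibDevice and Cordic functions.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Optional

MathTable = Dict[str, Callable[[float], float]]


@dataclass
class MathConfig:
    backend: str          # which implementation the intrinsics were bound to
    intrinsics: MathTable


def try_load_libdevice(available: bool) -> Optional[MathTable]:
    """Hypothetical loader: returns LibDevice-backed functions, or None if LibDevice is not found."""
    if not available:
        return None
    # Stand-ins for LibDevice functions such as __nv_sin / __nv_cos.
    return {"sin": math.sin, "cos": math.cos}


def cordic_fallback() -> MathTable:
    """Hypothetical stand-in for the Cordic implementations in ILGPU.Algorithms."""
    return {"sin": lambda x: math.sin(x), "cos": lambda x: math.cos(x)}


def configure_math(libdevice_available: bool) -> MathConfig:
    """Decide once, while the context is configured, where math intrinsics are routed."""
    libdevice = try_load_libdevice(libdevice_available)
    if libdevice is not None:
        # LibDevice found: every math intrinsic is redirected to it.
        return MathConfig("libdevice", libdevice)
    # LibDevice missing: the Cordic fallback activates instead.
    return MathConfig("cordic", cordic_fallback())


cfg = configure_math(libdevice_available=False)
```

Making the decision once at configuration time, rather than per call, keeps kernel code unchanged: callers use the same Math intrinsics whichever backend was bound.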