jetcat8848 opened 11 months ago
I tried modifying an OpenCL benchmark to disable FMA so that the GPU cannot use it. The change looks like this:
diff --git a/src/lbm.cpp b/src/lbm.cpp
index d99202f..28aeb25 100644
--- a/src/lbm.cpp
+++ b/src/lbm.cpp
@@ -286,6 +286,8 @@ void LBM_Domain::enqueue_unvoxelize_mesh_on_device(const Mesh* mesh, const uchar
 }
 string LBM_Domain::device_defines() const { return
OK, the modded OpenCL benchmark runs, and the 170HX FP32 throughput increased to 6.285 TFLOPS, while the original FP32 result was only 0.395 TFLOPS. 6.285 / 0.395 ≈ 15.9, so roughly 16x. So I think the NVIDIA driver is preventing the GPU from running FMA at full speed!
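For anyone who wants to try something similar, here is a minimal sketch of one way to keep an OpenCL compiler from using FMA. It is not necessarily the exact change in the diff above. The helper names `disable_fma_in_source` and `speedup` are hypothetical and not part of the benchmark. The idea is to prepend two lines to the OpenCL C kernel source before it is compiled: the standard `#pragma OPENCL FP_CONTRACT OFF`, which asks the compiler not to fuse `a * b + c` into one instruction, and a macro that rewrites explicit `fma()` calls as a separate multiply and add. Whether a given driver fully honors the pragma, or accepts a macro that shadows the built-in, is an assumption you would have to check on your own hardware.

```cpp
#include <string>

// Hypothetical helper: returns the OpenCL C kernel source with FMA disabled.
// The prefix is plain OpenCL C text; nothing is compiled or run on a device here.
std::string disable_fma_in_source(const std::string& kernel_source) {
    const std::string prefix =
        // Ask the OpenCL compiler not to contract a*b+c into a fused multiply-add.
        "#pragma OPENCL FP_CONTRACT OFF\n"
        // Rewrite explicit fma() calls as a separate multiply and add.
        "#define fma(a, b, c) ((a) * (b) + (c))\n";
    return prefix + kernel_source;
}

// Hypothetical helper: ratio between the modded and the original benchmark result.
double speedup(const double modded_tflops, const double original_tflops) {
    return modded_tflops / original_tflops;
}
```

The returned string would be passed to the OpenCL program build step in place of the original source. With the numbers from this post, `speedup(6.285, 0.395)` gives a bit over 15.9, so roughly 16x.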