illario7 / P106toPlay


170HX (any CMP HX card) can reach much higher FP32 FLOPS than before #2

Open jetcat8848 opened 11 months ago

jetcat8848 commented 11 months ago

[Two attached images]

jetcat8848 commented 11 months ago

I tried modding an OpenCL benchmark (disabling FMA so the GPU can't use it). The code change looks like this:

diff --git a/src/lbm.cpp b/src/lbm.cpp
index d99202f..28aeb25 100644
--- a/src/lbm.cpp
+++ b/src/lbm.cpp
@@ -286,6 +286,8 @@ void LBM_Domain::enqueue_unvoxelize_mesh_on_device(const Mesh* mesh, const uchar
 }

string LBM_Domain::device_defines() const { return

OK, the modded OpenCL benchmark runs, and the 170HX FP32 throughput increased to 6.285 TFLOPS. The original FP32 throughput was only 0.395 TFLOPS. 6.285 / 0.395 ≈ 15.9, roughly 16×, so I think the NVIDIA driver is preventing the GPU from running FMA at full speed!
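For anyone who wants to redo the ratio with their own numbers, here is a minimal C++ sketch of the arithmetic above. The two TFLOPS values are the ones quoted in this comment; the names `speedup` and `print_speedup` are hypothetical helpers made up for illustration and are not part of the benchmark.

```cpp
#include <cassert>
#include <cstdio>

// Measured values quoted in the comment above, in TFLOPS.
constexpr double kOriginalTflops = 0.395; // unmodified benchmark (FMA in use)
constexpr double kModdedTflops   = 6.285; // modded benchmark (FMA disabled)

// Hypothetical helper: how many times faster the modded run is.
constexpr double speedup(double modded_tflops, double original_tflops) {
    return modded_tflops / original_tflops;
}

// Hypothetical helper: print the ratio with one decimal place.
inline void print_speedup() {
    std::printf("speedup: %.1fx\n", speedup(kModdedTflops, kOriginalTflops));
}
```

Calling `print_speedup()` from your own `main` prints the ratio. Swap in the numbers from your own card to compare.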