Closed bekorn closed 1 week ago
Perhaps the CL_DEVICE_VENDOR string can be added as a safety net? Unless that is reported for all models.. Sadly I don't have a lot of devices to test on, so I'm grateful for your PR!
Before fixing this, ISAMD was not defined on traverse.cl > SIMD_AABBTEST was not defined > AABB test used _native_fma > OCL compiler gave the error: "use of undeclared identifier '_native_fma'" > failed speedtest. _native_fma not being recognised seemed like something that shouldn't happen however, I have no experience with OCL unfortunately, just reporting
So with your patch properly compiles on AMD? Question: could you try out which combination of 'SIMD_AABBTEST' and 'USE_VLOAD_VSTORE' is best for AMD? On Intel, performance is best when I don't use the vload/vstore commands but do use the 'SIMD' aabb test code; on NVIDIA it's the other way round. For AMD I am just guessing.
If I don't use SIMD_AABBTEST, it doesn't compile because error: "use of undeclared identifier '_native_fma'"
. Full log:
C:\Users\Berk\AppData\Local\Temp\comgr-56503e\input\CompileSource:385:22: error: use of undeclared identifier '_native_fma'
385 | float tminx0 = _native_fma( lox4.x, idirx, origx ), tminx1 = _native_fma( lox4.y, idirx, origx );
| ^
C:\Users\Berk\AppData\Local\Temp\comgr-56503e\input\CompileSource:385:68: error: use of undeclared identifier '_native_fma'
385 | float tminx0 = _native_fma( lox4.x, idirx, origx ), tminx1 = _native_fma( lox4.y, idirx, origx );
| ^
C:\Users\Berk\AppData\Local\Temp\comgr-56503e\input\CompileSource:386:22: error: use of undeclared identifier '_native_fma'
386 | float tminy0 = _native_fma( loy4.x, idiry, origy ), tminy1 = _native_fma( loy4.y, idiry, origy );
| ^
C:\Users\Berk\AppData\Local\Temp\comgr-56503e\input\CompileSource:386:68: error: use of undeclared identifier '_native_fma'
386 | float tminy0 = _native_fma( loy4.x, idiry, origy ), tminy1 = _native_fma( loy4.y, idiry, origy );
| ^
C:\Users\Berk\AppData\Local\Temp\comgr-56503e\input\CompileSource:387:22: error: use of undeclared identifier '_native_fma'
387 | float tminz0 = _native_fma( loz4.x, idirz, origz ), tminz1 = _native_fma( loz4.y, idirz, origz );
| ^
C:\Users\Berk\AppData\Local\Temp\comgr-56503e\input\CompileSource:387:68: error: use of undeclared identifier '_native_fma'
387 | float tminz0 = _nativ
With USE_VLOAD_VSTORE:
tiny_bvh version 0.9.5 performance statistics (MSVC 1941 build)
running on AMD Ryzen 5 7600 6-Core Processor
----------------------------------------------------------------
available OpenCL platforms:
#0: AMD Accelerated Parallel Processing
OpenCL device: AMD Accelerated Parallel Processing
Device # 0, (name: gfx1031) (platform: OpenCL 2.0 AMD-APP (3617.0)) (vendor: Advanced Micro Devices, Inc.)
Local memory size: 64KB
hardware detected: AMD.
----------------------------------------------------------------
Loading triangle data (262267 tris).
BVH construction speed
warming caches...
BVH traversal speed
- GPU, coherent, alt 2-way layout, ocl: 6.7ms for 7.68M rays => 1139.37MRay/s
- GPU, coherent, alt 4-way layout, ocl: 6.2ms for 7.68M rays => 1231.07MRay/s
- GPU, coherent, CWBVH layout, ocl: 7.3ms for 7.68M rays => 1047.86MRay/s
all done.
Without USE_VLOAD_VSTORE:
- GPU, coherent, alt 2-way layout, ocl: 6.8ms for 7.68M rays => 1133.41MRay/s
- GPU, coherent, alt 4-way layout, ocl: 6.2ms for 7.68M rays => 1231.23MRay/s
- GPU, coherent, CWBVH layout, ocl: 7.8ms for 7.68M rays => 979.40MRay/s
With your new commits:
#define USE_VLOAD_VSTORE
//#define SIMD_AABBTEST
- GPU, coherent, CWBVH layout, ocl: 7.3ms for 7.68M rays => 1047.28MRay/s
//#define USE_VLOAD_VSTORE
//#define SIMD_AABBTEST
- GPU, coherent, CWBVH layout, ocl: 7.9ms for 7.68M rays => 976.71MRay/s
Thank you for testing that. So the optimal setting is vload on, simd off for AMD, yielding about 1Gray/s.
On Wed, Nov 20, 2024 at 6:01 PM bekorn @.***> wrote:
With your new commits:
define USE_VLOAD_VSTORE
//#define SIMD_AABBTEST
- GPU, coherent, CWBVH layout, ocl: 7.3ms for 7.68M rays => 1047.28MRay/s
//#define USE_VLOAD_VSTORE //#define SIMD_AABBTEST
- GPU, coherent, CWBVH layout, ocl: 7.9ms for 7.68M rays => 976.71MRay/s
— Reply to this email directly, view it on GitHub https://github.com/jbikker/tinybvh/pull/25#issuecomment-2489120743, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFH3PEHM5XZYW73PIGSGCU32BS55BAVCNFSM6AAAAABSEWRMVWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIOBZGEZDANZUGM . You are receiving this because you modified the open/close state.Message ID: @.***>
I thnik there is a typo here
inline float _native_fma( const float a, const float b, const float c )
{
#ifdef _ISNVIDIA <= should be ISNVIDIA
Yes, seems like it. Radeon 6700 XT is equivalent to 4060 Ti, fyi
Thank you for testing that. So the optimal setting is vload on, simd off for AMD, yielding about 1Gray/s. … On Wed, Nov 20, 2024 at 6:01 PM bekorn @.> wrote: With your new commits: #define USE_VLOAD_VSTORE //#define SIMD_AABBTEST - GPU, coherent, CWBVH layout, ocl: 7.3ms for 7.68M rays => 1047.28MRay/s //#define USE_VLOAD_VSTORE //#define SIMD_AABBTEST - GPU, coherent, CWBVH layout, ocl: 7.9ms for 7.68M rays => 976.71MRay/s — Reply to this email directly, view it on GitHub <#25 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFH3PEHM5XZYW73PIGSGCU32BS55BAVCNFSM6AAAAABSEWRMVWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIOBZGEZDANZUGM . You are receiving this because you modified the open/close state.Message ID: @.>
Indeed, thanks.
On Wed, Nov 20, 2024 at 6:04 PM bekorn @.***> wrote:
I thnik there is a typo here
inline float _native_fma( const float a, const float b, const float c ) {
ifdef _ISNVIDIA <= should be ISNVIDIA
— Reply to this email directly, view it on GitHub https://github.com/jbikker/tinybvh/pull/25#issuecomment-2489128774, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFH3PEBNSXMK7C6JL6CGSPT2BS6K5AVCNFSM6AAAAABSEWRMVWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIOBZGEZDQNZXGQ . You are receiving this because you modified the open/close state.Message ID: @.***>
AMD Radeon 6700 XT returns "gfx1031" for CL_DEVICE_NAME.
I also tried CL_DEVICE_VENDOR and it returned "Advanced Micro Devices, Inc.", maybe this could be easier to check.