opencv / opencv

Open Source Computer Vision Library
https://opencv.org
Apache License 2.0

core: fix `Core_GEMM.accuracy` failure on recent macOS #25454

Closed by fengyuentau 1 week ago

fengyuentau commented 1 week ago

Resolves https://github.com/opencv/opencv/issues/25302

Reproducer: https://github.com/opencv/ci-gha-workflow/actions/runs/8747714722/job/24006610667?pr=171#step:12:1041

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

fengyuentau commented 1 week ago

Even weirder: if I lower the unroll factor from 4 to 2, the test passes. Could this be compiler-related?

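```cpp
#if CV_ENABLE_UNROLLED
// Unroll factor lowered from 4 to 2 for this experiment:
// the loop step is 2 and the second half of the original body is commented out.
for( ; j <= m - 2; j += 2 )
{
    WT t0 = d_buf[j] + WT(b_data[j])*al;
    WT t1 = d_buf[j+1] + WT(b_data[j+1])*al;
    d_buf[j] = t0;
    d_buf[j+1] = t1;
    // t0 = d_buf[j+2] + WT(b_data[j+2])*al;
    // t1 = d_buf[j+3] + WT(b_data[j+3])*al;
    // d_buf[j+2] = t0;
    // d_buf[j+3] = t1;
}
#endif
```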
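To check whether the unroll factor alone should be able to change the result, the sketch below runs the same `d_buf[j] = d_buf[j] + WT(b_data[j])*al` update three ways: scalar, unrolled by 2, and unrolled by 4 (the commented-out lines restored), and compares the outputs. The function names, `WT = double`, `float` inputs, the scalar tail loop, and the test data are assumptions for illustration, not OpenCV's actual code. With non-overlapping buffers each element is computed independently, so all three variants should agree. The data is chosen so every product and sum is exactly representable, which keeps the exact comparison valid even if the compiler fuses the multiply-add into an FMA.

```cpp
#include <cassert>
#include <vector>

using WT = double;  // assumption: wide accumulation type, as in a double-precision path

// Reference: one element per iteration.
void accum_scalar(WT* d_buf, const float* b_data, int m, WT al) {
    assert(m >= 0);
    for (int j = 0; j < m; j++)
        d_buf[j] = d_buf[j] + WT(b_data[j]) * al;
}

// Unrolled by 2 (the experiment above), with a scalar tail for odd m.
void accum_unroll2(WT* d_buf, const float* b_data, int m, WT al) {
    assert(m >= 0);
    int j = 0;
    for (; j <= m - 2; j += 2) {
        WT t0 = d_buf[j] + WT(b_data[j]) * al;
        WT t1 = d_buf[j+1] + WT(b_data[j+1]) * al;
        d_buf[j] = t0;
        d_buf[j+1] = t1;
    }
    for (; j < m; j++)
        d_buf[j] = d_buf[j] + WT(b_data[j]) * al;
}

// Unrolled by 4 (commented-out lines restored), with a scalar tail.
void accum_unroll4(WT* d_buf, const float* b_data, int m, WT al) {
    assert(m >= 0);
    int j = 0;
    for (; j <= m - 4; j += 4) {
        WT t0 = d_buf[j] + WT(b_data[j]) * al;
        WT t1 = d_buf[j+1] + WT(b_data[j+1]) * al;
        d_buf[j] = t0;
        d_buf[j+1] = t1;
        t0 = d_buf[j+2] + WT(b_data[j+2]) * al;
        t1 = d_buf[j+3] + WT(b_data[j+3]) * al;
        d_buf[j+2] = t0;
        d_buf[j+3] = t1;
    }
    for (; j < m; j++)
        d_buf[j] = d_buf[j] + WT(b_data[j]) * al;
}

// Returns true if all three variants give identical output on disjoint buffers.
bool unroll_variants_agree(int m) {
    assert(m >= 0);
    std::vector<float> b(m);
    std::vector<WT> d0(m), d2(m), d4(m);
    for (int i = 0; i < m; i++) {
        b[i] = float(i % 7 - 3);        // small integers in [-3, 3]
        d0[i] = d2[i] = d4[i] = WT(i);  // integer start values
    }
    const WT al = 0.5;  // b[i] * al and the sums are exact in double
    accum_scalar(d0.data(), b.data(), m, al);
    accum_unroll2(d2.data(), b.data(), m, al);
    accum_unroll4(d4.data(), b.data(), m, al);
    return d0 == d2 && d0 == d4;
}
```

If variants like these disagree on disjoint buffers for a given toolchain, the discrepancy would point at code generation rather than at the arithmetic itself.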
opencv-alalek commented 1 week ago

Are there any aliased pointers to the same memory used for both reading and writing (in-place processing)?
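Why aliasing would matter here: the unrolled loop computes `t0` and `t1` before storing either, so if the read buffer overlaps the write buffer, one read can see a value that the scalar loop would already have updated. The sketch below assumes hypothetical helpers `inplace_scalar` and `inplace_unroll2` and places `b` one element before `d` in the same array, so `b[j]` is `d[j-1]`. It only illustrates the hazard; it is not the OpenCV code path.

```cpp
#include <cassert>
#include <vector>

// Scalar update: every read sees all earlier stores.
void inplace_scalar(double* d, const double* b, int m, double al) {
    assert(m >= 0);
    for (int j = 0; j < m; j++)
        d[j] = d[j] + b[j] * al;
}

// 2-way unrolled update: both reads of a pair happen before either store.
void inplace_unroll2(double* d, const double* b, int m, double al) {
    assert(m >= 0);
    int j = 0;
    for (; j <= m - 2; j += 2) {
        double t0 = d[j] + b[j] * al;
        double t1 = d[j+1] + b[j+1] * al;  // if b == d - 1, b[j+1] is d[j], not yet stored
        d[j] = t0;
        d[j+1] = t1;
    }
    for (; j < m; j++)
        d[j] = d[j] + b[j] * al;
}

// Runs both variants with b == d - 1 on one array and reports whether they differ.
// Scalar yields {1, 2, 3, 4, 5}; the unrolled loop yields {1, 2, 2, 3, 2}.
bool aliasing_changes_result() {
    std::vector<double> s = {1, 1, 1, 1, 1};
    std::vector<double> u = s;
    inplace_scalar(s.data() + 1, s.data(), 4, 1.0);
    inplace_unroll2(u.data() + 1, u.data(), 4, 1.0);
    return s != u;
}
```

With disjoint buffers the two loops agree; with overlap, the result depends on how reads and stores are grouped, which is exactly the kind of dependence on unroll factor being observed.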

fengyuentau commented 1 week ago


Probably not.

https://github.com/opencv/opencv/blob/5da17a4b03b9b9cac92ccadd5fda249523dfdd79/modules/core/src/matmul.simd.hpp#L1246-L1257
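One way to turn "probably not" into a definite answer is a debug-only check that the read and write ranges are disjoint before the loop runs. The sketch below assumes a hypothetical `ranges_overlap` helper (not an OpenCV API). It compares pointers through `std::less`, which guarantees a strict total order over pointers even when they point into unrelated objects, where the built-in `<` gives an unspecified result.

```cpp
#include <cstddef>
#include <functional>

// Returns true if the byte ranges [a, a + na) and [b, b + nb) share at least one byte.
bool ranges_overlap(const void* a, std::size_t na, const void* b, std::size_t nb) {
    if (na == 0 || nb == 0)
        return false;  // an empty range overlaps nothing
    const char* pa = static_cast<const char*>(a);
    const char* pb = static_cast<const char*>(b);
    std::less<const char*> lt;
    bool a_before_b = !lt(pb, pa + na);  // pa + na <= pb
    bool b_before_a = !lt(pa, pb + nb);  // pb + nb <= pa
    return !(a_before_b || b_before_a);
}
```

Assuming `d_buf` and `b_data` are raw pointers at the linked call site, a check such as `CV_DbgAssert(!ranges_overlap(d_buf, m * sizeof(d_buf[0]), b_data, m * sizeof(b_data[0])));` placed before the loop would fail in debug builds if the buffers ever overlapped.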

fengyuentau commented 1 week ago

The test is now green with this patch merged: https://github.com/opencv/ci-gha-workflow/actions/runs/8750864630/job/24015266117?pr=171