manodeep / Corrfunc

⚡️⚡️⚡️Blazing fast correlation functions on the CPU.
https://corrfunc.readthedocs.io
MIT License

Loop-blocking for large number of particles #115

Closed manodeep closed 6 years ago

manodeep commented 7 years ago

For correlation functions with a large number of particles, as is the case for many DR computations, the default should be a loop-blocking structure plus calls to the appropriate kernel. One issue with a loop-blocking implementation is detecting the case where no further computations are necessary, so that the remaining iterations can be skipped.

for(int i=0;i<N1;i++) {
    for(int j=0;j<N2;j++) {
        int status = countpairs..._kernel(...);
        if(status == COMPUTE_DONE) break; /* skips the remaining j iterations */
    }
}

where COMPUTE_DONE is a new enum in defs.h.
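
To make the early-exit idea concrete, here is a minimal, self-contained sketch in 1-D. It is not Corrfunc code: `block_kernel`, `count_pairs_blocked`, `COMPUTE_CONTINUE` and the `block_size` parameter are hypothetical names, and `COMPUTE_DONE` is defined locally rather than in defs.h. The sketch assumes both position arrays are sorted in ascending order, which is what lets the kernel decide that all later blocks of the second array are out of range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes; the issue only proposes COMPUTE_DONE. */
typedef enum { COMPUTE_CONTINUE = 0, COMPUTE_DONE = 1 } kernel_status;

/* Hypothetical kernel: counts pairs with |x2 - x1| < rmax between one block
 * of x1 and one block of x2. Both blocks are assumed sorted ascending. */
static kernel_status block_kernel(const double *x1, size_t n1,
                                  const double *x2, size_t n2,
                                  double rmax, uint64_t *npairs)
{
    /* The smallest x2 is already too far right of the largest x1, so this
     * x2 block and every later one cannot contribute any pairs. */
    if (x2[0] - x1[n1 - 1] >= rmax) return COMPUTE_DONE;

    for (size_t i = 0; i < n1; i++) {
        for (size_t j = 0; j < n2; j++) {
            const double dx = x2[j] - x1[i];
            if (dx >= rmax) break;        /* sorted: the rest of x2 is farther */
            if (dx > -rmax) (*npairs)++;  /* within rmax on either side */
        }
    }
    return COMPUTE_CONTINUE;
}

/* Loop-blocked driver: walks both arrays in chunks of block_size and stops
 * scanning x2 blocks as soon as the kernel reports COMPUTE_DONE. */
static uint64_t count_pairs_blocked(const double *x1, size_t N1,
                                    const double *x2, size_t N2,
                                    double rmax, size_t block_size)
{
    assert(block_size > 0);
    uint64_t npairs = 0;
    for (size_t i = 0; i < N1; i += block_size) {
        const size_t n1 = (N1 - i < block_size) ? N1 - i : block_size;
        for (size_t j = 0; j < N2; j += block_size) {
            const size_t n2 = (N2 - j < block_size) ? N2 - j : block_size;
            const kernel_status status =
                block_kernel(x1 + i, n1, x2 + j, n2, rmax, &npairs);
            if (status == COMPUTE_DONE) break;  /* skip remaining x2 blocks */
        }
    }
    return npairs;
}
```

Calling `count_pairs_blocked(x1, N1, x2, N2, rmax, block_size)` should give the same count for any positive `block_size`; only the number of kernel calls changes. Note that the `break` leaves the inner loop only, so the early exit applies per block of `x1`, as in the snippet above.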

Should be fixed alongside #114

manodeep commented 6 years ago

While this seems like a good option to implement, none of my attempts has produced any performance improvement for the DD codes, even though the resulting code is much longer. Might be worth investigating in the future, or on a case-by-case basis. Closing for now.