manodeep / Corrfunc

⚡️⚡️⚡️Blazing fast correlation functions on the CPU.
https://corrfunc.readthedocs.io
MIT License
163 stars 50 forks source link

Only report runtime isa support if we also have compiler support #200

Closed lgarrison closed 4 years ago

lgarrison commented 4 years ago

If the CPU supports AVX-512 but Corrfunc was only compiled with AVX support (for example), we end up using the fallback kernel by default. The correct behavior is to use the highest instruction set that has both compile-time and runtime support.

In this PR, I implemented that behavior by modifying instrset_detect(). That fixes the problem globally. But if we really want instrset_detect() to remain a pure runtime detector, we could fix the dispatch functions in each of the kernels, or wrap instrset_detect() in another function.

Should we warn the user about having runtime support but not compile-time support? That's a strange situation that's usually caused by using an old compiler. But with the binutils bug, we have a legitimate situation in which we do not compile against the maximum runtime instruction set.

manodeep commented 4 years ago

@lgarrison We should leave instr_detect() as purely a runtime feature detector and perhaps add a separate function that finds the highest isa with both runtime and compile-time support. That should solve the problem of offloading to the fallback kernel instead of the next highest isa.

We should also warn the user if the compile time support is lower than runtime support if we are not working around the GAS bug.

lgarrison commented 4 years ago

Okay, how about I rename instr_detect() to instr_detect_runtime() and make a new instr_detect() that checks both runtime and compile-time support?

manodeep commented 4 years ago

@lgarrison That sounds good. Would it make sense to rename this new instr_detect to get_max_usable_isa()?

manodeep commented 4 years ago

@lgarrison Once this is merged in, are you fine with releasing v2.3.2?

lgarrison commented 4 years ago

Yes! Will push the changes in a few minutes.

On Fri, Oct 11, 2019 at 12:47 PM Manodeep Sinha notifications@github.com wrote:

@lgarrison https://github.com/lgarrison Once this is merged in, are you fine with releasing v2.3.2?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/manodeep/Corrfunc/pull/200?email_source=notifications&email_token=ABLA7S234SAYSWSVHWCQO7LQOCUZ7A5CNFSM4I6GED52YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBASGVI#issuecomment-541139797, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABLA7S7J55HAKXWO7RJHZX3QOCUZ7ANCNFSM4I6GED5Q .

-- Lehman Garrison lgarrison@flatironinstitute.org Flatiron Research Fellow, Cosmology X Data Science Group Center for Computational Astrophysics, Flatiron Institute lgarrison.github.io

manodeep commented 4 years ago

@lgarrison Do you want to add in fallback_offset = highest_isa; as well -- that way the code will default to the highest supported ISA rather than the fallback kernel? That would also mean replacing the last default option with num_functions - 1 (rather than fallback_option).

lgarrison commented 4 years ago

Yes, that's a good idea. But the fallback_offset index isn't just the isa number, unfortunately. We need to know the index in the function array. I'll have to think about whether there's a clean way to do that.

manodeep commented 4 years ago

The highest compiled kernel will always be at index 0. However, since the compiled kernels might not correspond to the runtime available isa, the required index will be different. Would something like this work:

int highest_available_isa = fallback_offset;
int num_available_isa = 0;
#ifdef __SSE4_2__
if (iset >= 6) {
    highest_available_isa = sse_offset;
    num_available_isa++;
}
#endif
#ifdef __AVX__
if (iset >=7) {
    highest_available_isa = avx_offset;
    num_available_isa++;
}
#endif 
#ifdef __AVX2__
if (iset >=8) {
    highest_available_isa = avx2_offset;
    num_available_isa++;
}
#endif 
#ifdef __AVX512F__
if (iset >=9) {
    highest_available_isa = avx512_offset;
    num_available_isa++;
}
#endif 

if(num_available_isa != num_functions) {
    fprintf(stderr,"Warning: Number of kernels with both runtime and compile time support = %d while the number of compiled kernels =%d\n", num_available_isa, num_functions);
}

Completely untested, uncompiled code but you get the idea.

lgarrison commented 4 years ago

Just pushed another version, does it look okay? I will port it to the other CFs if so. I tested it, and it seems to work.

manodeep commented 4 years ago

@lgarrison Pinging. We should be able to release v2.3.2 with this and the few small changes in the docs.

lgarrison commented 4 years ago

Agreed! The porting to the other CFs will take an hour or so, I'll try to finish it on Thursday or Friday.

lgarrison commented 4 years ago

@manodeep All done! Can you look over the diff one more time? If Travis passes, I think we're ready to merge & release.

manodeep commented 4 years ago

@lgarrison I left a couple of minor comments.

Also, will you please confirm that get_max_usable_isa returns the fastest available function (or prints appropriate warning) for the following 4 combinations between (compiler_unsupported_isa, compiler_supported_isa) and (runtime_unsupported_isa, runtime_supported_isa)?

lgarrison commented 4 years ago
// cpu_features_test.c
#include <stdio.h>
#include "utils/cpu_features.c"

int main(void){
    printf("get_max_usable_isa(): %d\n", get_max_usable_isa());
    return 0;
}

Compiler supported, runtime supported:

$ gcc -mavx2 -std=c99 cpu_features_test.c -o cpu_features_test -I$HOME/corrfunc
$ ./cpu_features_test 
get_max_usable_isa(): 8

Compiler supported, runtime unsupported:

$ gcc -mavx512f -std=c99 cpu_features_test.c -o cpu_features_test -I$HOME/corrfunc
$ ./cpu_features_test 
get_max_usable_isa(): 8

This is right because -mavx512f implies -mavx2 and lower.

Compiler unsupported, runtime supported:

$ gcc -mavx -std=c99 cpu_features_test.c -o cpu_features_test -I$HOME/corrfunc
$ ./cpu_features_test 
[Warning] The CPU supports AVX2 but the compiler does not.  Can you try another compiler?
get_max_usable_isa(): 7

Look okay?

manodeep commented 4 years ago

@lgarrison This is on my list for today

manodeep commented 4 years ago

@lgarrison Everything looks good. I have one suggestion - we should move the instruction set enums into cpu_features.h and include cpu_features.h within defs.h. That way cpu_features.h does not need to include defs.h.

We should probably also rename defs.h to corrfunc.h :) (not as part of this PR, perhaps for v3.0)

lgarrison commented 4 years ago

Done!

And I agree with the renaming suggestion for v3.0.

manodeep commented 4 years ago

@lgarrison Should I go ahead and "squash-merge"? Was there anything else to update on this PR?

lgarrison commented 4 years ago

Yes, I think it is ready to squash & merge!