Closed lgarrison closed 4 years ago
@lgarrison We should leave instr_detect()
as purely a runtime feature detector and perhaps add a separate function that finds the highest isa with both runtime and compile-time support. That should solve the problem of offloading to the fallback
kernel instead of the next highest isa.
We should also warn the user if the compile time support is lower than runtime support if we are not working around the GAS bug.
Okay, how about I rename instr_detect()
to instr_detect_runtime()
and make a new instr_detect()
that checks both runtime and compile-time support?
@lgarrison That sounds good. Would it make sense to rename this new instr_detect
to get_max_usable_isa()
?
@lgarrison Once this is merged in, are you fine with releasing v2.3.2?
Yes! Will push the changes in a few minutes.
On Fri, Oct 11, 2019 at 12:47 PM Manodeep Sinha notifications@github.com wrote:
@lgarrison https://github.com/lgarrison Once this is merged in, are you fine with releasing v2.3.2?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/manodeep/Corrfunc/pull/200?email_source=notifications&email_token=ABLA7S234SAYSWSVHWCQO7LQOCUZ7A5CNFSM4I6GED52YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBASGVI#issuecomment-541139797, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABLA7S7J55HAKXWO7RJHZX3QOCUZ7ANCNFSM4I6GED5Q .
-- Lehman Garrison lgarrison@flatironinstitute.org Flatiron Research Fellow, Cosmology X Data Science Group Center for Computational Astrophysics, Flatiron Institute lgarrison.github.io
@lgarrison Do you want to add in fallback_offset = highest_isa;
as well -- that way the code will default to the highest supported ISA rather than the fallback kernel? That would also mean replacing the last default
option with num_functions - 1
(rather than fallback_option
).
Yes, that's a good idea. But the fallback_offset
index isn't just the isa number, unfortunately. We need to know the index in the function array. I'll have to think about whether there's a clean way to do that.
The highest compiled kernel will always be at index 0. However, since the compiled kernels might not correspond to the runtime available isa, the required index will be different. Would something like this work:
int highest_available_isa = fallback_offset;
int num_available_isa = 0;
#ifdef __SSE4_2__
if (iset >= 6) {
highest_available_isa = sse_offset;
num_available_isa++;
}
#endif
#ifdef __AVX__
if (iset >=7) {
highest_available_isa = avx_offset;
num_available_isa++;
}
#endif
#ifdef __AVX2__
if (iset >=8) {
highest_available_isa = avx2_offset;
num_available_isa++;
}
#endif
#ifdef __AVX512F__
if (iset >=9) {
highest_available_isa = avx512_offset;
num_available_isa++;
}
#endif
if(num_available_isa != num_functions) {
fprintf(stderr,"Warning: Number of kernels with both runtime and compile time support = %d while the number of compiled kernels =%d\n", num_available_isa, num_functions);
}
Completely untested, uncompiled code but you get the idea.
Just pushed another version, does it look okay? I will port it to the other CFs if so. I tested it, and it seems to work.
@lgarrison Pinging. We should be able to release v2.3.2 with this and the few small changes in the docs.
Agreed! The porting to the other CFs will take an hour or so, I'll try to finish it on Thursday or Friday.
@manodeep All done! Can you look over the diff one more time? If Travis passes, I think we're ready to merge & release.
@lgarrison I left a couple of minor comments.
Also, will you please confirm that get_max_usable_isa
returns the fastest available function (or prints appropriate warning) for the following 4 combinations between (compiler_unsupported_isa, compiler_supported_isa)
and (runtime_unsupported_isa, runtime_supported_isa)
?
// cpu_features_test.c
#include <stdio.h>
#include "utils/cpu_features.c"
int main(void){
printf("get_max_usable_isa(): %d\n", get_max_usable_isa());
return 0;
}
Compiler supported, runtime supported:
$ gcc -mavx2 -std=c99 cpu_features_test.c -o cpu_features_test -I$HOME/corrfunc
$ ./cpu_features_test
get_max_usable_isa(): 8
Compiler supported, runtime unsupported:
$ gcc -mavx512f -std=c99 cpu_features_test.c -o cpu_features_test -I$HOME/corrfunc
$ ./cpu_features_test
get_max_usable_isa(): 8
This is right because -mavx512f
implies -mavx2
and lower.
Compiler unsupported, runtime supported:
$ gcc -mavx -std=c99 cpu_features_test.c -o cpu_features_test -I$HOME/corrfunc
$ ./cpu_features_test
[Warning] The CPU supports AVX2 but the compiler does not. Can you try another compiler?
get_max_usable_isa(): 7
Look okay?
@lgarrison This is on my list for today
@lgarrison Everything looks good. I have one suggestion - we should move the instruction set enums into cpu_features.h
and include cpu_features.h
within defs.h
. That way cpu_features.h
does not need to include defs.h
.
We should probably also rename defs.h
to corrfunc.h
:) (not as part of this PR, perhaps for v3.0)
Done!
And I agree with the renaming suggestion for v3.0.
@lgarrison Should I go ahead and "squash-merge"? Was there anything else to update on this PR?
Yes, I think it is ready to squash & merge!
If the CPU supports AVX-512 but Corrfunc was only compiled with AVX support (for example), we end up using the fallback kernel by default. The correct behavior is to use the highest instruction set that has both compile-time and runtime support.
In this PR, I implemented that behavior by modifying
instrset_detect()
. That fixes the problem globally. But if we really wantinstrset_detect()
to remain a pure runtime detector, we could fix the dispatch functions in each of the kernels, or wrapinstrset_detect()
in another function.Should we warn the user about having runtime support but not compile-time support? That's a strange situation that's usually caused by using an old compiler. But with the binutils bug, we have a legitimate situation in which we do not compile against the maximum runtime instruction set.