Closed huitseeker closed 1 year ago
The current behavior hardcodes compilation for the Turing architecture, which is suboptimal when the compilation target is anything else.
The compiler leaves PTX IR code in the executable, and if you execute it on a later architecture, the PTX gets JIT-compiled and you'll get adequate performance. This is because the PTX code is the same for all architectures. Well, this (the code being the same) will change (or is being changed) in the upcoming sppark release, but there is room for only one PTX :-( What we're doing is generating binary code for 7.x and 8.x, and leaving PTX for 8.x, so that when the application gets executed on 9.x, the newer PTX will be JIT-ed. The corresponding flags will be added to pasta-msm.
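The flag layout described above (binary code for 7.x and 8.x, PTX only for 8.x) can be sketched with nvcc's `-gencode` option. This is a minimal sketch, not the actual sppark or pasta-msm build script: the helper name `gencode_flags` is hypothetical, and the concrete architectures `70` and `80` are assumptions standing in for "7.x" and "8.x".

```rust
/// Hypothetical helper: build nvcc `-gencode` flags that emit native binary
/// code for each entry in `binary_archs`, plus PTX for `ptx_arch` so that
/// newer GPUs can JIT-compile it at load time.
fn gencode_flags(binary_archs: &[u32], ptx_arch: u32) -> Vec<String> {
    let mut flags = Vec::new();
    for arch in binary_archs {
        // code=sm_XX requests machine code for that architecture.
        flags.push("-gencode".to_string());
        flags.push(format!("arch=compute_{arch},code=sm_{arch}"));
    }
    // code=compute_XX embeds PTX instead of machine code.
    flags.push("-gencode".to_string());
    flags.push(format!("arch=compute_{ptx_arch},code=compute_{ptx_arch}"));
    flags
}
```

Calling `gencode_flags(&[70, 80], 80)` yields three `-gencode` pairs: two for binary code and one for the 8.x PTX that a 9.x device would JIT-compile.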
I see. Let me figure out if I can adjust this PR somewhat:
- Would `NVCC_PREPEND_FLAGS` nonetheless be useful, e.g. for passing `--verbose`?
- What about `-arch=native` by default, after having returned an error if `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` (or some equivalent detection, feel free to suggest!) returns something < 7?

The thing about `-arch=native` is that it doesn't leave PTX behind, so the resulting executable can't be launched on any other platform. This goes against the intention to produce "run-everywhere" binaries. But why the questions? In either case, option 3 was the conscious choice from the get-go. The fact is that with that much inline assembler there is not much left for the compiler to do...
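For reference, the capability check floated in the question could look roughly like the following. It is only a sketch under assumptions: the function names are hypothetical, and it assumes the `nvidia-smi` query prints one `major.minor` value per GPU, one per line.

```rust
use std::process::Command;

/// Parse one line of `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`
/// output (assumed to look like "8.6") into (major, minor).
fn parse_compute_cap(line: &str) -> Option<(u32, u32)> {
    let (major, minor) = line.trim().split_once('.')?;
    Some((major.parse().ok()?, minor.parse().ok()?))
}

/// Lowest compute capability among all reported GPUs, or None if no line
/// of the output could be parsed.
fn min_compute_cap(output: &str) -> Option<(u32, u32)> {
    output.lines().filter_map(parse_compute_cap).min()
}

/// Hypothetical helper: run nvidia-smi and return an error if any GPU is
/// below `min_major`, or if detection fails.
fn check_gpus(min_major: u32) -> Result<(u32, u32), String> {
    let out = Command::new("nvidia-smi")
        .args(["--query-gpu=compute_cap", "--format=csv,noheader"])
        .output()
        .map_err(|e| format!("failed to run nvidia-smi: {e}"))?;
    let text = String::from_utf8_lossy(&out.stdout);
    match min_compute_cap(&text) {
        Some(cap) if cap.0 >= min_major => Ok(cap),
        Some(cap) => Err(format!(
            "compute capability {}.{} is below {}.x",
            cap.0, cap.1, min_major
        )),
        None => Err("no compute capability reported".to_string()),
    }
}
```

Note that such a check only describes the build machine; it says nothing about the machines the binary will later run on, which is the portability concern raised in the reply.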
As for `--verbose`: what is it you want to see in addition to `cargo ... -vvv`? I'd argue that if you're in a position where you actually need to reach for that kind of information, then you can just as well pull down the repo and temporarily modify build.rs. The point is that the flag in question is for such low-level troubleshooting that only one or two users would actually need it.
Just in case, as for option 3: if you want to add a comment, then use the opportunity to copy the flags from sppark/poc/msm-cuda. There's no harm in doing it in advance :-)
Alright, it seems better to wait for the release of sppark/poc/msm-cuda, since it seems this would require a larger scope of updates anyway. Thanks for the explanations.
This PR probes the `NVCC_{PREPEND,APPEND}_FLAGS` environment variables and passes their contents to the compiler, only overriding with the legacy flags if those are not present in those environment variables.
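The fallback logic described here might be sketched as follows. This is not the PR's actual diff: the names `selects_arch` and `fallback_arch_flag` are hypothetical, and `-arch=sm_75` is an assumed spelling of the legacy Turing flag (Turing being compute capability 7.5).

```rust
/// Assumed legacy default: Turing is compute capability 7.5.
const LEGACY_ARCH_FLAG: &str = "-arch=sm_75";

/// True if a whitespace-separated flag string already selects a target
/// architecture via one of nvcc's architecture options.
fn selects_arch(flags: &str) -> bool {
    flags.split_whitespace().any(|f| {
        f.starts_with("-arch")
            || f.starts_with("--gpu-architecture")
            || f.starts_with("-gencode")
            || f.starts_with("--generate-code")
    })
}

/// Given the values of NVCC_PREPEND_FLAGS and NVCC_APPEND_FLAGS (None when
/// unset), return the legacy flag only if neither selects an architecture.
fn fallback_arch_flag(prepend: Option<&str>, append: Option<&str>) -> Option<&'static str> {
    let user_selected = prepend.into_iter().chain(append).any(selects_arch);
    if user_selected {
        None
    } else {
        Some(LEGACY_ARCH_FLAG)
    }
}
```

In a build script the two arguments would typically come from `std::env::var("NVCC_PREPEND_FLAGS").ok()` and its `APPEND` counterpart, passed as `.as_deref()`.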