supranational / pasta-msm

High-performance Multi-scalar Multiplication for Pasta curves
Apache License 2.0

build: Improve NVCC flags handling #7

Closed huitseeker closed 1 year ago

huitseeker commented 1 year ago

The current behavior hardcodes the compilation at the Turing architecture, which is suboptimal behavior when the compilation target is anything else.

This PR probes the NVCC_{PREPEND,APPEND}_FLAGS environment variables and passes their contents to the compiler, falling back to the legacy flags only when those variables do not already specify them.
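A minimal sketch of the kind of fallback logic described here, not the PR's actual code. It assumes the legacy flag is `-arch=sm_75` (Turing is compute capability 7.5; the exact flag in build.rs is not quoted in this thread), that "already specified" means any architecture-selecting flag appears in either variable, and it splits on whitespace without handling shell quoting. `resolve_flags` and `selects_arch` are hypothetical helper names.

```rust
use std::env;

/// Assumed stand-in for the hardcoded flag (Turing = compute capability 7.5).
const LEGACY_ARCH_FLAG: &str = "-arch=sm_75";

/// True if the flag string already selects a target architecture.
fn selects_arch(flags: &str) -> bool {
    flags.split_whitespace().any(|f| {
        f.starts_with("-arch")
            || f.starts_with("--gpu-architecture")
            || f.starts_with("-gencode")
            || f.starts_with("--generate-code")
    })
}

/// Prepend flags first, then the legacy flag (only if nothing else picked
/// an architecture), then append flags.
fn resolve_flags(prepend: Option<&str>, append: Option<&str>) -> Vec<String> {
    let prepend = prepend.unwrap_or("");
    let append = append.unwrap_or("");
    let mut flags: Vec<String> = prepend.split_whitespace().map(String::from).collect();
    if !selects_arch(prepend) && !selects_arch(append) {
        flags.push(LEGACY_ARCH_FLAG.to_string());
    }
    flags.extend(append.split_whitespace().map(String::from));
    flags
}

fn main() {
    let prepend = env::var("NVCC_PREPEND_FLAGS").ok();
    let append = env::var("NVCC_APPEND_FLAGS").ok();
    // A build script would hand each of these to the compiler invocation.
    for flag in resolve_flags(prepend.as_deref(), append.as_deref()) {
        println!("{flag}");
    }
}
```

With neither variable set, the sketch yields only the assumed legacy flag; with `NVCC_PREPEND_FLAGS=-arch=sm_86`, the legacy flag is dropped.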

dot-asm commented 1 year ago

> The current behavior hardcodes the compilation at the Turing architecture, which is suboptimal behavior when the compilation target is anything else.

The compiler leaves PTX IR code in the executable, and if you execute it on a later architecture, the PTX gets JIT-compiled and you'll get adequate performance. This is because the PTX code is the same for all architectures. Well, the code being the same will change (or is being changed) in the upcoming sppark release, but there is room for only one PTX :-( What we're doing is generating binary code for 7.x and 8.x, and leaving PTX for 8.x, so that when the application gets executed on 9.x, the newer PTX will be JIT-ed. The corresponding flags will be added to pasta-msm.
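To make the "binary for 7.x and 8.x, PTX for 8.x" scheme concrete, here is a hedged sketch that builds nvcc `-gencode` arguments. The specific capabilities 7.0 and 8.0 are assumptions standing in for "7.x" and "8.x"; the actual flags in sppark are not quoted in this thread and may differ. `gencode_flags` is a hypothetical helper.

```rust
/// Build `-gencode` pairs: native binary code for each capability in `sass`,
/// plus embedded PTX for `ptx` so that newer GPUs can JIT-compile it.
fn gencode_flags(sass: &[u32], ptx: u32) -> Vec<String> {
    let mut flags = Vec::new();
    for cc in sass {
        flags.push("-gencode".to_string());
        // code=sm_XY requests binary code for that architecture.
        flags.push(format!("arch=compute_{cc},code=sm_{cc}"));
    }
    flags.push("-gencode".to_string());
    // code=compute_XY keeps the PTX itself in the executable.
    flags.push(format!("arch=compute_{ptx},code=compute_{ptx}"));
    flags
}

fn main() {
    // Assumed capabilities: 7.0 and 8.0 binaries, PTX for 8.0.
    println!("{}", gencode_flags(&[70, 80], 80).join(" "));
}
```

Under these assumptions the program prints `-gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_80,code=compute_80`.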

huitseeker commented 1 year ago

I see. Let me figure out if I can adjust this PR somewhat:

  1. Can NVCC_PREPEND_FLAGS nonetheless be useful for e.g. passing --verbose?
  2. Would the following result in less surprising behavior: pass -arch=native by default, after having returned an error if:
    • e.g. nvidia-smi --query-gpu=compute_cap --format=csv,noheader (or some equivalent detection, feel free to suggest!) returns something < 7, or
    • the user passes an arch < 7 through NVCC_PREPEND_FLAGS?
  3. Or would you favor the existing behavior (no configurability) with a comment re: supported archs?
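The detection floated in option 2 could look roughly like the sketch below. It assumes the nvidia-smi query prints one `major.minor` value per line, one line per GPU, and that the policy is to reject anything below 7.x; `min_compute_major` and `check_supported` are hypothetical helpers, not code from this PR.

```rust
use std::process::Command;

/// Parse the nvidia-smi output (one "major.minor" per line) and return
/// the lowest major version found, if any.
fn min_compute_major(output: &str) -> Option<u32> {
    output
        .lines()
        .filter_map(|line| line.trim().split('.').next()?.parse::<u32>().ok())
        .min()
}

/// Assumed policy from the discussion: anything below 7.x is an error.
fn check_supported(output: &str) -> Result<u32, String> {
    match min_compute_major(output) {
        Some(major) if major >= 7 => Ok(major),
        Some(major) => Err(format!("compute capability {major}.x is below 7.0")),
        None => Err("could not detect any GPU compute capability".to_string()),
    }
}

fn main() {
    let result = Command::new("nvidia-smi")
        .args(["--query-gpu=compute_cap", "--format=csv,noheader"])
        .output()
        .map_err(|e| format!("failed to run nvidia-smi: {e}"))
        .and_then(|out| check_supported(&String::from_utf8_lossy(&out.stdout)));
    match result {
        Ok(major) => println!("lowest detected compute capability: {major}.x"),
        Err(msg) => eprintln!("error: {msg}"),
    }
}
```

Keeping the parsing separate from the process call means the check can be exercised on machines without a GPU or without nvidia-smi installed.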

dot-asm commented 1 year ago

The thing about -arch=native is that it doesn't leave PTX [behind], so the resulting executable can't be launched on any other platform. This goes against the intention to produce "run-everywhere" binaries. But why the questions? In any case, option 3 was the conscious choice from the get-go. The fact is that with that much inline assembler there is not much left for the compiler to do...

As for --verbose: what is it you want to see in addition to cargo ... -vvv? I'd argue that if you're in a position where you actually need to reach for that kind of information, then you can just as well pull down the repo and temporarily modify build.rs. The point is that the flag in question is for such low-level troubleshooting that only one or two users would actually need it.

Just in case, as for option 3: if you want to add a comment, then use the opportunity to copy the flags from sppark/poc/msm-cuda. There's no harm in doing it in advance :-)

huitseeker commented 1 year ago

Alright, it seems better to wait for the release of sppark/poc/msm-cuda, since it seems this would require a larger scope of updates anyway. Thanks for the explanations.