Closed vyasr closed 3 months ago
`NATIVE` never produces any form of SASS. When no GPU is detected on the machine, the goal was to fall back to generating SASS for all supported GPUs so the code will run.
I don't think we should expect that `NATIVE` will ever produce the same output as `RAPIDS` when no CUDA driver / GPU exists.
Consider that, going forward, we might want `RAPIDS` to start generating `90a` code. That wouldn't be needed for `NATIVE` in fallback mode, as `90` is sufficient for SASS execution.
Thinking about this more, the proposal to change `NATIVE` to the now usable `native` (via CMake) removes the need to change this logic (https://github.com/rapidsai/rapids-cmake/issues/320).
Under CMake's `native` (aka `-arch=native`), when no GPU / CUDA is found the compiler defaults back to `-arch=sm_MinX.Y`.
So I think we can close this and move forward with deprecating `NATIVE` in 24.08.
> NATIVE never produces any form of SASS. When no GPU is detected on the machine the goal was to fall back to generate SASS for all supported GPUs so the code will run.

I assume you mean that NATIVE never produces any form of PTX? It's not clear to me that the fallback to "build all supported SASS" is necessarily a better choice than producing the same behavior as RAPIDS. I get your point with the 90a example, but conversely if the goal is to make it "so the code will run" wouldn't you also want to produce nonzero PTX in case you end up on a newer architecture than the list of supported architectures? That's why we include PTX when generating with RAPIDS.
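
To make the forward-compatibility point concrete, here is a minimal Python sketch (a model, not rapids-cmake or CUDA code). It assumes the usual rules: SASS built for one compute capability runs on devices with the same major version and an equal or higher minor version, while PTX can be JIT-compiled on any equal or newer compute capability. The architecture lists and the `can_run` helper are hypothetical, and arch-specific targets such as `90a` are deliberately left out.

```python
def parse_entry(entry):
    """Split a CMAKE_CUDA_ARCHITECTURES-style entry into (cc, has_sass, has_ptx)."""
    if entry.endswith("-real"):
        return int(entry[: -len("-real")]), True, False   # SASS only
    if entry.endswith("-virtual"):
        return int(entry[: -len("-virtual")]), False, True  # PTX only
    return int(entry), True, True  # no suffix: both SASS and PTX


def can_run(arch_list, device_cc):
    """Return True if a binary built for arch_list can execute on device_cc.

    Compute capabilities are integers such as 86 (8.6) or 100 (10.0).
    Simplified model: ignores arch-specific targets like 90a.
    """
    for entry in arch_list.split(";"):
        cc, has_sass, has_ptx = parse_entry(entry)
        same_major = cc // 10 == device_cc // 10
        if has_sass and same_major and device_cc % 10 >= cc % 10:
            return True  # binary-compatible SASS is present
        if has_ptx and device_cc >= cc:
            return True  # the driver can JIT the embedded PTX
    return False


# Hypothetical supported list of 70, 80, 90:
sass_only = "70-real;80-real;90-real"  # shape of the NATIVE fallback
with_ptx = "70-real;80-real;90"        # shape of the RAPIDS list

print(can_run(sass_only, 86))   # True: sm_80 SASS runs on 8.6
print(can_run(sass_only, 100))  # False: no SASS for major 10, no PTX
print(can_run(with_ptx, 100))   # True: compute_90 PTX is JIT-compiled
```

Under these assumptions, the only difference between the two lists shows up on a device newer than every listed architecture.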
> So I think we can close this and move forward with deprecating NATIVE in 24.08

In any case, I was also thinking about the switch to `native` when writing up this issue, and I agree that it probably makes this issue moot, so I'm fine closing.
Currently, when `NATIVE` architectures are specified but no local GPUs are detected, `rapids_cuda_set_architectures` falls back to producing the list of supported architectures. This is done by passing that list of architectures to `rapids_cuda_detect_architectures`, which then uses it as the fallback output. The result is that if native arch detection fails, `NATIVE` is equivalent to `RAPIDS`, _except that the latest virtual architecture is not built like it is for `RAPIDS`_. This behavior seems confusing. If it was an intentional design decision for rapids-cmake to fall back to producing all supported GPU architectures when native detection fails (and I assume it was, since that would only occur on CPU-only machines, which are very likely being used to build packages for redistribution, e.g. our CI), then I would expect this fallback to also produce what we consider the default build option for RAPIDS.

Should we change `NATIVE` to use the `RAPIDS` behavior when native detection fails?
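
The difference described above can be modeled in a few lines of Python. This is only a sketch of the behavior as described in this issue, not the rapids-cmake implementation: the function names `rapids_list` and `native_list` and the supported list `[70, 80, 90]` are hypothetical.

```python
def rapids_list(supported):
    """RAPIDS mode: SASS for every arch, plus PTX for the newest one.

    An entry without a suffix builds both SASS and PTX for that arch.
    """
    *older, newest = supported
    return ";".join([f"{cc}-real" for cc in older] + [str(newest)])


def native_list(detected, supported):
    """NATIVE mode: SASS only, for detected GPUs or the fallback list."""
    archs = detected if detected else supported  # fallback when no GPU is found
    return ";".join(f"{cc}-real" for cc in archs)


supported = [70, 80, 90]  # hypothetical supported architectures

print(rapids_list(supported))        # 70-real;80-real;90
print(native_list([86], supported))  # 86-real
print(native_list([], supported))    # 70-real;80-real;90-real
```

In this model the fallback output differs from the `RAPIDS` list only in the final entry, which is the discrepancy the question is about.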