scottgs / cvtile

GDAL + OpenCV + CUDA
BSD 3-Clause "New" or "Revised" License

sm_codes can't be specified #34

Open · ghost opened this issue 8 years ago

ghost commented 8 years ago

It is an error to specify sm_XX codes such as sm_60 when building with the older CUDA toolkits that we do support. This line needs to be removed or edited. There are a few options: one is to auto-detect the correct flags; another is to simply let the user pass in the correct CUDA architecture.
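For the "let the user pass it in" option, here is a minimal sketch of turning a user-supplied architecture list into validated compute capabilities. The input format (for example `sm_30 sm_35 50,52`) and the function name `parse_arch_list` are hypothetical; how the string reaches the build (an environment variable, a configure option) is left open.

```python
import re
from typing import List

# Hypothetical accepted forms: "30", "sm_30", or "compute_30" (two digits = major, minor).
_ARCH_RE = re.compile(r"^(?:sm_|compute_)?(\d)(\d)$")


def parse_arch_list(spec: str) -> List[int]:
    """Turn a user-supplied architecture list into sorted, de-duplicated
    compute capabilities encoded as two-digit ints (3.0 -> 30)."""
    archs = set()
    # Accept commas and/or whitespace as separators.
    for token in re.split(r"[,\s]+", spec.strip()):
        if not token:
            continue
        m = _ARCH_RE.match(token)
        if m is None:
            raise ValueError(f"unrecognized CUDA architecture: {token!r}")
        archs.add(int(m.group(1) + m.group(2)))
    return sorted(archs)
```

For example, `parse_arch_list("sm_30 sm_35 50,52")` returns `[30, 35, 50, 52]`, and a malformed token such as `sm_6` raises `ValueError` instead of silently producing a bad nvcc flag.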

interputed commented 8 years ago

This is not a bug; it works as expected on the two architectures we tested. The compiler generates PTX code for a minimum target architecture, then at compile time chooses the best available architecture from the listed sm_XX codes to compile the PTX to binary. As long as your card supports compute capability 3.0 or higher, it will work normally. This is documented here: http://docs.nvidia.com/cuda/cuda-c-programming-guide/#application-compatibility
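The compatibility rules in that guide can be modeled roughly as follows. This is a simplified sketch, not the driver's actual logic: SASS built for sm_XY runs only on devices with the same major version and an equal or higher minor version, while PTX for compute_XY can be JIT-compiled for any device of capability X.Y or higher. The function name `pick_image` and the "newest compatible cubin wins" tie-break are assumptions for illustration.

```python
from typing import Optional, Sequence, Tuple


def pick_image(device: int, sass: Sequence[int],
               ptx: Sequence[int]) -> Optional[Tuple[str, int]]:
    """Simplified model of which embedded image a device of compute
    capability `device` (e.g. 61 for 6.1) would end up running.

    sass: architectures with embedded binaries (sm_XX), e.g. [30, 35, 50, 52]
    ptx:  virtual architectures with embedded PTX (compute_XX), e.g. [30]
    """
    major = device // 10
    # SASS is binary-compatible only within the same major version,
    # and only with an equal or higher minor version.
    compatible = [a for a in sass if a // 10 == major and a <= device]
    if compatible:
        # Assumption: the newest compatible cubin is preferred.
        return ("sass", max(compatible))
    # Otherwise fall back to JIT-compiling PTX the device can handle.
    jittable = [a for a in ptx if a <= device]
    if jittable:
        return ("ptx-jit", max(jittable))
    # No usable image: kernel launches would fail on this device.
    return None
```

Under this model, with SASS for sm_30 through sm_52 plus compute_30 PTX, `pick_image(61, [30, 35, 50, 52], [30])` returns `("ptx-jit", 30)`: a compute capability 6.1 card still runs, but only by JIT-compiling the compute_30 PTX.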

ghost commented 8 years ago

Not exactly: the PTX is in fact built for a minimum virtual architecture (Fermi, Kepler, etc.); however, the executable that gets built includes the entire set of SASS (binary) code that you specify as the 'compute' codes, if you will. So if you target compute_30 and specify sm_30 through sm_52, this works, and the binary then contains SASS code for each of the specified codes plus the PTX for the virtual architecture. This is known as the fat binary form of CUDA compilation. You don't even have to list the sm_XX codes unless you really want to skip JIT compilation, so it is just an optimization. Given how simple our PTX is, JIT compilation time is negligible and will not be a bottleneck, so listing them likely isn't even necessary.
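A small sketch of assembling the fat binary flags described above, using nvcc's `-arch=compute_XX` (virtual architecture) and `-code=...` (list of embedded images) options. The helper name `fatbin_flags` is hypothetical; the SASS for every listed sm_XX is generated from the single compute_XX PTX, and the PTX itself is embedded unless `embed_ptx` is turned off.

```python
from typing import List, Sequence


def fatbin_flags(virtual: int, real: Sequence[int],
                 embed_ptx: bool = True) -> List[str]:
    """Build nvcc flags for a fat binary: SASS for each sm_XX in `real`,
    all generated from the compute_<virtual> PTX, plus (optionally) that
    PTX itself so newer GPUs can JIT-compile it."""
    reals = sorted(set(real))
    # nvcc cannot produce SASS for an architecture older than the virtual one.
    if any(r < virtual for r in reals):
        raise ValueError("every sm_XX must be >= the virtual architecture")
    codes = [f"sm_{r}" for r in reals]
    if embed_ptx:
        codes.append(f"compute_{virtual}")
    if not codes:
        raise ValueError("nothing to embed: give sm_XX codes or keep the PTX")
    return [f"-arch=compute_{virtual}", "-code=" + ",".join(codes)]
```

`fatbin_flags(30, [30, 35, 50, 52])` yields `['-arch=compute_30', '-code=sm_30,sm_35,sm_50,sm_52,compute_30']`, while `fatbin_flags(30, [])` keeps only the compute_30 PTX, which is the JIT-only build suggested above as probably good enough.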

http://stackoverflow.com/questions/17599189/what-is-the-purpose-of-using-multiple-arch-flags-in-nvidias-nvcc-compiler

Now the issue: sm_60 is only available in the CUDA 8.0 release candidate. So, semantics aside, unless we support only the release candidate (which is a bad idea), you either need to shorten the list or add a switch. Since I am not on the 'bleeding' edge (I do not trust NVIDIA), I do not have CUDA 8.0 and therefore could not build. I pushed a shorter list so that I could build and include my fixes from a while ago, but feel free to do something better.
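One possible shape for that switch is to gate each sm_XX code on the installed toolkit version. This is a sketch under stated assumptions: the table `MIN_TOOLKIT` encodes only what this thread establishes (sm_60 and sm_61 need CUDA 8.0) and treats every other code as available; `nvcc_release` assumes the `nvcc --version` output contains a `release X.Y` fragment. Both names are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: only the Pascal codes discussed here are gated; all other
# codes are treated as available in every toolkit we build with.
MIN_TOOLKIT = {60: (8, 0), 61: (8, 0)}


def nvcc_release(version_text: str) -> Tuple[int, int]:
    """Extract (major, minor) from `nvcc --version` output."""
    m = re.search(r"release (\d+)\.(\d+)", version_text)
    if m is None:
        raise ValueError("could not find 'release X.Y' in nvcc output")
    return int(m.group(1)), int(m.group(2))


def supported_archs(requested: Iterable[int],
                    toolkit: Tuple[int, int]) -> List[int]:
    """Drop sm_XX codes the installed toolkit cannot compile for."""
    # Tuple comparison: (8, 0) <= (7, 5) is False, so sm_60/sm_61 are dropped.
    return sorted(a for a in set(requested)
                  if MIN_TOOLKIT.get(a, (0, 0)) <= toolkit)
```

With a CUDA 7.5 toolkit, `supported_archs([30, 35, 50, 52, 60, 61], (7, 5))` returns `[30, 35, 50, 52]`; with `(8, 0)` all six codes are kept. Wiring this into the autotools build is a separate question.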

https://devtalk.nvidia.com/default/topic/938242/thar-she-blows-cuda-8-rc-available/

Make sense?

interputed commented 8 years ago

Yeah, I'll look for a fix on Tuesday; thanks for the clearer explanation! The problem is that in the lab we use GTX 1080s, which wouldn't build unless I specified sm_61, and we use CUDA 8.0 as a result. So CUDA 8.0 compatibility is necessary for our "bleeding edge" GPUs. However, at the time I ran into the problem, we still had the symlink files in place, so it was trying to compile only for sm_50, which failed. It may actually work now without the sm_61 flag. Regardless, I'll figure it out with your suggestions. I'm new here, so I'm still learning how it all works as I make tweaks. Feel free to correct me anytime!

ghost commented 8 years ago

Sure thing! I'm just cleaning up a few old TODOs in my spare time so that future students have a cleaner starting point; I appreciate the effort. If you need help, let me know, and no rush. Also, no, you are correct: you need CUDA 8.0 for Pascal cards, so one way or another you need, at the very minimum, the PTX flag. You'll have to add that back in to build on your machines. I'll just keep a local copy for now that doesn't use that flag. As for the build tools, I'm not sure of the best way to 'flip' that switch with autotools; Grant may be a better go-to for that.