Encountered this while running on a Jetson AGX Orin device (sm_87 architecture). From a brief look at the source code, it seems each compute capability needs to be explicitly accounted for?
$ nvidia-device-query
CUDA device query (Driver API, statically linked)
CUDA driver version 11.4
CUDA API version 11.4
Detected 1 CUDA capable device
Device 0: Orin
*** Warning: Unknown CUDA device compute capability: 8.7
*** Please submit a bug report at https://github.com/tmcdonell/cuda/issues
CUDA capability: 8.7
CUDA cores: 1024 cores in 16 multiprocessors (64 cores/MP)
Global memory: 61 GB
Constant memory: 64 kB
Shared memory per block: 48 kB
Registers per block: 65536
Warp size: 32
Maximum threads per multiprocessor: 1536
Maximum threads per block: 1024
Maximum grid dimensions: 2147483647 x 65535 x 65535
Maximum block dimensions: 1024 x 1024 x 64
GPU clock rate: 1.3 GHz
Memory clock rate: 1.3 GHz
Memory bus width: 128-bit
L2 cache size: 4 MB
Maximum texture dimensions
1D: 131072
2D: 131072 x 65536
3D: 16384 x 16384 x 16384
Texture alignment: 512 B
Maximum memory pitch: 2 GB
Concurrent kernel execution: Yes
Concurrent copy and execution: Yes, with 2 copy engines
Runtime limit on kernel execution: No
Integrated GPU sharing host memory: Yes
Host page-locked memory mapping: Yes
ECC memory support: No
Unified addressing (UVA): Yes
Single to double precision performance: 32 : 1
Supports compute pre-emption: Yes
Supports cooperative launch: Yes
Supports multi-device cooperative launch: Yes
PCI bus/location: 0/0
Compute mode: Default
Multiple contexts are allowed on the device simultaneously
Thanks for the great project :)