Unknown CUDA device compute capability: 8.7

Thanks for the great project :)

I encountered this while running on a Jetson AGX Orin device (sm_87 arch). From a quick look at the source code, it seems like each compute capability needs to be explicitly accounted for?

$ nvidia-device-query
CUDA device query (Driver API, statically linked)
CUDA driver version 11.4
CUDA API version 11.4
Detected 1 CUDA capable device

Device 0: Orin
*** Warning: Unknown CUDA device compute capability: 8.7
*** Please submit a bug report at https://github.com/tmcdonell/cuda/issues

  CUDA capability:                          8.7
  CUDA cores:                               1024 cores in 16 multiprocessors (64 cores/MP)
  Global memory:                            61 GB
  Constant memory:                          64 kB
  Shared memory per block:                  48 kB
  Registers per block:                      65536
  Warp size:                                32
  Maximum threads per multiprocessor:       1536
  Maximum threads per block:                1024
  Maximum grid dimensions:                  2147483647 x 65535 x 65535
  Maximum block dimensions:                 1024 x 1024 x 64
  GPU clock rate:                           1.3 GHz
  Memory clock rate:                        1.3 GHz
  Memory bus width:                         128-bit
  L2 cache size:                            4 MB
  Maximum texture dimensions
    1D:                                     131072
    2D:                                     131072 x 65536
    3D:                                     16384 x 16384 x 16384
  Texture alignment:                        512 B
  Maximum memory pitch:                     2 GB
  Concurrent kernel execution:              Yes
  Concurrent copy and execution:            Yes, with 2 copy engines
  Runtime limit on kernel execution:        No
  Integrated GPU sharing host memory:       Yes
  Host page-locked memory mapping:          Yes
  ECC memory support:                       No
  Unified addressing (UVA):                 Yes
  Single to double precision performance:   32 : 1
  Supports compute pre-emption:             Yes
  Supports cooperative launch:              Yes
  Supports multi-device cooperative launch: Yes
  PCI bus/location:                         0/0
  Compute mode:                             Default
    Multiple contexts are allowed on the device simultaneously
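
To illustrate what I mean by each capability being accounted for explicitly, here is a minimal sketch of a per-capability table with a fallback for unlisted devices. It is not the library's actual code: `Capability`, `Lookup`, `coresPerMP` and `lookupCores` are hypothetical names made up for this example, and the table holds only a few entries. The cores-per-multiprocessor values are the ones NVIDIA documents for those architectures.

```haskell
import qualified Data.Map.Strict as Map

-- | Compute capability as (major, minor), e.g. (8, 7) for sm_87.
--   Hypothetical type for this sketch only.
type Capability = (Int, Int)

-- | Result of a lookup: an exact hit, a fallback to an older known
--   capability, or nothing usable at all.
data Lookup
  = Exact Int
  | Fallback Capability Int
  | Unknown
  deriving (Eq, Show)

-- | Illustrative table of CUDA cores per multiprocessor.
--   Note that (8, 7) is deliberately missing, as in this report.
coresPerMP :: Map.Map Capability Int
coresPerMP = Map.fromList
  [ ((7, 0), 64)
  , ((7, 5), 64)
  , ((8, 0), 64)
  , ((8, 6), 128)
  ]

-- | Try an exact match first; otherwise fall back to the closest
--   known capability that is strictly older than the requested one.
lookupCores :: Capability -> Lookup
lookupCores cc =
  case Map.lookup cc coresPerMP of
    Just n  -> Exact n
    Nothing ->
      case Map.lookupLT cc coresPerMP of
        Just (known, n) -> Fallback known n
        Nothing         -> Unknown

main :: IO ()
main = do
  print (lookupCores (8, 6))  -- Exact 128
  print (lookupCores (8, 7))  -- Fallback (8,6) 128
  print (lookupCores (1, 0))  -- Unknown
```

With a table like this, a new device only needs one more entry, for example `((8, 7), 128)`, for the lookup to become exact. The fallback keeps the query working in the meantime, though its numbers are only a guess. The 64 cores/MP shown in the output above may be a default of that kind rather than the real figure for this device.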
