CUDA device arch 37 does not support fp16

olalonde commented 7 years ago

I'm getting the following error (on an AWS p2.xlarge instance) when I pass in a list of descriptors returned by OpenCV's SIFT extractor (des in kp, des = sift.detectAndCompute(image, None)).

arguments: 1 0x7fff855fff14 0.010 0.10 0 72718 64 1000 3 0 1 3 0x7f379c9da010 0x3c923c0 0x3cd0bd0 (nil)
CUDA device arch 37 does not support fp16

This example works, though:

import numpy
from libKMCUDA import kmeans_cuda

numpy.random.seed(0)
arr = numpy.empty((10000, 2), dtype=numpy.float32)
arr[:2500] = numpy.random.rand(2500, 2) + [0, 2]
arr[2500:5000] = numpy.random.rand(2500, 2) - [0, 2]
arr[5000:7500] = numpy.random.rand(2500, 2) + [2, 0]
arr[7500:] = numpy.random.rand(2500, 2) - [2, 0]
centroids, assignments = kmeans_cuda(arr, 4, verbosity=1, seed=3)
print(centroids)

So I'm guessing that I need to transform the numpy array returned by opencv? Sorry, pretty new to Python.

olalonde commented 7 years ago

Here's what my arr looks like (2 descriptors of 128 dimensions each):

[array([  13.,    0.,    0.,    0.,    0.,    0.,    0.,    4.,  166.,
          0.,    0.,    0.,    0.,    0.,    2.,   93.,   78.,    0.,
          0.,    0.,    0.,    1.,   90.,   78.,    0.,    0.,    0.,
          0.,    0.,    6.,  124.,    3.,   32.,    1.,    0.,    0.,
          0.,    0.,    0.,    5.,  166.,   36.,    6.,    0.,    0.,
          0.,    0.,   47.,  122.,   33.,   66.,    1.,    0.,    0.,
         12.,   31.,    1.,    6.,   66.,    8.,    0.,    3.,   21.,
          2.,   20.,    4.,    0.,    0.,    0.,    0.,    0.,    2.,
        143.,  156.,   78.,    1.,    0.,    0.,    0.,    1.,   22.,
        100.,  166.,    1.,    0.,    0.,    0.,    0.,    0.,   36.,
        166.,    2.,    0.,    0.,    0.,    0.,    3.,    1.,    0.,
          0.,    0.,    0.,    0.,    1.,    6.,   13.,   17.,    2.,
          2.,    0.,    0.,    0.,    0.,    8.,   74.,    4.,    2.,
          0.,    0.,    0.,    0.,   10.,   81.,    1.,    0.,    0.,
          0.,    0.], dtype=float32), array([   6.,    3.,    1.,   85.,  145.,    0.,    0.,    0.,  145.,
         13.,    1.,   20.,   17.,    0.,    0.,   14.,  122.,   17.,
          6.,    0.,    0.,    0.,    0.,   10.,    8.,    8.,   10.,
          8.,   25.,   22.,    3.,    6.,   14.,    2.,    0.,   20.,
        145.,    9.,    0.,    1.,  145.,   13.,    0.,    6.,   30.,
          1.,    0.,   11.,  138.,   14.,    2.,    0.,    0.,    0.,
          0.,    5.,   45.,   21.,    1.,    0.,    8.,   11.,    2.,
          7.,   12.,    0.,    8.,   70.,  145.,    5.,    2.,   24.,
        145.,    3.,    0.,    1.,   22.,    2.,    4.,  133.,   86.,
         14.,    3.,    3.,    7.,    4.,    6.,   30.,   42.,   51.,
          4.,    4.,    6.,    3.,    0.,    1.,   59.,    4.,   20.,
         41.,    9.,    1.,    2.,   98.,   43.,    0.,    0.,    5.,
          4.,    2.,   15.,  145.,    7.,    1.,    1.,   42.,   36.,
          6.,   14.,   35.,   81.,   12.,    3.,   17.,   11.,    1.,
          0.,    2.], dtype=float32)]

print(type(arr))
print(type(arr[0]))
print(type(arr[0][0]))

Outputs:

<class 'list'>
<class 'numpy.ndarray'>
<class 'numpy.float32'>

olalonde commented 7 years ago

numpy.asarray(arr)

seemed to fix my problem
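
For anyone hitting the same message, here is a minimal sketch of that conversion. It assumes, as the type output above suggests, that the input was a plain Python list of 1-D float32 arrays rather than a single 2-D array. The descriptor values are random stand-ins, and n_clusters in the commented-out call is a hypothetical placeholder. Why kmcuda reports an fp16 error for a list input is not confirmed here; presumably the list is not recognized as a float32 array.

```python
import numpy

# Stand-in for the descriptors: a Python list of 1-D float32 arrays
# (random values here, not real SIFT output).
rng = numpy.random.RandomState(0)
arr = [rng.rand(128).astype(numpy.float32) for _ in range(2)]

# Stack the list into one 2-D float32 array; every row must have the same length.
samples = numpy.asarray(arr, dtype=numpy.float32)

# Same checks as in the comment above, now on the converted array.
print(type(samples), samples.dtype, samples.shape)

# With kmcuda installed, `samples` would be passed instead of the list, e.g.:
# from libKMCUDA import kmeans_cuda
# centroids, assignments = kmeans_cuda(samples, n_clusters, verbosity=1, seed=3)
# (n_clusters is a hypothetical placeholder for the desired cluster count.)
```

Passing dtype=numpy.float32 explicitly is a defensive choice, so the result has the same dtype as the working example even if the list elements were of another type.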

src-d / kmcuda, issue #14