peterwittek / somoclu

Massively parallel self-organizing maps: accelerate training on multicore CPUs, GPUs, and clusters
https://peterwittek.github.io/somoclu/
MIT License

Cannot cast array data from dtype('O') to dtype('float32') according to the rule 'safe' #141

Closed AlanGanem closed 5 years ago

AlanGanem commented 5 years ago

I'm trying to train a SOM on a sparse matrix (csr_matrix format) in Python 3.6. It contains around 500,000 rows (ad titles) and around 60,000 columns (vocabulary); each row has an average of 7 nonzero values. Here's the code I've written so far:

`import os
import pickle

import somoclu

# A raw string cannot end in a backslash, so the directory is given without one.
data_dir = r'C:\Users\ganem\Desktop\2Vintens\dados\Historical data\janeiro_2019\dados_tratados'
g = os.path.join(data_dir, 'cv_matrix_janeiro')
with open(g, 'rb') as f:
    cv_matrix = pickle.load(f)
cv_matrix_test = cv_matrix.astype('float32', copy=False)[:500]

n_rows, n_columns = 20, 20
som = somoclu.Somoclu(n_columns, n_rows, compactsupport=False)
som.train(cv_matrix_test)`

The cv_matrix was saved by another piece of code and loaded here with pickle, and I sliced the matrix just to test the module. I keep getting the same error over and over:

`Traceback (most recent call last):
  File "", line 1, in
    som.train(cv_matrix_test)
  File "C:\Users\ganem\AppData\Local\Programs\Python\Python36\Lib\site-packages\somoclu\train.py", line 228, in train
    self.umatrix)
TypeError: Cannot cast array data from dtype('O') to dtype('float32') according to the rule 'safe'`

I have already cast the csr_matrix to float32 beforehand, but I keep getting the same error.

Does anyone know what might be going on?

Update 1: Following xgdgsc's comment, I cast the csr_matrix to np.intp, which solved the 'O' dtype problem, but now I get the following error:

`  File "C:\Users\ganem\AppData\Local\Programs\Python\Python36\Lib\site-packages\somoclu\train.py", line 218, in train
    self.update_data(data)
  File "C:\Users\ganem\AppData\Local\Programs\Python\Python36\Lib\site-packages\somoclu\train.py", line 243, in update_data
    self._data = np.float32(data)
ValueError: setting an array element with a sequence.`

It seems that the function does not recognize the csr_matrix as a valid input type and then tries to cast it, without success.
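A minimal sketch of how NumPy appears to treat a SciPy sparse matrix, which may explain both tracebacks. The `tiny` matrix is a hypothetical stand-in for the real data, and this is only a guess at what happens inside `train`, not a confirmed description of somoclu's internals:

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-in for the real matrix: a 3x3 float32 CSR matrix.
tiny = sparse.csr_matrix(np.eye(3, dtype=np.float32))

# NumPy does not unpack a SciPy sparse matrix; it wraps the whole object
# in a 0-dimensional array whose dtype is object ('O').
wrapped = np.asarray(tiny)
print(wrapped.dtype, wrapped.shape)

# Densifying explicitly yields a regular float32 ndarray instead.
dense = tiny.toarray()
print(dense.dtype, dense.shape)
```

If this is what happens here, casting the sparse matrix to another dtype would not help, because the container itself is what NumPy fails to convert.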

Update 2: I cannot convert the sparse matrix into an np.array, since it won't fit in memory.
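A rough back-of-the-envelope estimate of the sizes involved, using the approximate figures from the question. The 4-byte value and index sizes are assumptions (float32 values, int32 indices):

```python
# Approximate sizes reported in the question.
n_rows, n_cols = 500_000, 60_000
avg_nonzeros_per_row = 7

# Assumptions: float32 values and int32 indices, 4 bytes each.
bytes_per_value = 4
bytes_per_index = 4

# Dense storage keeps every cell, zero or not.
dense_bytes = n_rows * n_cols * bytes_per_value

# CSR storage keeps only the nonzeros plus two index arrays.
nnz = n_rows * avg_nonzeros_per_row
csr_bytes = (
    nnz * bytes_per_value               # data array
    + nnz * bytes_per_index             # column indices
    + (n_rows + 1) * bytes_per_index    # row pointer array
)

# The 500-row test slice from the code above, if it were densified.
slice_bytes = 500 * n_cols * bytes_per_value

print(f"dense full matrix: {dense_bytes / 1e9:.0f} GB")
print(f"CSR full matrix:   {csr_bytes / 1e6:.0f} MB")
print(f"dense 500-row cut: {slice_bytes / 1e6:.0f} MB")
```

Under these assumptions the full dense array would need about 120 GB against roughly 30 MB in CSR form, while the 500-row test slice would take about 120 MB as a dense array, so `cv_matrix_test.toarray()` might at least be feasible for testing.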

xgdgsc commented 5 years ago

https://stackoverflow.com/questions/39452792/cannot-cast-array-data-from-dtypeo-to-dtypefloat64 Some of the comments on the answer there might be helpful.

AlanGanem commented 5 years ago

Thank you for your answer! I've cast the matrix to np.intp, but now I get the following error in `self._data = np.float32(data)`: `ValueError: setting an array element with a sequence.`

xgdgsc commented 5 years ago

https://stackoverflow.com/questions/4674473/valueerror-setting-an-array-element-with-a-sequence Have you googled this first?