Hamiltonian calculation on CUDA returns only 0s

Running the sqaodpy/example/dense_graph_annealer.py script without CUDA returns:

h= [4. 4. 4. 4. 4. 4. 4. 4.]
J= [[-0.    0.25  0.25  0.25  0.25  0.25  0.25  0.25]
 [ 0.25 -0.    0.25  0.25  0.25  0.25  0.25  0.25]
 [ 0.25  0.25 -0.    0.25  0.25  0.25  0.25  0.25]
 [ 0.25  0.25  0.25 -0.    0.25  0.25  0.25  0.25]
 [ 0.25  0.25  0.25  0.25 -0.    0.25  0.25  0.25]
 [ 0.25  0.25  0.25  0.25  0.25 -0.    0.25  0.25]
 [ 0.25  0.25  0.25  0.25  0.25  0.25 -0.    0.25]
 [ 0.25  0.25  0.25  0.25  0.25  0.25  0.25 -0.  ]]
c= 18.0
{'algorithm': 'coloring', 'n_trotters': 8, 'precision': 'double', 'device': 'cpu'}
E 64.0
Number of solutions : 1
[1 1 1 1 1 1 1 1]

With CUDA support it returns:

h= [0. 0. 0. 0. 0. 0. 0. 0.]
J= [[0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]]
c= 0.0
{'algorithm': 'coloring', 'n_trotters': 8, 'precision': 'double', 'device': 'cuda'}
E 0.0
Number of solutions : 1
[ 0  0  0  0  0  0 -7 32]

I am running this on the TSUBAME supercomputer, which uses NVIDIA Tesla P100 GPUs (the variant for NVLink-optimized servers). The build was compiled with OpenMPI support. I use CUDA 10.2.89 and Python 3.11.2.

I tested various inputs W (not only the standard one), and it seems that some calculation goes wrong in the setQUBO function of the CUDA submodule. I can probably investigate why J and h come out as 0 starting from https://github.com/shinmorino/sqaod/blob/485dd7f832936e8fe11d70d07e32eea0187baa4b/sqaodc/cuda/DeviceFormulas.cpp#L23C1-L31C2. What confuses me more is the solution: I expected it to be a binary array, not an array of arbitrary integers such as -7 and 32, so I suspect something more fundamental is broken. Do you have a pointer to where I could start looking?
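
For reference, here is a minimal standard-library sketch of what I would expect h, J and c to be, so that the CUDA results can be compared against an independent calculation. It uses the textbook QUBO-to-Ising substitution x = (1 + s) / 2 for a symmetric W; the sign and scaling conventions inside sqaod may differ, so treat the formulas as an assumption rather than as the library's implementation. The all-ones W is also an assumption: I inferred it from the CPU values printed above, where it reproduces h = 4, J = 0.25 and c = 18. The helper names (`qubo_to_ising`, `is_bit_vector`, and so on) are hypothetical and not part of sqaod.

```python
from itertools import product


def qubo_to_ising(W):
    """Textbook conversion of a symmetric QUBO matrix W (E = x^T W x, x in {0,1})
    into Ising terms with E = c + h.s + s^T J s and s = 2x - 1.
    Assumption: this is the standard substitution, not necessarily sqaod's convention."""
    n = len(W)
    h = [0.5 * sum(W[i]) for i in range(n)]
    # Off-diagonal couplings only; the diagonal contributes a constant because s_i^2 = 1.
    J = [[0.0 if i == j else 0.25 * W[i][j] for j in range(n)] for i in range(n)]
    c = 0.25 * sum(sum(row) for row in W) + 0.25 * sum(W[i][i] for i in range(n))
    return h, J, c


def qubo_energy(W, x):
    n = len(W)
    return sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def ising_energy(h, J, c, s):
    n = len(h)
    linear = sum(h[i] * s[i] for i in range(n))
    quadratic = sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(n))
    return c + linear + quadratic


def is_bit_vector(x):
    """A QUBO solution should contain only 0s and 1s."""
    return all(v in (0, 1) for v in x)


# Assumed input: an 8x8 all-ones W, inferred from the CPU output in this report.
W = [[1.0] * 8 for _ in range(8)]
h, J, c = qubo_to_ising(W)
print("h =", h)
print("J[0] =", J[0])
print("c =", c)

# Both formulations should agree on every bit assignment.
for x in product((0, 1), repeat=8):
    s = [2 * v - 1 for v in x]
    assert abs(qubo_energy(W, x) - ising_energy(h, J, c, s)) < 1e-9

print(is_bit_vector([1, 1, 1, 1, 1, 1, 1, 1]))     # CPU solution
print(is_bit_vector([0, 0, 0, 0, 0, 0, -7, 32]))   # CUDA solution
```

With this W, the all-ones bit vector has energy 64 in both formulations, which matches the `E 64.0` line from the CPU run. The CUDA solution fails the bit-vector check, which is why I suspect the returned buffer holds uninitialized or untouched memory rather than a miscalculated result.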
