richard-evans / vampire

Atomistic simulator for magnetic materials
GNU General Public License v2.0

Swapped cusp spmv for cusparse spmv. #46

Closed: mattoaellis closed 3 years ago

mattoaellis commented 4 years ago

I have swapped out the older cusp sparse matrix-vector (SpMV) routines for the cusparse generalised interface ones. It compiles, and the results match the previous cusp implementation for the DMI test. There are still errors in the GPU statistics calculation, but with the gpu:calculate-statistics-on-cpu=true flag the magnetisation is close to the CPU version.

[Image: Vampire_CPU_GPU_cusparse]
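For anyone wanting to repeat the cusp versus cusparse comparison, a minimal host-side sketch of a reference check might look like the following. It assumes the exchange matrix is stored in CSR form and that the field vector from each GPU backend has been copied back to the host. The names spmv_csr and max_abs_diff are hypothetical and are not taken from the VAMPIRE source.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference y = A * x on the CPU for a CSR matrix A.
// row_offsets has rows + 1 entries; col_indices and values have nnz entries.
std::vector<double> spmv_csr(const std::vector<int>& row_offsets,
                             const std::vector<int>& col_indices,
                             const std::vector<double>& values,
                             const std::vector<double>& x) {
    assert(!row_offsets.empty());
    const std::size_t rows = row_offsets.size() - 1;
    std::vector<double> y(rows, 0.0);
    for (std::size_t r = 0; r < rows; ++r) {
        double sum = 0.0;
        // Accumulate the non-zero entries of row r.
        for (int k = row_offsets[r]; k < row_offsets[r + 1]; ++k) {
            sum += values[k] * x[col_indices[k]];
        }
        y[r] = sum;
    }
    return y;
}

// Largest element-wise absolute difference between two result vectors,
// e.g. the CPU reference and a result copied back from the GPU.
double max_abs_diff(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}
```

Comparing max_abs_diff against a small tolerance for the same input vector checks the exchange routine on its own, independently of differences in the thermal starting state.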

richard-evans commented 4 years ago

Thanks Matt, though I am puzzled why changing the exchange routines should give different answers. I would have expected the outcome to be identical?

mattoaellis commented 4 years ago

Hi Richard, the GPU and CPU results are slightly different because the simulation starts from a thermal state, so I guess the different RNGs for CUDA and CPU give some small variation in the starting configuration. The CUSPARSE exchange routines give a similar result if you compare to the cusp results in the previous pull request here https://github.com/richard-evans/vampire/pull/20 (note that CUDA is on the right in this comment thread but on the left in the old one). They show similar shapes, but I must have had a different system size as the number of pixels is different. Maybe we can try a different test starting from the same thermalised ensemble generated by the CPU?

richard-evans commented 4 years ago

Hi Matt, OK awesome - I'll do the check and integrate everything - for some reason I thought the comparison was between GPU versions. Thanks!

mattoaellis commented 4 years ago

Hi Richard,

I noticed another issue inquiring about the status of the CUDA upgrade and I was wondering if you had had time to look over this pull request?

Cheers, Matt

richard-evans commented 3 years ago

Thanks for the prompt Matt - now all merged! Cheers, Richard

richard-evans commented 3 years ago

Hi Matt, so I think there is a bug in the exchange calculation. I have fixed some compiler bugs in the cuda branch, so this now works out of the box with CUDA10, but if you set the exchange to zero then it complains about argument 5 of the call to cusparseCreateCsr():

** On entry to cusparseCreateCsr() parameter number 5 (csrColInd) had an illegal value
Failed to initialise sparse matrix descriptor!

The simulation also gives incorrect results for 100K atoms at RT, since the magnetization goes to zero. The code segment itself seems innocuous enough, so I am a little bit puzzled. I also have a project student to work on the CUDA version this year, and hopefully add in MPI functionality too. If you have a moment, could you possibly take a quick peek to see if you can see any obvious issue? If not, not to worry, I'll have another look as soon as possible.

Cheers,

Richard
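One possible explanation for the cusparseCreateCsr() message, not confirmed anywhere in this thread, is that zero exchange leaves the matrix with no non-zero entries, so the column-index array is empty and its pointer is rejected. The sketch below is host-side only and shows a guard for that case under that assumption. Interaction, CsrMatrix, build_csr and has_entries are hypothetical names, and dropping zero-valued couplings is an illustrative choice rather than what VAMPIRE is known to do.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pair interaction: atom i couples to atom j with strength Jij.
struct Interaction { int i; int j; double Jij; };

// Hypothetical host-side CSR container (not VAMPIRE's own data structure).
struct CsrMatrix {
    std::size_t rows = 0;
    std::vector<int> row_offsets;   // rows + 1 entries
    std::vector<int> col_indices;   // nnz entries
    std::vector<double> values;     // nnz entries
    std::size_t nnz() const { return values.size(); }
};

// Build a CSR matrix from an unsorted interaction list, skipping zero couplings.
CsrMatrix build_csr(std::size_t rows, const std::vector<Interaction>& list) {
    CsrMatrix m;
    m.rows = rows;
    m.row_offsets.assign(rows + 1, 0);
    // First pass: count the non-zero entries in each row.
    for (const Interaction& x : list) {
        assert(x.i >= 0 && static_cast<std::size_t>(x.i) < rows);
        assert(x.j >= 0 && static_cast<std::size_t>(x.j) < rows);
        if (x.Jij != 0.0) ++m.row_offsets[x.i + 1];
    }
    // Prefix sum turns the per-row counts into row offsets.
    for (std::size_t r = 0; r < rows; ++r) m.row_offsets[r + 1] += m.row_offsets[r];
    m.col_indices.resize(m.row_offsets[rows]);
    m.values.resize(m.row_offsets[rows]);
    // Second pass: scatter each entry into the next free slot of its row.
    std::vector<int> next(m.row_offsets.begin(), m.row_offsets.end() - 1);
    for (const Interaction& x : list) {
        if (x.Jij == 0.0) continue;
        const int pos = next[x.i]++;
        m.col_indices[pos] = x.j;
        m.values[pos] = x.Jij;
    }
    return m;
}

// Guard: only create a device-side sparse descriptor (and run the SpMV)
// when the matrix has entries; with nnz == 0 the exchange field is zero.
bool has_entries(const CsrMatrix& m) { return m.nnz() > 0; }
```

With a guard like has_entries, the caller could skip descriptor creation and zero the exchange field directly when no couplings remain. Whether this matches the actual failure in the cuda branch would need checking against the code.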

mattoaellis commented 3 years ago

Hi Richard,

That's odd. I'll take a quick look. I checked it with the tensor exchange so it must be some separate part of the vector or scalar exchange.

Cheers, Matt