jkn93 opened this issue 8 years ago (status: Open)
That's correct. A 30000x30000 double-precision matrix goes beyond 6 GB. I don't recommend testing double precision on a TITAN; we normally use a K40. You can try single precision, and it should deliver very good performance.
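(For reference, a back-of-the-envelope check not stated explicitly in the thread: one 30000 x 30000 double-precision matrix is 30000 x 30000 x 8 B ≈ 7.2 GB, and dgemm involves three such operands, so even a single full operand already exceeds the 6 GB of a GTX Titan.)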
Thank you for the quick reply. I'm using the older Titan cards with 1/3 DP performance on my dev server. I found the performance of dgemm with matrices of dimension 20000x20000 rather promising, so I was wondering whether your library is intended for production use. For that, it should be able to handle larger matrices, check the available device memory, and employ some sort of batching.
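A minimal sketch of the kind of device-memory check meant here, using the plain CUDA runtime API (cudaMemGetInfo is a standard call; the scheduling comment is purely illustrative and is not BLASX code):

```c
#include <stdio.h>
#include <cuda_runtime.h>

/* Query free/total memory on each visible device before scheduling work. */
int main(void)
{
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    for (int d = 0; d < ndev; ++d) {
        size_t free_b = 0, total_b = 0;
        cudaSetDevice(d);
        cudaMemGetInfo(&free_b, &total_b);
        printf("device %d: %.0f MiB free of %.0f MiB\n",
               d, free_b / 1048576.0, total_b / 1048576.0);
        /* A host-side scheduler could cap per-device tile buffers to a
           fraction of free_b and stream tiles of A/B/C through them. */
    }
    return 0;
}
```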
It can handle any matrix size as long as you have enough host RAM. (The current memory issue is due to the way I mark finished tasks; fixing it would require fairly complex logic for little benefit.) I have handed out a copy of my paper to folks at MATLAB and NVIDIA; I will leave production-ready code to them if they are interested in taking on BLASX, as it is too much work for an individual. Besides memory checking, there is a lot of additional work, considering the many variations of level-3 BLAS such as Trans, Uplo, and Diag.
Can you tell me why you need a matrix multiplication of 10^4 size? Thanks.
Good call, let's hope they make the best of it. I'm developing ab initio electronic structure methods for large-scale systems.
I've modified the gemm example to use dgemm only, with matrices of dimension 30000x30000. On a server with four GTX Titan cards the program segfaults. It seems there aren't any checks on available device memory. (A reproduction sketch follows the nvidia-smi output below.)
nvidia-smi:
| 0 30252 C ./gemm 6067MiB |
| 1 30252 C ./gemm 6067MiB |
| 2 30252 C ./gemm 6067MiB |
| 3 30252 C ./gemm 6067MiB |
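A minimal sketch of the failing case, assuming BLASX is linked in place of a CPU BLAS behind the standard cblas_dgemm interface, as in the shipped gemm example (the actual test harness in the repository may differ):

```c
#include <stdio.h>
#include <stdlib.h>
#include <cblas.h>

/* Illustrative reproduction: three 30000x30000 double matrices,
   roughly 3 x 7.2 GB of host RAM, multiplied once. */
int main(void)
{
    const int n = 30000;
    double *A = malloc((size_t)n * n * sizeof(double));
    double *B = malloc((size_t)n * n * sizeof(double));
    double *C = malloc((size_t)n * n * sizeof(double));
    if (!A || !B || !C) { fprintf(stderr, "host allocation failed\n"); return 1; }

    for (size_t i = 0; i < (size_t)n * n; ++i) { A[i] = 1.0; B[i] = 1.0; C[i] = 0.0; }

    /* Routed to the GPUs when linked against BLASX instead of a CPU BLAS. */
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, A, n, B, n, 0.0, C, n);

    printf("C[0] = %f\n", C[0]);
    free(A); free(B); free(C);
    return 0;
}
```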