soedinglab / CCMpred

Protein Residue-Residue Contacts from Correlated Mutations predicted quickly and accurately.
http://www.ncbi.nlm.nih.gov/pubmed/25064567
GNU Affero General Public License v3.0

CCMpred is limited to MSA length (ncols) = 1787 residues #34

Closed · jhschwartz closed this issue 2 years ago

jhschwartz commented 2 years ago

I came across the following issue when attempting to run CCMpred on an MSA of length 1898 and depth 395.

ERROR: ERROR: Not enough memory to allocate variables!

 _____ _____ _____               _ 
|     |     |     |___ ___ ___ _| |
|   --|   --| | | | . |  _| -_| . |
|_____|_____|_|_|_|  _|_| |___|___|
                  |_|              

Found 1 CUDA devices, using device #0: NVIDIA A40
Total GPU RAM:     47,850,782,720
Free GPU RAM:      47,578,415,104
Needed GPU RAM: 18,446,744,043,827,191,608 ⚠

Since 18,446,744,043,827,191,608 is close to ULLONG_MAX, while the expected requirement (according to @kWeissenow's fixed README memory calculation) is ~39 GB, I figured this must be a case of a number overflowing its type.

In src/ccmpred.c lines 387-390, we have:

int nsingle = ncol * (N_ALPHA - 1);
int nvar = nsingle + ncol * ncol * N_ALPHA * N_ALPHA;
int nsingle_padded = nsingle + N_ALPHA_PAD - (nsingle % N_ALPHA_PAD);
int nvar_padded = nsingle_padded + ncol * ncol * N_ALPHA * N_ALPHA_PAD;

It seems that nvar_padded is the culprit. In my case, N_ALPHA = 21 and N_ALPHA_PAD = 32. For ncol = 1898, nvar_padded should be 2,420,853,472. This exceeds INT_MAX (2,147,483,647), so nvar_padded overflows and becomes negative, which causes the bug. To keep nvar_padded within the int range, the MSA length must be at most 1787 columns.
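To double-check, here is a small standalone program (my own sketch, not CCMpred code) that reproduces the wraparound and shows that widening to a 64-bit type before the multiplication gives the expected count:

#include <stdint.h>
#include <stdio.h>

#define N_ALPHA 21
#define N_ALPHA_PAD 32

int main(void) {
    int ncol = 1898;

    /* Same int arithmetic as in ccmpred.c: the product
     * ncol * ncol * N_ALPHA * N_ALPHA_PAD exceeds INT_MAX and wraps
     * (signed overflow, formally undefined behaviour). */
    int nsingle = ncol * (N_ALPHA - 1);
    int nsingle_padded = nsingle + N_ALPHA_PAD - (nsingle % N_ALPHA_PAD);
    int nvar_padded = nsingle_padded + ncol * ncol * N_ALPHA * N_ALPHA_PAD;

    /* Widening one operand first keeps the whole computation in 64 bits. */
    uint64_t nvar_padded64 = (uint64_t)nsingle_padded
                           + (uint64_t)ncol * ncol * N_ALPHA * N_ALPHA_PAD;

    printf("int:      %d\n", nvar_padded);                         /* negative on my machine */
    printf("uint64_t: %llu\n", (unsigned long long)nvar_padded64); /* 2420853472 */
    return 0;
}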

I've tried changing all four variables above (and their uses) to long, but that causes problems in the CUDA code, and I have so far been unable to find the source of the problem. Running under cuda-gdb I get the following error:

CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0xbb35e0

Thread 1 "ccmpred" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 1, grid 6, block (1231,18,0), thread (96,0,0), device 0, sm 0, warp 18, lane 0]
0x0000000000bb38f0 in d_edge_gradients_histogram_weighted(unsigned char*, float*, float const*, int, int)<<<(1898,21,1),(128,1,1)>>> ()
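One detail that stands out: the trailing parameters of the failing kernel are plain ints, so offsets computed inside the kernels are presumably still 32-bit even after my host-side changes. A toy example of that kind of pattern (hypothetical indexing, not the actual CCMpred kernel):

#include <stdio.h>

int main(void) {
    int ncol = 1898, nalpha_pad = 32;
    int i = ncol - 1, a = 20, j = ncol - 1, b = 20;

    /* Flattened (i, a, j, b) offset built from int values: the
     * multiplications happen in 32-bit arithmetic no matter how the
     * buffer itself was allocated. */
    int off32 = ((i * nalpha_pad + a) * ncol + j) * nalpha_pad + b;

    /* Widening the first operand keeps the offset in 64 bits. */
    size_t off64 = (((size_t)i * nalpha_pad + a) * ncol + j) * nalpha_pad + b;

    printf("32-bit offset: %d\n", off32);   /* negative: out-of-bounds access */
    printf("64-bit offset: %zu\n", off64);  /* 3688193588 */
    return 0;
}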

I'm not sure yet whether that fully explains it. I will continue to work on a fix, but if anyone has suggestions in the meantime, please let me know!

jhschwartz commented 2 years ago

It seems this is related to a few older issues: #17 #19 #27 #32

jhschwartz commented 2 years ago

On CPU, running under gdb, the bug looks like this:

Reweighted 395 sequences with threshold 0.8 to Beff=213.769 weight mean=0.541186, min=0.0714286, max=1

Will optimize 1588698124 32-bit variables

iter    eval    f(x)        ║x║         ║g║         step

Program received signal SIGSEGV, Segmentation fault.
0x00002aaaac632aa0 in __memset_sse2 () from /lib64/libc.so.6
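I haven't traced exactly where it dies, but one plausible mechanism (my speculation) is the same overflow: if a negative int count, or a byte size derived from it, ever reaches memset, the implicit conversion to size_t turns it into an astronomically large length. A trivial standalone illustration, not CCMpred code, which crashes in the C library's memset just like above:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    int nvar_padded = -1874113824;               /* 2420853472 wrapped to int */
    size_t nbytes = sizeof(float) * nvar_padded; /* negative int -> huge size_t */
    printf("nbytes = %zu\n", nbytes);            /* on the order of 1.8e19 */

    float *x = malloc(1024);
    memset(x, 0, nbytes);  /* deliberately segfaults: writes far past the buffer */
    return 0;
}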

soeding commented 2 years ago

Hi @jhschwartz, thanks for the useful analysis! I talked to Stefan Seemayer. He suggested setting N_ALPHA_PAD = N_ALPHA, i.e. using no padding. This trades some speed for lower memory use and should allow you to go up to a number of columns of roughly L ~ (2^31 / 21^2)^0.5 = 2206. (I get a maximum of 1448 for N_ALPHA_PAD = 32, not 1787.)

Also, I wondered whether changing the type of nvar_padded to unsigned int instead of long might avoid the downstream errors in CUDA and libconjugrad. If it works, it would give you another factor of sqrt(2) ≈ 1.414 in the maximum number of columns.

sseemayer commented 2 years ago

Sorry for the delayed/nonexistent responses. I've changed responsibilities and no longer find the time (or have access to suitable GPUs) to maintain this or to look at this issue in depth, but here are some ideas of things to try.

As Johannes has mentioned, you might try to recompile CCMpred with the WITH_PADDING option set to off in order to slightly reduce the memory requirements.
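Assuming you are building with the standard CMake setup (adjust as needed for your environment), that would be something along the lines of:

cmake -DWITH_PADDING=OFF .
make clean && make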

It might be that there are systematic problems hiding in the code that prohibit you from working with large MSAs, and that this small change will not really solve your problem. In that case, combing over both the CPU and GPU code for potential integer overflows sounds like a good strategy.

jhschwartz commented 2 years ago

Hi @soeding @sseemayer! I wanted to try out your suggestions before replying, so I too am sorry for the delayed reply.

I can confirm that compiling with padding off raises the MSA width limit and reduces the required memory, although changing the type of nvar_padded still causes downstream errors in libconjugrad. Fortunately, 2206 columns is wide enough for my use case, so there's no need to worry about that.

Thanks again so much for your help!