soedinglab / CCMpred

Protein Residue-Residue Contacts from Correlated Mutations predicted quickly and accurately.
http://www.ncbi.nlm.nih.gov/pubmed/25064567
GNU Affero General Public License v3.0

Converting large proteins #17

Closed wlgfour closed 4 years ago

wlgfour commented 5 years ago

Hi,

I am trying to use ccmpred on large MSAs (~12,000 rows and ~3,000 columns). I compiled it with padding off and without CUDA, and the system has roughly 750GB of free RAM with more swap available. The calculation provided for the memory requirement (4*(4*(L*L*21*21 + L*20) + 23*N*L + N + L*L) + 2*N*L + 1024) tells me that my conversion should only take ~70GB of RAM, but ccmpred exits immediately with the error message "not enough memory to allocate variables."
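The ~70GB figure can be checked with a few lines of Python. This is a minimal sketch, assuming the estimate reads 4*(4*(L*L*21*21 + L*20) + 23*N*L + N + L*L) + 2*N*L + 1024 bytes with N sequences and L columns; the function name `estimated_bytes` is made up for illustration and is not part of CCMpred.

```python
def estimated_bytes(N: int, L: int) -> int:
    """Evaluate the quoted no-padding estimate for N sequences and L columns.

    Assumption: the formula is read with the multiplication signs restored.
    """
    return 4 * (4 * (L * L * 21 * 21 + L * 20) + 23 * N * L + N + L * L) + 2 * N * L + 1024


# Alignment size from the question: ~12,000 rows and ~3,000 columns.
total = estimated_bytes(N=12_000, L=3_000)
print(f"{total} bytes = {total / 1e9:.1f} GB")
```

Under that reading the estimate comes out a little under 70GB, so the failure is unlikely to be a plain shortage of RAM on a machine with 750GB free.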

Thank you for your help!

croth1 commented 5 years ago

Dear @wlgfour,

I can confirm that there seems to be a problem with very large L.

L=3000 gives 3000x3000x21x21 (about 3.97 billion) coupling parameters alone. The number of parameters in CCMpred is stored as a plain int and is therefore limited to 2^31-1 (about 2.15 billion). It might be possible to rescue this by changing the type to size_t. I am not sure whether the problem would still be infeasible due to runtime, though.
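The arithmetic behind this can be sketched in Python. The sketch counts only the L x L x 21 x 21 coupling terms mentioned above (CCMpred's actual variable count may include further terms), and it uses `ctypes.c_int32` to imitate a 32-bit signed int. The helper name `coupling_count` is hypothetical.

```python
import ctypes
import math

INT_MAX = 2**31 - 1  # largest value a 32-bit signed int can hold


def coupling_count(L: int, states: int = 21) -> int:
    """Number of pairwise coupling parameters for L columns."""
    return L * L * states * states


n = coupling_count(3000)
fits_in_int = n <= INT_MAX

# ctypes does no overflow checking, so this shows the wrapped 32-bit value.
wrapped = ctypes.c_int32(n).value

# Largest L whose coupling count alone still fits in a 32-bit signed int.
max_L = math.isqrt(INT_MAX // (21 * 21))

print(n, fits_in_int, wrapped, max_L)
```

For L=3000 the count exceeds the int limit, and the wrapped value is negative. That would be consistent with an allocation failing immediately, regardless of how much RAM is free.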

Unfortunately, we do not have anybody working on CCMpred here, so there is only very limited support available.

wlgfour commented 5 years ago

Thank you for your timely response!

How would you recommend changing this? Just change all instances of "int" to "size_t"?

With respect to the potential runtime infeasibility, do you know of any other programs or strategies that might be more practical for large proteins?

croth1 commented 5 years ago

I'd try changing nvar to size_t here: https://github.com/soedinglab/CCMpred/blob/master/include/ccmpred.h#L65. You may also have to fix some follow-up problems caused by the change of data type.
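One kind of follow-up problem can be simulated in Python. This is an assumption about what might go wrong, not a diagnosis of CCMpred's source: if the product is still evaluated with int operands, it overflows before it is stored in the wider variable. Signed overflow is undefined behaviour in C; the sketch models the wraparound that typically happens in practice. The C snippets in the comments are hypothetical.

```python
import ctypes

L = 3000
true_count = L * L * 21 * 21  # exact value, computed with Python's unbounded ints

# Simulates (hypothetical C):  size_t nvar = ncol * ncol * 21 * 21;
# with ncol still an int: the product wraps in 32 bits, then gets widened.
product_as_int = ctypes.c_int32(true_count).value
widened_late = ctypes.c_uint64(product_as_int).value

# Simulates (hypothetical C):  size_t nvar = (size_t)ncol * ncol * 21 * 21;
# the first operand is widened, so the whole product is evaluated in 64 bits.
widened_early = ctypes.c_uint64(true_count).value

print(product_as_int, widened_late, widened_early)
```

In the first case the variable ends up holding an enormous bogus value, so changing the declared type alone does not help. Loop counters, array indices, and function parameters that receive the count would need the same treatment.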

> With respect to the potential runtime infeasibility, do you know of any other programs or strategies that might be more practical for large proteins?

I do not know of any coupling-based approach that does not have the same runtime limitations as CCMpred.

wlgfour commented 5 years ago

I am still getting either "ERROR: ERROR: Not enough memory to allocate variables!" or "Segmentation fault (core dumped)" after changing nvar to size_t.

sokrypton commented 5 years ago

We have a slightly slower, non-GPU implementation that (I believe) requires only half the memory. You can try it here: https://github.com/sokrypton/GREMLIN_CPP