Closed Wes-V3 closed 2 years ago
This is embarrassing, but it appears that calibur ran out of memory to allocate.
I wonder if it is possible for me to look at the decoys data that caused the problem? That will help me confirm if it really is a memory allocation problem, and perhaps find a workaround.
Of course! Thank you! RNA_structures.zip
A quick glance showed a negative threshold RMSD value. This could be due to a floating point value overflow, which implies that the input decoys could be very dissimilar.
But give me some time to confirm and fix this. I will update you ASAP.
This is what happened.
calibur uses only the coordinates of the lines marked with CA (carbon alpha) in the pdb files. For example, in the following, only the coordinates of the second line are used.
ATOM 1 N VAL A 238 11.251 -9.834 -2.389 1.00 0.00 N
ATOM 2 CA VAL A 238 11.896 -8.749 -3.115 1.00 0.00 C
ATOM 3 C VAL A 238 13.369 -8.689 -2.728 1.00 0.00 C
However, the decoy files you tested contain no CA atoms. Because of this, every computed RMSD is set to NaN. As a result no cluster was discovered, which sent calibur into an infinite loop.
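A quick way to check whether a set of decoys is affected is to count the CA atom records in each file before running calibur. The following is only a minimal sketch and is not part of calibur: the function names count_atom and report_missing_ca are hypothetical, and the code assumes the files follow the fixed-column PDB layout, where the atom name occupies columns 13-16.

```shell
#!/usr/bin/env bash
# count_atom NAME FILE: print the number of ATOM records in FILE whose
# atom name (columns 13-16 of the fixed-column PDB layout) equals NAME.
# Hypothetical helper for illustration; not part of calibur.
count_atom() {
    awk -v want="$1" '
        substr($0, 1, 4) == "ATOM" {
            name = substr($0, 13, 4)
            gsub(/ /, "", name)          # atom names are space-padded
            if (name == want) n++
        }
        END { print n + 0 }              # prints 0 when nothing matched
    ' "$2"
}

# report_missing_ca: list every .pdb file in the current directory
# that contains no CA atom records at all.
report_missing_ca() {
    local f
    for f in *.pdb; do
        [ -e "$f" ] || continue          # no .pdb files: the glob stays literal
        [ "$(count_atom CA "$f")" -gt 0 ] || echo "no CA atoms: $f"
    done
}
```

Running report_missing_ca in the decoy directory prints one line per file that calibur would have nothing to work with. An empty result means every file has at least one CA record.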
To solve this quickly, replace the atom name on the lines which you want to use as CA atoms with CA. For example, the following bash command rewrites all the lines with atom names C1', C2', ..., C5' to CA (the output directory must already exist):
for F in *.pdb; do sed -e 's/ 0 / A /' -e "s/C.'/CA /" "$F" > "output/$F"; done
(The command will also change the chain from 0 to A, since calibur defaults to using only the chains A, C, or " ".)
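If the pattern-based substitution is too broad for a given data set, the same renaming can be done by column position instead. This is a sketch under stated assumptions, not a calibur feature: rename_to_ca is a hypothetical helper, it assumes the fixed-column PDB layout (atom name in columns 13-16, chain identifier in column 22), and the choice of C4' in the usage comment is only an example.

```shell
#!/usr/bin/env bash
# rename_to_ca NAME FILE: write FILE to stdout, rewriting every ATOM record
# whose atom name (columns 13-16) equals NAME so that the atom name becomes
# " CA " and the chain identifier (column 22) becomes "A".
# All other lines pass through unchanged.
# Hypothetical helper for illustration; not part of calibur.
rename_to_ca() {
    awk -v want="$1" '
        substr($0, 1, 4) == "ATOM" {
            name = substr($0, 13, 4)
            gsub(/ /, "", name)          # atom names are space-padded
            if (name == want) {
                # Splice by position so the fixed-width columns stay aligned.
                $0 = substr($0, 1, 12) " CA " substr($0, 17, 5) "A" substr($0, 23)
            }
        }
        { print }
    ' "$2"
}

# Example usage (the atom name is an arbitrary choice):
#   mkdir -p output
#   for F in *.pdb; do rename_to_ca "C4'" "$F" > "output/$F"; done
```

Unlike the sed command, this renames a single atom name per call, so each residue contributes one CA line, and it changes the chain only on the lines it renames.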
This solves my problem, thank you for your quick reply and solution. If possible, could you provide an option like "-RNA" to make it better applicable to clustering RNA structures? In any case, thanks for your excellent work!
Thanks for the suggestion. But I am not yet familiar with RNA notations so I am not sure how to make the change. OTOH, I have added an option to allow renaming of the atom for comparison, as well as an option to allow all chains. I hope these changes make the analysis of RNAs easier.
On Windows:
On Linux:
Using calibur in Rosetta on Linux:
I'm using calibur to cluster RNA structures. pdblist.txt