rcsb / symmetry

Detect, analyze, and visualize protein symmetry
GNU Lesser General Public License v2.1

Random error detecting symmetry #118

Closed (sbliven closed 1 week ago)

sbliven commented 1 month ago

CE-Symm 2.2.2 gives an error when writing out the symmetry of the N0BET9 AlphaFold model:

runCESymm.sh --simple - -J --rndseed 1 -v AF-N0BET9-F1-model_v4.cif
Structure   NumRepeats  SymmGroup   Reason
272 [main] INFO  org.biojava.nbio.structure.align.client.StructureName - Provided structure name 'AF-N0BET9-F1-model_v4.cif' matches file name in directory /Users/bliven_s/Downloads. Will read structure data from file /Users/bliven_s/Downloads/AF-N0BET9-F1-model_v4.cif.
621 [pool-2-thread-1] INFO  org.biojava.nbio.structure.symmetry.internal.CeSymm - Open Symmetry detected
632 [pool-2-thread-1] INFO  org.biojava.nbio.structure.align.client.StructureName - Provided structure name 'AF-N0BET9-F1-model_v4.cif' matches file name in directory /Users/bliven_s/Downloads. Will read structure data from file /Users/bliven_s/Downloads/AF-N0BET9-F1-model_v4.cif.
632 [pool-2-thread-1] INFO  org.biojava.nbio.structure.align.client.StructureName - Provided structure name 'AF-N0BET9-F1-model_v4.cif' matches file name in directory /Users/bliven_s/Downloads. Will read structure data from file /Users/bliven_s/Downloads/AF-N0BET9-F1-model_v4.cif.
1193 [pool-2-thread-1] INFO  org.biojava.nbio.structure.symmetry.internal.CeSymm - Open Symmetry detected
1194 [pool-2-thread-1] WARN  org.biojava.nbio.structure.symmetry.utils.SymmetryTools - Adding 0 ligands to .A_1-16
1195 [pool-2-thread-1] WARN  org.biojava.nbio.structure.symmetry.utils.SymmetryTools - Adding 0 ligands to .A_19-34
1195 [pool-2-thread-1] WARN  org.biojava.nbio.structure.symmetry.utils.SymmetryTools - Adding 0 ligands to .A_36-51
1195 [pool-2-thread-1] WARN  org.biojava.nbio.structure.symmetry.utils.SymmetryTools - Adding 0 ligands to .A_54-69
1195 [pool-2-thread-1] WARN  org.biojava.nbio.structure.symmetry.utils.SymmetryTools - Adding 0 ligands to .A_70-85
1195 [pool-2-thread-1] WARN  org.biojava.nbio.structure.symmetry.utils.SymmetryTools - Adding 0 ligands to .A_86-101
1195 [pool-2-thread-1] WARN  org.biojava.nbio.structure.symmetry.utils.SymmetryTools - Adding 0 ligands to .A_102-117
1196 [pool-2-thread-1] WARN  org.biojava.nbio.structure.symmetry.utils.SymmetryTools - Adding 0 ligands to .A_119-134
1196 [pool-2-thread-1] WARN  org.biojava.nbio.structure.symmetry.utils.SymmetryTools - Adding 0 ligands to .A_136-151
1196 [pool-2-thread-1] WARN  org.biojava.nbio.structure.symmetry.utils.SymmetryTools - Adding 0 ligands to .A_152-166
1200 [pool-2-thread-1] INFO  org.biojava.nbio.structure.cluster.SubunitCluster - SubunitClusters are structurally similar with 0.00 RMSD 0.00 coverage
1201 [pool-2-thread-1] INFO  org.biojava.nbio.structure.cluster.SubunitCluster - SubunitClusters are structurally similar with 0.00 RMSD 0.00 coverage
1201 [pool-2-thread-1] INFO  org.biojava.nbio.structure.cluster.SubunitCluster - SubunitClusters are structurally similar with 0.00 RMSD 0.00 coverage
1201 [pool-2-thread-1] INFO  org.biojava.nbio.structure.cluster.SubunitCluster - SubunitClusters are structurally similar with 0.00 RMSD 0.00 coverage
1201 [pool-2-thread-1] INFO  org.biojava.nbio.structure.cluster.SubunitCluster - SubunitClusters are structurally similar with 0.00 RMSD 0.00 coverage
1201 [pool-2-thread-1] INFO  org.biojava.nbio.structure.cluster.SubunitCluster - SubunitClusters are structurally similar with 0.00 RMSD 0.00 coverage
1201 [pool-2-thread-1] INFO  org.biojava.nbio.structure.cluster.SubunitCluster - SubunitClusters are structurally similar with 0.00 RMSD 0.00 coverage
1201 [pool-2-thread-1] INFO  org.biojava.nbio.structure.cluster.SubunitCluster - SubunitClusters are structurally similar with 0.00 RMSD 0.00 coverage
1202 [pool-2-thread-1] INFO  org.biojava.nbio.structure.cluster.SubunitCluster - SubunitClusters are structurally similar with 0.00 RMSD 0.00 coverage
1204 [pool-2-thread-1] WARN  writers.OutputWriter - Could not write result for entry: AF-N0BET9-F1-model_v4.cif. Writting empty row.
AF-N0BET9-F1-model_v4.cif   10  AF-N0BET9-F1-model_v4.cif   1   C1  Error
1204 [pool-2-thread-1] INFO  workers.CeSymmWorker - Finished job: AF-N0BET9-F1-model_v4.cif
1213 [main] INFO  main.CeSymmMain - Total runtime: 942, mean runtime: 942

Other seeds (e.g. --rndseed 2) succeed, but apparently only because their results don't meet the --minlen=15 requirement, so they never get reported as significant.

sbliven commented 1 month ago

I've tracked down the bug. The root cause is in symmetry group detection: SubunitCluster.mergeStructure uses CE for alignment, and CE fails on short (≤15 residue) alignments. This caused downstream errors due to missing coordinates.

I'm not quite sure what the correct solution is. We could:

  1. treat all short structures as different subunits
  2. use a different structure alignment algorithm for short structures (probably SmithWaterman3Daligner). Note that TM-score is also problematic for short structures, so the thresholds might also need to be updated
  3. treat all short structures as equivalent subunits

For the CE-Symm case, (3) is obviously correct. However, the code is shared with QuatSymmetryDetector and included in biojava, so I'm not sure what to do. @lafita, do you remember this code?
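To make the trade-off between the three options concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical: `ShortSubunitPolicySketch`, `ShortPolicy`, `sameCluster`, and the `SHORT_LENGTH` cutoff are illustration names, not biojava API, and the two suppliers only stand in for the CE comparison and for whatever alternative aligner option 2 would use.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of the three options; none of these names exist in biojava. */
final class ShortSubunitPolicySketch {

    /** One constant per option in the list above. */
    enum ShortPolicy { TREAT_AS_DIFFERENT, USE_ALTERNATIVE_ALIGNER, TREAT_AS_EQUIVALENT }

    /** Assumed cutoff, taken from the "short (<=15 residue)" remark in this thread. */
    static final int SHORT_LENGTH = 15;

    /**
     * Decides whether two subunits should end up in the same cluster.
     * The suppliers stand in for real structure comparisons and are only
     * evaluated when the chosen branch needs them.
     */
    static boolean sameCluster(int lengthA, int lengthB, ShortPolicy policy,
                               BooleanSupplier defaultAligner,
                               BooleanSupplier shortAligner) {
        boolean isShort = Math.min(lengthA, lengthB) <= SHORT_LENGTH;
        if (!isShort) {
            return defaultAligner.getAsBoolean();   // long enough: keep current behaviour
        }
        switch (policy) {
            case TREAT_AS_DIFFERENT:
                return false;                       // option 1
            case USE_ALTERNATIVE_ALIGNER:
                return shortAligner.getAsBoolean(); // option 2
            case TREAT_AS_EQUIVALENT:
                return true;                        // option 3
            default:
                throw new IllegalArgumentException("Unknown policy: " + policy);
        }
    }

    public static void main(String[] args) {
        // Stand-in for an aligner that fails on short input; it is never called here.
        BooleanSupplier failing = () -> { throw new IllegalStateException("alignment failed"); };
        for (ShortPolicy policy : ShortPolicy.values()) {
            System.out.println(policy + " -> " + sameCluster(15, 16, policy, failing, () -> true));
        }
    }
}
```

In all three branches the default aligner is never invoked for a short pair, so the failure is avoided regardless of which option is chosen. The options differ only in what the cluster assignment becomes.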

sbliven commented 1 month ago

BTW, the randomness is just because this particular structure varies between a 15 and 16 residue repeat depending on whether the tails and loops are aligned. There are likely tighter solenoids (or even rotational cases with short repeats) that would have the same behavior.
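As a rough check of the 15-versus-16 observation, the sketch below computes the length of each repeat range reported in the `--rndseed 1` log above (the `.A_1-16` … `.A_152-166` lines). It is plain Java with hypothetical names (`RepeatLengthSketch`, `rangeLength`, `shortRanges`), and it assumes positive residue numbers and uses the inclusive range length as a proxy for aligned length, which ignores any gaps inside a repeat.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: measures the repeat ranges printed in the log above. */
final class RepeatLengthSketch {

    /** Assumed cutoff, taken from the "short (<=15 residue)" remark in this thread. */
    static final int SHORT_LENGTH = 15;

    /** Repeat ranges copied from the "Adding 0 ligands to .A_..." log lines. */
    static final List<String> SEED_1_RANGES = Arrays.asList(
            "1-16", "19-34", "36-51", "54-69", "70-85",
            "86-101", "102-117", "119-134", "136-151", "152-166");

    /** Length of an inclusive residue range such as "152-166" (positive numbers only). */
    static int rangeLength(String range) {
        String[] parts = range.split("-");
        return Integer.parseInt(parts[1]) - Integer.parseInt(parts[0]) + 1;
    }

    /** Returns the ranges whose length is at or below the assumed cutoff. */
    static List<String> shortRanges(List<String> ranges) {
        List<String> result = new ArrayList<>();
        for (String range : ranges) {
            if (rangeLength(range) <= SHORT_LENGTH) {
                result.add(range);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String range : SEED_1_RANGES) {
            System.out.println(range + " -> " + rangeLength(range) + " residues");
        }
        System.out.println("At or below cutoff: " + shortRanges(SEED_1_RANGES));
    }
}
```

With these ranges, nine repeats span 16 residues and only the last one (152-166) spans 15, which fits the description of a structure sitting right at the threshold.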

sbliven commented 1 week ago

This is now fixed in the biojava PR. I will release CE-Symm 2.3.0 after it gets merged and released upstream.

youkha commented 1 week ago

Hi Spencer,

Thanks. Nice to see you are still maintaining CEsymm. At some point NCBI removed the ban on Java, which gave some hope we could integrate CEsymm with iCn3D, but unfortunately the ban is now back. Maybe a Biojavascript would solve the problem?

All the best, Philippe

sbliven commented 1 week ago

@youkha I'm barely keeping up with the Java maintenance, so ports to other languages are out of the question. Maybe you can run Java inside a container? iCn3D integration seems like it would require a reimplementation, though.

This fix is merged upstream. I'll try to make a beta build from the SNAPSHOT now while we wait for a biojava release.