buildPDBensemble freeze mapping structures

Yago52 commented 1 year ago

I have just upgraded to version 2.4 and I am having trouble with the buildPDBEnsemble function. While building the ensemble, the mapping freezes when it reaches a structure with low sequence coverage. Once this structure is removed from the list, the program freezes at the next structure with the same characteristics. In addition, it seems that the keywords used in some tutorials (seqid, coverage, ...) no longer work.

Thanks for your attention. I’m looking forward to your reply.

Yago52 commented 1 year ago

For example:
@> Mapping 7zrvA_ca to the reference... [ 1%]
@> WARNING no atommaps were available. Consider adjusting accepting criteria
@> Mapping 7xdkA_ca to the reference... [ 2%]
@> WARNING no atommaps were available. Consider adjusting accepting criteria
@> Mapping 7xdbA_ca to the reference... [ 2%]
@> WARNING no atommaps were available. Consider adjusting accepting criteria
@> Mapping 7wwjA_ca to the reference... [ 2%]
@> WARNING no atommaps were available. Consider adjusting accepting criteria
@> Mapping 8aqvA_ca to the reference... [ 2%]
@>
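When several structures in a row hit this warning, it can help to pull the affected IDs out of the log instead of removing them one by one. Below is a minimal sketch in plain Python, assuming the log has been captured as text in the `@>`-prefixed format shown above. `summarize_mapping_log` is a hypothetical helper, not part of ProDy.

```python
import re

# Hypothetical helper (not part of ProDy): scans a captured ProDy log.
MAPPING_RE = re.compile(r"Mapping (\S+) to the reference")
NO_MAP_WARNING = "no atommaps were available"


def summarize_mapping_log(log_text):
    """Return (warned_ids, last_id) for a ProDy '@>'-prefixed mapping log.

    warned_ids: structures whose mapping message was followed by the
    'no atommaps were available' warning, in log order.
    last_id: the last structure ProDy started to map, i.e. where the run
    was sitting if it stalled.
    """
    warned_ids = []
    current_id = None
    for message in log_text.split("@>"):  # each '@>' starts a new message
        match = MAPPING_RE.search(message)
        if match:
            current_id = match.group(1)
        elif NO_MAP_WARNING in message and current_id is not None:
            warned_ids.append(current_id)
    return warned_ids, current_id


if __name__ == "__main__":
    log = (
        "@>Mapping 7zrvA_ca to the reference... [ 1%]"
        "@> WARNING no atommaps were available. Consider adjusting accepting criteria "
        "@> Mapping 8aqvA_ca to the reference... [ 2%] @>"
    )
    print(summarize_mapping_log(log))  # (['7zrvA_ca'], '8aqvA_ca')
```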

jamesmkrieger commented 1 year ago

Thanks for reporting this. What version were you using previously that did work?

jamesmkrieger commented 1 year ago

Have you tried adjusting criteria such as rmsd_reject as well?

jamesmkrieger commented 1 year ago

Also, could you share your code please to help us figure out what's happening?

Yago52 commented 1 year ago

I was using ProDy 2.2.

Yago52 commented 1 year ago

No, I haven't tried adjusting rmsd_reject.

jamesmkrieger commented 1 year ago

ok, thanks

Yago52 commented 1 year ago

from os import chdir, mkdir
from os.path import isdir

from pandas import read_csv
from prody import buildPDBEnsemble, parsePDB, pathPDBFolder

spike_close = parsePDB('6vxx', chain='A', subset='ca')  # reference structure

chdir('/home/ysilva/ownCloud/mestrado/estruturas para projeto/scripts/data_cov3d/data_csv')
data = read_csv('data_12042023.csv')  # data from a protein data bank
chdir('/home/ysilva/ownCloud/mestrado/estruturas para projeto/scripts/data_cov3d')

# keep SARS-CoV-2 entries whose Domain is 'full' or 'Spike glycoprotein'
data2 = data[((data.Domain == 'full') | (data.Domain == 'Spike glycoprotein')) & (data.Virus == 'SARS-CoV-2')]
pdbids = list(data2.PDB)  # PDB codes used to build the ensemble

# check for (and create) the folder for the PDB files
if not isdir('pdbs_estrutura_spike_all'):
    mkdir('pdbs_estrutura_spike_all')

pathPDBFolder('pdbs_estrutura_spike_all')  # use this folder as the local PDB store
pdbs_a = parsePDB(pdbids, chain='A', subset='ca')  # reuse local .pdb files or download missing ones

pathPDBFolder('')  # release the local PDB folder setting

ensemble_close_a = buildPDBEnsemble(pdbs_a, ref=spike_close, superpose=True, title='close_a')  # build the ensemble

jamesmkrieger commented 1 year ago

OK, I can reproduce the stalling problem with the following code:

from os import mkdir
from os.path import isdir

from prody import buildPDBEnsemble, parsePDB, pathPDBFolder

spike_close = parsePDB('6vxx', chain='A', subset='ca')  # reference structure

# CSV-based selection from the original script, disabled here
# (re-enabling it also needs: from os import chdir; from pandas import read_csv)
# chdir('/home/ysilva/ownCloud/mestrado/estruturas para projeto/scripts/data_cov3d/data_csv')
# data = read_csv('data_12042023.csv') #data from a protein data bank
# chdir('/home/ysilva/ownCloud/mestrado/estruturas para projeto/scripts/data_cov3d')

# data2 = data[((data.Domain == 'full') | (data.Domain == 'Spike glycoprotein')) & (data.Virus == 'SARS-CoV-2')] #filter the data
# pdbids = list(data2.PDB) #pdb codes to create ensemble

# the five structures from the log above
pdbids = ["7zrv", "7xdk", "7xdb", "7wwj", "8aqv"]

# check for (and create) the folder for the PDB files
if not isdir('pdbs_estrutura_spike_all'):
    mkdir('pdbs_estrutura_spike_all')

pathPDBFolder('pdbs_estrutura_spike_all')  # use this folder as the local PDB store
pdbs_a = parsePDB(pdbids, chain='A', subset='ca')  # reuse local .pdb files or download missing ones

pathPDBFolder('')  # release the local PDB folder setting

ensemble_close_a = buildPDBEnsemble(pdbs_a, ref=spike_close, superpose=True, title='close_a')  # stalls in ProDy 2.4.0

It works in ProDy 2.3.1 as well, but not in 2.4.0, as you said.

Yago52 commented 1 year ago

OK, I will try downgrading to ProDy 2.3.1 with conda.

jamesmkrieger commented 1 year ago

Any version before 2.4.0 should work fine. I'll also try to figure out what's happening when I can.

Yago52 commented 1 year ago

Do you know how to downgrade with conda?

jamesmkrieger commented 1 year ago

Just use pip: pip install prody==2.3.1

I think conda uses a single = (prody=2.3.1, getting it from conda-forge instead of PyPI), but the result should be the same either way.
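To confirm which ProDy the interpreter actually picks up after downgrading, a small check along these lines may help. It is a sketch: it reads `prody.__version__` and compares it with 2.4.0, the release in which the stall was reported in this thread. `parse_version`, `installed_prody_version` and `REPORTED_STALL` are hypothetical names, not ProDy API.

```python
# Hypothetical check: release in which the stall was reported in this thread.
REPORTED_STALL = (2, 4, 0)


def parse_version(text):
    """Turn a version string such as '2.3.1' into (2, 3, 1).

    Only the first three dot-separated fields are used; any trailing
    non-digit characters in a field (e.g. 'rc1') are ignored.
    """
    numbers = []
    for field in text.split(".")[:3]:
        digits = ""
        for char in field:
            if not char.isdigit():
                break
            digits += char
        numbers.append(int(digits) if digits else 0)
    while len(numbers) < 3:
        numbers.append(0)  # pad '2.4' to (2, 4, 0)
    return tuple(numbers)


def installed_prody_version():
    """Return the installed ProDy version string, or None if ProDy is missing."""
    try:
        import prody
    except ImportError:
        return None
    return prody.__version__


if __name__ == "__main__":
    found = installed_prody_version()
    if found is None:
        print("ProDy is not installed in this environment")
    elif parse_version(found) == REPORTED_STALL:
        print(f"ProDy {found}: the version reported to stall in buildPDBEnsemble")
    else:
        print(f"ProDy {found}")
```

Run it with the same interpreter (or conda environment) you use for the ensemble script, since pip and conda may install into different environments.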

jamesmkrieger commented 1 year ago

ok, I found the problem and have a fix that seems to work.

With the fix, the test code above gives the following result:

@> Mapping 8aqvA_ca to the reference... [ 80%] 1s
@> WARNING no atommaps were available. Consider adjusting accepting criteria
@> Superposition completed in 0.00 seconds.
@> Ensemble (4 conformations) were built in 1.64s.
@> WARNING 1 structures cannot be mapped.

With overlap=0.4, it then includes all 5 conformations.
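Passing the looser overlap threshold through buildPDBEnsemble might look like the sketch below. It assumes a ProDy build that includes the fix (2.4.0 itself stalls), network access to download the structures, and that buildPDBEnsemble forwards `overlap` (and `rmsd_reject`, mentioned earlier in the thread) to the mapping step as keyword arguments. `mapping_options`, `build_close_ensemble` and `DEFAULT_OPTIONS` are hypothetical helpers.

```python
# Hypothetical defaults: overlap=0.4 is the value used in this thread.
DEFAULT_OPTIONS = {"overlap": 0.4}

PDB_IDS = ["7zrv", "7xdk", "7xdb", "7wwj", "8aqv"]


def mapping_options(**overrides):
    """Merge user overrides (e.g. rmsd_reject=...) into the default criteria."""
    options = dict(DEFAULT_OPTIONS)  # copy so the defaults are never mutated
    options.update(overrides)
    return options


def build_close_ensemble(pdb_ids, ref_id="6vxx", chain="A", **overrides):
    """Build a CA-only ensemble against ref_id using relaxed criteria.

    Assumes a ProDy release that contains the mapping fix and network access
    (parsePDB fetches structures that are not cached locally).
    """
    # Imported lazily so the helpers above work without ProDy installed.
    from prody import buildPDBEnsemble, parsePDB

    ref = parsePDB(ref_id, chain=chain, subset="ca")
    structures = parsePDB(pdb_ids, chain=chain, subset="ca")
    return buildPDBEnsemble(structures, ref=ref, superpose=True,
                            title="close_a", **mapping_options(**overrides))


if __name__ == "__main__":
    ensemble = build_close_ensemble(PDB_IDS)
    print(ensemble.numConfs(), "conformations")
```

Other criteria go through the same keyword path, e.g. build_close_ensemble(PDB_IDS, rmsd_reject=20.0), where 20.0 is only an illustrative value, not a recommendation.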

jamesmkrieger commented 1 year ago

This is now merged, so you can use it if you install ProDy from GitHub. We should be making a release soon too.

prody/ProDy issue #1689