Closed yhgon closed 2 years ago
Can you create a simpler example that uses only PDBFixer, without Biopython or PDBTools? It can download the PDB file for you, so you don't need anything else to do that:
fixer = PDBFixer(pdbid='6hqv')
It also can remove the unwanted chains with fixer.removeChains()
so you don't need anything else for that either.
I tested it with PDBFixer only and found the cause.
fixer.removeChains cannot isolate chain A. I suspect Biopython and PDBTools have the same problem.
def download_pdb(pdbid, chain_id='A'):
    from openmm.app import PDBFile
    from pdbfixer import PDBFixer
    fixer = PDBFixer(pdbid=pdbid)
    ## select first chain
    fixer.removeChains(chainIds=['B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'])
    fixer.removeHeterogens(False)
    output_pdb = '{}_{}.pdb'.format(pdbid, chain_id)
    with open(output_pdb, 'w') as f:
        PDBFile.writeFile(fixer.topology, fixer.positions, f)

download_pdb(pdbid='6hqv')
The output is shown below; the record ATOM 11471 N GLU B is the problem.
ATOM 11459 CG1 VAL A1516 104.078 112.253 12.457 1.00 0.00 C
ATOM 11460 CG2 VAL A1516 105.582 113.311 10.763 1.00 0.00 C
ATOM 11461 N GLU A1517 105.519 109.079 9.275 1.00 0.00 N
ATOM 11462 CA GLU A1517 105.696 108.512 7.923 1.00 0.00 C
ATOM 11463 C GLU A1517 105.135 107.093 7.863 1.00 0.00 C
ATOM 11464 O GLU A1517 103.927 106.903 7.742 1.00 0.00 O
ATOM 11465 CB GLU A1517 107.175 108.493 7.505 1.00 0.00 C
ATOM 11466 CG GLU A1517 108.157 108.167 8.604 1.00 0.00 C
ATOM 11467 CD GLU A1517 109.597 108.351 8.173 1.00 0.00 C
ATOM 11468 OE1 GLU A1517 110.497 107.939 8.942 1.00 0.00 O
ATOM 11469 OE2 GLU A1517 109.820 108.910 7.070 1.00 0.00 O
TER 11470 GLU A1517
ATOM 11471 N GLU B 1 84.985 174.967 28.676 1.00 0.00 N
ATOM 11472 CA GLU B 1 84.095 174.330 27.668 1.00 0.00 C
ATOM 11473 C GLU B 1 84.611 172.951 27.233 1.00 0.00 C
ATOM 11474 O GLU B 1 83.897 172.195 26.567 1.00 0.00 O
ATOM 11475 CB GLU B 1 83.955 175.248 26.458 1.00 0.00 C
ATOM 11476 CG GLU B 1 82.570 175.251 25.831 1.00 0.00 C
ATOM 11477 CD GLU B 1 81.630 176.243 26.486 1.00 0.00 C
ATOM 11478 OE1 GLU B 1 81.035 177.065 25.760 1.00 0.00 O
ATOM 11479 OE2 GLU B 1 81.492 176.198 27.723 1.00 0.00 O
ATOM 11480 OXT GLU B 1 85.739 172.548 27.540 1.00 0.00 O
TER 11481 GLU B 1
END
When I checked the original PDB file for 6hqv, the record
ATOM 11471 N GLU B 1
turned out to come from HETATM 23437 to HETATM 23446:
HETATM23437 N GLU A1603 84.985 174.967 28.676 1.00 97.46 N
ANISOU23437 N GLU A1603 12736 12114 12179 840 -451 -515 N
HETATM23438 CA GLU A1603 84.095 174.330 27.668 1.00 97.25 C
ANISOU23438 CA GLU A1603 12712 12128 12110 812 -426 -445 C
HETATM23439 C GLU A1603 84.611 172.951 27.233 1.00 98.81 C
ANISOU23439 C GLU A1603 12895 12346 12302 802 -445 -493 C
HETATM23440 O GLU A1603 83.897 172.195 26.567 1.00101.12 O
ANISOU23440 O GLU A1603 13197 12671 12551 775 -433 -457 O
HETATM23441 CB GLU A1603 83.955 175.248 26.458 1.00 98.77 C
ANISOU23441 CB GLU A1603 12875 12337 12313 752 -377 -390 C
HETATM23442 CG GLU A1603 82.570 175.251 25.831 1.00 98.68 C
ANISOU23442 CG GLU A1603 12873 12356 12262 730 -368 -275 C
HETATM23443 CD GLU A1603 81.630 176.243 26.486 1.00 94.26 C
ANISOU23443 CD GLU A1603 12315 11747 11751 772 -357 -210 C
HETATM23444 OE1 GLU A1603 81.035 177.065 25.760 1.00 90.98 O
ANISOU23444 OE1 GLU A1603 11872 11324 11369 742 -345 -112 O
HETATM23445 OE2 GLU A1603 81.492 176.198 27.723 1.00 91.28 O
ANISOU23445 OE2 GLU A1603 11964 11336 11383 828 -359 -258 O
HETATM23446 OXT GLU A1603 85.739 172.548 27.540 1.00 94.34 O
ANISOU23446 OXT GLU A1603 12299 11757 11788 819 -475 -573 O
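One possible reading of the records above, offered as a sketch rather than a confirmed diagnosis: the free glutamate is stored as HETATM but carries a standard amino-acid residue name, which may be why a heterogen filter keyed on residue names leaves it in place. The helper below lists such records in a PDB file. `hetatm_amino_acids` is a hypothetical name, and the code assumes standard fixed-column PDB text (residue name in columns 18-20, chain ID in column 22, residue number in columns 23-26).

```python
# Three-letter codes of the 20 standard amino acids.
STANDARD_AA = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def hetatm_amino_acids(pdb_lines):
    """Return sorted unique (chain_id, res_seq, res_name) tuples for HETATM
    records whose residue name is a standard amino acid."""
    found = set()
    for line in pdb_lines:
        # Skip everything that is not a HETATM record (ATOM, ANISOU, TER, ...).
        if not line.startswith("HETATM") or len(line) < 26:
            continue
        res_name = line[17:20].strip()
        if res_name in STANDARD_AA:
            chain_id = line[21]
            res_seq = int(line[22:26])
            found.add((chain_id, res_seq, res_name))
    return sorted(found)
```

Called on the lines of a local copy of the original file (for example `hetatm_amino_acids(open('6hqv.pdb'))`, where the filename is an assumption), it should flag residues such as GLU A1603 before any chain selection is attempted.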
So when I check the sequence, it shows two chains:
0 A 1517 EPTRIAILGKEDIIVDHGIWLNFVAHDLLQTLPSSTYVLITDTNLYTTYVPPFQAVFEAAAPRDVRLLTYAIPPGEYSKSRETKAEIEDWLSHACTRDTVIIALGGGVIGDIGYVAATFRGVRFVQVPTTLLAVDSSIGGKTAIDTPGKNLIGAFWQPRRIYIDLAFLETLPVREFINGAEVIKTAAIWNETEFTALEENAAAILEAVRSKASSPAARLAPIRHILKRIVLGSARVKAEVVSADEREGGLRNLLNFGHSIGHAYEAILAPQVLHGECVAIGVKEAELARYLGVLRPSAVARLTKLIASYDLPTSVHDKRIAKLSAGKECPVDVLLQKAVDKKNEGRKKKIVLLSAIGKTYEKKATVVDDRAIRLVLSPSVRVTPGVPKGLSVTVTPPGSKSISNRALVLAALGEGTTRIHGLLHSDDVQYLAAIEQLHGADFSWEDAGEILVVTGKGGKLQASKEPLYLGNAGTASRFLTSVVALCAPSAVSSTVLTGNARKVRPIGALVDALRANGVGVKYLEKEKSLPVEVDAAGGFAGGVIELAATVSSQYVSSILAAPYAHQPVTLRLVGGKPISQPYIDTIAASFGIKVERSAEDPNTYLIPKGVYKNPPEYVVESDASSATYPLAVAAITGTTCTIPNIGSESLQGDARFAVEVLRPGCAVEQTATSTTVTGPPIGTLKAIPHVDEPTDAFLTAAVLAAVADGTTQITGIANQRVKECNRIAAKDQLAKFGVQCNELEDGIEVIGKPYQELRNPVEGIYCYDDHRVASHSVLSTISPHPVLILERECTAKTWPGWWDILSQFFKVQLDGEEDPTGTDRSIFIVGRGAGKSTAGRWSELLKRPLVDLDAELERREGTIPEIIRGERGWEGFRQAELELLQDVIKNQSKGYIFSCGGGIVETEAARKLLIDYHKNGGPVLLVHRDTDQVVEYLRDKTRPAYSENIREVYERRKPWFYECSNLQYHSPHEDGSEALLQPPADFARFVKLIAGQSTHLEDVRAKKHSFFVSLTVPNVADALDIIPRVVVGSDAVELRVDLLESYEPEFVARQVALLRAAAQVPIVYTVRTQSQGGKFPDEDYDLALRLYQTGLRSGVEYLDLETPDHILQAVTDAKGFTSIIASHHDPQCKLSWKSGSWIPFYNKALQYGDVIKLVGVAREADNFALTNFKAKLAAHDNKPIALNGTAGKLSRVLNGFLTPVSHPALPSKAAPGQLSATEIRQALSLIGEIEPKSFYLFGKPISASRSPALHNTLFYKTGLPHHYSRFETDEASKALESLIRSPDFGGASVTIPLKLDIPLLDSATDAARTIGAVNTIIPQTRDGSTTTLVGDNTDWRGVHALLHSSGSGSVVQRTAAPRGAAVVGSGGTARAAIYALHDLGFAPIWIVARSEERVAELVRGFDGYDLRRTSPHQGKDNPSVVISTIPATQPIDPSREVIVEVLKHGHPSAEGKVLLEAYQPPRTPLTLAEDQGWRTVGGLEVLAAQGWYQFQLWTGITPLYEEARAAVGEDSVE
1 B 1 E
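A hedged sketch of one way around this: select chains by position instead of by ID. The assumption, not confirmed in this thread, is that the free glutamate is loaded as a second chain that still carries the ID "A" and is only relabelled "B" when the file is written, so `removeChains(chainIds=['B', ...])` never matches it. `describe_chains` and `keep_chain_by_index` are hypothetical helper names; `removeChains(chainIndices=...)` and the `index`, `id` and `residues()` members of a chain are the PDBFixer and OpenMM calls being relied on.

```python
def describe_chains(topology):
    """Return (index, id, residue count) for every chain in an OpenMM topology."""
    return [
        (chain.index, chain.id, sum(1 for _ in chain.residues()))
        for chain in topology.chains()
    ]


def keep_chain_by_index(fixer, keep_index=0):
    """Remove every chain except the one at position keep_index.

    Works on positions rather than IDs, so two chains that share the
    same ID letter are still told apart.
    """
    drop = [
        chain.index
        for chain in fixer.topology.chains()
        if chain.index != keep_index
    ]
    fixer.removeChains(chainIndices=drop)
    return drop
```

With a `PDBFixer` instance this would be used as `print(describe_chains(fixer.topology))` followed by `keep_chain_by_index(fixer, 0)`, before calling `findMissingResidues()`.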
When I reload the written PDB file and select chain A again, the GLU in chain B is eliminated. However, missing residues can no longer be detected after that reload.
So I implemented it with a redundant script, shown below. It produces the output I expected.
download 1818.9ms
select chain 190.8ms
detect missing residues 1.9ms
{(0, 0): ['MET', 'ALA', 'THR', 'ALA', 'ASN', 'VAL', 'ALA', 'GLY', 'ALA', 'GLY', 'GLY', 'SER', 'GLY', 'SER'],
(0, 839): ['LYS', 'ARG', 'THR', 'THR', 'GLN', 'SER', 'THR', 'GLN', 'GLN', 'VAL', 'ARG', 'LYS'],
(0, 1555): ['LEU', 'GLU', 'HIS', 'HIS', 'HIS', 'HIS', 'HIS', 'HIS']}
remove heterogens 580.8ms
add Missing Atoms 97680.9ms
save tmp file and load 791.8ms
select chain 323.3ms
remove small molecules 0.0ms
save file 128.2ms
0 A 1589 MATANVAGAGGSGSEPTRIAILGKEDIIVDHGIWLNFVAHDLLQTLPSSTYVLITDTNLYTTYVPPFQAVFEAAAPRDVRLLTYAIPPGEYSKSRETKAEIEDWMLSHACTRDTVIIALGGGVIGDMIGYVAATFMRGVRFVQVPTTLLAMVDSSIGGKTAIDTPMGKNLIGAFWQPRRIYIDLAFLETLPVREFINGMAEVIKTAAIWNETEFTALEENAAAILEAVRSKASSPAARLAPIRHILKRIVLGSARVKAEVVSADEREGGLRNLLNFGHSIGHAYEAILAPQVLHGECVAIGMVKEAELARYLGVLRPSAVARLTKLIASYDLPTSVHDKRIAKLSAGKECPVDVLLQKMAVDKKNEGRKKKIVLLSAIGKTYEKKATVVDDRAIRLVLSPSVRVTPGVPKGLSVTVTPPGSKSISNRALVLAALGEGTTRIHGLLHSDDVQYMLAAIEQLHGADFSWEDAGEILVVTGKGGKLQASKEPLYLGNAGTASRFLTSVVALCAPSAVSSTVLTGNARMKVRPIGALVDALRANGVGVKYLEKEKSLPVEVDAAGGFAGGVIELAATVSSQYVSSILMAAPYAHQPVTLRLVGGKPISQPYIDMTIAMMASFGIKVERSAEDPNTYLIPKGVYKNPPEYVVESDASSATYPLAVAAITGTTCTIPNIGSESLQGDARFAVEVLRPMGCAVEQTATSTTVTGPPIGTLKAIPHVDMEPMTDAFLTAAVLAAVADGTTQITGIANQRVKECNRIAAMKDQLAKFGVQCNELEDGIEVIGKPYQELRNPVEGIYCYDDHRVAMSHSVLSTISPHPVLILERECTAKTWPGWWDILSQFFKVQLDGEEDPTKRTTQSTQQVRKGTDRSIFIVGMRGAGKSTAGRWMSELLKRPLVDLDAELERREGMTIPEIIRGERGWEGFRQAELELLQDVIKNQSKGYIFSCGGGIVETEAARKLLIDYHKNGGPVLLVHRDTDQVVEYLMRDKTRPAYSENIREVYERRKPWFYECSNLQYHSPHEDGSEALLQPPADFARFVKLIAGQSTHLEDVRAKKHSFFVSLTVPNVADALDIIPRVVVGSDAVELRVDLLESYEPEFVARQVALLRAAAQVPIVYTVRTQSQGGKFPDEDYDLALRLYQTGLRSGVEYLDLEMTMPDHILQAVTDAKGFTSIIASHHDPQCKLSWKSGSWIPFYNKALQYGDVIKLVGVAREMADNFALTNFKAKMLAAHDNKPMIALNMGTAGKLSRVLNGFLTPVSHPALPSKAAPGQLSATEIRQALSLIGEIEPKSFYLFGKPISASRSPALHNTLFYKTGLPHHYSRFETDEASKALESLIRSPDFGGASVTIPLKLDIMPLLDSATDAARTIGAVNTIIPQTRDGSTTTLVGDNTDWRGMVHALLHSSGSGSVVQRTAAPRGAAMVVGSGGTARAAIYALHDLGFAPIWIVARSEERVAELVRGFDGYDLRRMTSPHQGKDNMPSVVISTIPATQPIDPSMREVIVEVLKHGHPSAEGKVLLEMAYQPPRTPLMTLAEDQGWRTVGGLEVLAAQGWYQFQLWTGITPLYEEARAAVMGEDSVELEHHHHHH
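As a quick consistency check on the log above, the reported missing residues can be added to the number of residues already present in chain 0. This is a small sketch, assuming that each key is (chain index, insertion position within the chain) and that the key (0, 1555) marks an insertion after the last modelled residue, so the chain held 1555 residues when `findMissingResidues()` ran. `expected_length` is a hypothetical helper name.

```python
# Missing residues copied from the log above.
missing_residues = {
    (0, 0): ['MET', 'ALA', 'THR', 'ALA', 'ASN', 'VAL', 'ALA',
             'GLY', 'ALA', 'GLY', 'GLY', 'SER', 'GLY', 'SER'],
    (0, 839): ['LYS', 'ARG', 'THR', 'THR', 'GLN', 'SER',
               'THR', 'GLN', 'GLN', 'VAL', 'ARG', 'LYS'],
    (0, 1555): ['LEU', 'GLU', 'HIS', 'HIS', 'HIS', 'HIS', 'HIS', 'HIS'],
}


def expected_length(n_present, missing, chain_index=0):
    """Residues already present plus residues to be inserted into one chain."""
    added = sum(
        len(names)
        for (chain, _position), names in missing.items()
        if chain == chain_index
    )
    return n_present + added


# Assumption: the largest insertion position is the C-terminal end of chain 0.
n_present = max(pos for (chain, pos) in missing_residues if chain == 0)
print(expected_length(n_present, missing_residues))  # 1589
```

The result matches the 1589 residues reported for chain A after fixing. The earlier 1517-letter listing is shorter than 1555, presumably because non-standard residues were not printed there.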
def fixing_pdb(pdbid, chain_id='A'):
    from openmm.app import PDBFile
    from pdbfixer import PDBFixer
    import time
    tic = time.time()
    fixer = PDBFixer(pdbid=pdbid)
    toc = time.time()
    dur = toc-tic
    print("download {:4.1f}ms".format(dur*1000))
    ## select first chain
    tic = time.time()
    fixer.removeChains(chainIds=['B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'])
    toc = time.time()
    dur = toc-tic
    print("select chain {:4.1f}ms".format(dur*1000))
    ## detect missing residues
    tic = time.time()
    fixer.findMissingResidues()
    toc = time.time()
    dur = toc-tic
    print("detect missing residues {:4.1f}ms".format(dur*1000))
    print(fixer.missingResidues)
    ## replace nonstandard residues and remove heterogens (including water)
    tic = time.time()
    fixer.findNonstandardResidues()
    fixer.replaceNonstandardResidues()
    fixer.removeHeterogens(False)
    toc = time.time()
    dur = toc-tic
    print("remove heterogens {:4.1f}ms".format(dur*1000))
    ## add missing residues and atoms
    tic = time.time()
    fixer.findMissingAtoms()
    fixer.addMissingAtoms()
    toc = time.time()
    dur = toc-tic
    print("add Missing Atoms {:4.1f}ms".format(dur*1000))
    ## save to a temporary file and reload so the chains are relabelled
    tic = time.time()
    with open('tmp.pdb', 'w') as f:
        PDBFile.writeFile(fixer.topology, fixer.positions, f)
    fixer = PDBFixer(filename='tmp.pdb')
    toc = time.time()
    dur = toc-tic
    print("save tmp file and load {:4.1f}ms".format(dur*1000))
    ## select first chain again
    tic = time.time()
    fixer.removeChains(chainIds=['B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'])
    toc = time.time()
    dur = toc-tic
    print("select chain {:4.1f}ms".format(dur*1000))
    ## remove heterogens (disabled, already done above)
    tic = time.time()
    #fixer.removeHeterogens(False)
    toc = time.time()
    dur = toc-tic
    print("remove small molecules {:4.1f}ms".format(dur*1000))
    tic = time.time()
    output_pdb = '{}_{}.pdb'.format(pdbid, chain_id)
    with open(output_pdb, 'w') as f:
        PDBFile.writeFile(fixer.topology, fixer.positions, f)
    toc = time.time()
    dur = toc-tic
    print("save file {:4.1f}ms".format(dur*1000))
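The repeated tic/toc bookkeeping in the script above could be folded into a small context manager. This is only a sketch using the standard library; `timed` and the optional `results` dictionary are hypothetical names, not part of PDBFixer or OpenMM.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print how long the enclosed block took, in milliseconds.

    If a dict is passed as results, the duration is also stored under label.
    """
    tic = time.perf_counter()
    try:
        yield
    finally:
        dur_ms = (time.perf_counter() - tic) * 1000
        if results is not None:
            results[label] = dur_ms
        print("{} {:4.1f}ms".format(label, dur_ms))


# Stand-in workload; in the script this would wrap a PDBFixer call.
timings = {}
with timed("sleep", timings):
    time.sleep(0.01)
```

In the script, each step would then read as, for example, `with timed("download"): fixer = PDBFixer(pdbid=pdbid)`.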
I ran an OpenMM simulation on the output and it works well.
I am evaluating a single chain from a complex assembly PDB file (6HQV).
When I select chain A with Biopython, it detects the discontinuity, but PDBFixer cannot detect any missing residues. The structure passes the energy minimization step but then causes the
NaN position error
in the OpenMM simulation iterations. However, when I select chain A with PDBTools, PDBFixer detects many missing residues, but the energy minimization step also fails, with the error
ValueError: No template found for residue 1590 (GLU). The set of atoms is similar to DC5, but it is missing 10 atoms.
How can I fix this issue? Thanks.
The detailed log is below. The custom script is attached at the bottom.
step2-A3. I try to fix missing residues, but none are detected: the missing-residue list is empty.
step2-A4. When I run the simulation, the energy minimization step passes and the first iteration succeeds, but the second iteration fails with
ValueError: Particle position is NaN
The detailed error log (NaN position) is below.
After the fix, I can see the added residues as reported.
ValueError: No template found for residue 1590 (GLU). The set of atoms is similar to DC5, but it is missing 10 atoms.
This is an odd error, because the fixed PDB has 1589 residues and ends not with GLU (E) but with HIS (H). The detailed error is below.
When I check the whole sequence, the fixed PDB has two chains. The second chain contains only a single residue, E (GLU), which seems to cause the error.
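A stray single-residue chain like this can be spotted in the written file before the system is built. The sketch below counts distinct residues per chain ID from ATOM and HETATM records. It assumes standard fixed-column PDB text in which each chain has its own ID letter (as in the fixed file, where the chains are labelled A and B). `residues_per_chain` and `suspicious_chains` are hypothetical helper names.

```python
from collections import OrderedDict


def residues_per_chain(pdb_lines):
    """Count distinct residues per chain ID from ATOM/HETATM records."""
    seen = OrderedDict()
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chain_id = line[21]
            # Residue number plus insertion code, together with the residue name.
            residue_key = (line[22:27], line[17:20])
            seen.setdefault(chain_id, set()).add(residue_key)
    return OrderedDict((chain, len(keys)) for chain, keys in seen.items())


def suspicious_chains(pdb_lines, min_residues=2):
    """Return IDs of chains with fewer than min_residues residues."""
    counts = residues_per_chain(pdb_lines)
    return [chain for chain, n in counts.items() if n < min_residues]
```

Run on the lines of the fixed PDB file, a result that includes "B" would point to the extra one-residue chain, which could then be removed before creating the system.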
Here is the Google Drive link for all the PDB files.
-------------------- below is the utility script that I tested --------------------