Open cjeong73 opened 6 years ago
In the terminal_edit_options module of preprocess.py, I replaced mol.segname_info.sequence_to_fasta with mol.chain_info.sequence_to_fasta to get the FASTA sequence.
print("Current sequences (lowercase indicates residues not in coordinates): ")
for segname in seq_segnames:
    # seq = mol.segname_info.sequence_to_fasta(
    seq = mol.chain_info.sequence_to_fasta(
        segname, missing_lower=True)
    print(segname + ':')
    print(seq)
With this change, non-default mode printed the segment sequence information correctly, as shown below. We still need to find where this error originates in the code.
Current sequences (lowercase indicates residues not in coordinates):
A: mskinvnvenVSGVQGFLFHTDGKESYGYRAFINGVEIGIKDIETVQGFQQIIPSINISKSDVEAIRKAMKk
C: mskinvnveNVSGVQGFLFHTDGKESYGYRAFINGVEIGIKDIETVQGFQQIIPSINISKSDVEAIRKAMKk
B: mskinvnveNVSGVQGFLFHTDGKESYGYRAFINGVEIGIKDIETVQGFQQIIPSINISKSDVEAIRKAMKk
D: mskinvnveNVSGVQGFLFHTDGKESYGYRAFINGVEIGIKDIETVQGFQQIIPSINISKSDVEAIRKAMKk
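To make the lowercase convention in this output concrete, here is a minimal sketch of how such a rendering could work. It is not the actual `sequence_to_fasta` implementation: the function name `render_fasta_sketch`, its arguments, and the treatment of `missing_lower=False` are all assumptions made for illustration.

```python
def render_fasta_sketch(sequence, resolved_positions, missing_lower=True):
    """Hypothetical stand-in for the FASTA rendering discussed in this thread.

    sequence           -- full one-letter sequence (any case)
    resolved_positions -- set of 0-based indices that have coordinates
    missing_lower      -- if True, residues without coordinates are lowercased
    """
    rendered = []
    for index, residue in enumerate(sequence):
        if missing_lower and index not in resolved_positions:
            # Residue is in the sequence but not in the coordinates.
            rendered.append(residue.lower())
        else:
            # Assumption: with missing_lower=False everything is uppercased.
            rendered.append(residue.upper())
    return "".join(rendered)


# Toy example: the first three residues have no coordinates.
print(render_fasta_sketch("MSKNVSG", resolved_positions={3, 4, 5, 6}))
```

If the object behind `segname_info` has no record of the missing residues, a rendering like this has nothing to lowercase, which would be consistent with the truncated output reported elsewhere in this issue.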
While this gives the correct output, it only affects a "print" statement.
Yes, it only affects the print statement. I then tested what happens when segname_info is replaced with chain_info throughout the pdbrx code on the dev branch at onsager. With this change, the output PSF and PDB files follow the sequence information reported by pdbscan, at least for the dev branch code.
That is a possible solution. Perhaps a better solution (as segname/segment is the currency involved) is to update the segname_info object. I am very uneasy with finding solutions by replacing bits of code without understanding the consequences.
See issue report in #117
@cjeong73 What is the status of this after the fixes for #117?
The input PDB file, 4F87.pdb, has four chains (A, B, C, and D), each with missing residues.
PDBSCAN reported detailed sequence information, including the missing residues. According to the PDBSCAN log, the FASTA sequences should be:
A> mskinvnvenVSGVQGFLFHTDGKESYGYRAFINGVEIGIKDIETVQGFQQIIPSINISKSDVEAIRKAMKk
B> mskinvnveNVSGVQGFLFHTDGKESYGYRAFINGVEIGIKDIETVQGFQQIIPSINISKSDVEAIRKAMKk
C> mskinvnveNVSGVQGFLFHTDGKESYGYRAFINGVEIGIKDIETVQGFQQIIPSINISKSDVEAIRKAMKk
D> mskinvnveNVSGVQGFLFHTDGKESYGYRAFINGVEIGIKDIETVQGFQQIIPSINISKSDVEAIRKAMKk
Current sequences (lowercase indicates residues not in coordinates):
A: mskinvnvenVSGVQGFLFHTDGKESYGYRAFINGVEIGIKDIETVQGFQQIIPSINISKSDVEAIRKAMK
C: NVSGVQGFLFHTDGKESYGYRAFINGVEIGIKDIETVQGFQQIIPSINISKSDVEAIRKAMK
B: NVSGVQGFLFHTDGKESYGYRAFINGVEIGIKDIETVQGFQQIIPSINISKSDVEAIRKAMK
D: NVSGVQGFLFHTDGKESYGYRAFINGVEIGIKDIETVQGFQQIIPSINISKSDVEAIRKAMKk
A: VSGVQGFLFHTDGKESYGYRAFINGVEIGIKDIETVQGFQQIIPSINISKSDVEAIRKAMK
C: NVSGVQGFLFHTDGKESYGYRAFINGVEIGIKDIETVQGFQQIIPSINISKSDVEAIRKAMK
B: NVSGVQGFLFHTDGKESYGYRAFINGVEIGIKDIETVQGFQQIIPSINISKSDVEAIRKAMK
D: NVSGVQGFLFHTDGKESYGYRAFINGVEIGIKDIETVQGFQQIIPSINISKSDVEAIRKAMK
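For checking outputs like these against the PDBSCAN log, a small comparison helper may be handy. The sketch below is not part of the codebase; `dropped_termini` is a hypothetical name, and it assumes the observed sequence is a contiguous piece of the expected one, compared case-insensitively. The chain B strings are copied from this thread.

```python
def dropped_termini(expected, observed):
    """Return (missing_prefix, missing_suffix) of `observed` relative to
    `expected`, or None if `observed` is not a contiguous part of it.

    The comparison ignores case, so lowercase "not in coordinates"
    residues still count as present in the sequence.
    """
    start = expected.upper().find(observed.upper())
    if start < 0:
        return None
    end = start + len(observed)
    # Slice the original string so the case of the dropped residues is kept.
    return expected[:start], expected[end:]


# Chain B: sequence from the PDBSCAN log vs. the sequence printed above.
expected_b = "mskinvnveNVSGVQGFLFHTDGKESYGYRAFINGVEIGIKDIETVQGFQQIIPSINISKSDVEAIRKAMKk"
observed_b = "NVSGVQGFLFHTDGKESYGYRAFINGVEIGIKDIETVQGFQQIIPSINISKSDVEAIRKAMK"
print(dropped_termini(expected_b, observed_b))
```

Running this per chain gives a quick view of which terminal residues were dropped, without having to compare the long strings by eye.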