madscatt / zazzie

development branch
GNU General Public License v3.0

Biomt failure PDBScan: Case 5 #92

Open: madscatt opened this issue 6 years ago

madscatt commented 6 years ago

Four PDB files failed with the following error:

case 5 KeyError 4 0.3
  File "/share/apps/local/anacondaz/lib/python2.7/site-packages/sassie/build/pdbscan/pdbscan/reconcile.py", line 481, in compare_sequence
    coor_seq = coor_info.sequence[chain]

One example is 5712.pdb, which fails while reading this BIOMT record (REMARK 350 lines):

REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B, C, D, E, F, G, H, I,
REMARK 350                    AND CHAINS: J, K, L, M

There are two issues.

(1) The code in header_reader.py (line 748) reads the chain list from character 31 of content onward:

for chain in content[31:].split(','):
    if chain != ' ':
        biomt[bm_no]['subdivs'].append(chain.strip())
12345678901234567890123456789012345678901
REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B, C, D, E, F, G, H, I,            
REMARK 350                    AND CHAINS: J, K, L, M 

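Counting from the start of the line, the colon is in column 41 and the first chain ID is in column 43. The offset 31 is therefore correct only once the 11-character "REMARK 350 " prefix has been removed from content.

Relying on a fixed column is fragile, so it may be better to take everything after the colon instead, for example: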
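The column arithmetic can be checked by rebuilding the two records with the same padding as the ruler and slicing them. This is a minimal sketch that assumes content is the line with its 11-character "REMARK 350 " prefix removed; the helper name chains_by_column is hypothetical, and header_reader.py may build content differently.

```python
# Sketch of the fixed-offset parsing in header_reader.py.
# Assumption: `content` is the REMARK line minus its "REMARK 350 " prefix.
PREFIX = "REMARK 350 "  # 11 characters


def chains_by_column(line, offset=31):
    """Return the text selected by a fixed offset (hypothetical helper)."""
    content = line[len(PREFIX):]
    return content[offset:]


# The two records from 5712.pdb, padded as in the column ruler
line1 = "REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B, C, D, E, F, G, H, I,"
line2 = "REMARK 350" + " " * 20 + "AND CHAINS: J, K, L, M"

for line in (line1, line2):
    colon_column = line.index(":") + 1  # 1-based column, as in the ruler
    print("%d %r" % (colon_column, chains_by_column(line)))
```

Both lines report the colon in column 41, and offset 31 selects 'A, B, C, D, E, F, G, H, I,' and 'J, K, L, M' respectively.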

for chain in line.split(":")[1].split(','):
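The colon-based version does not depend on how a line is padded. The sketch below compares the two approaches on a hypothetical continuation line that lacks the usual column padding (it is not taken from any of the failing files). It uses split(":", 1) so that only the first colon counts, a small variation on the one-liner above.

```python
def chains_by_column(line, offset=31):
    # Fixed offset, assuming the 11-character "REMARK 350 " prefix is removed first
    return line[11:][offset:]


def chains_by_colon(line):
    # Everything after the first colon, wherever that colon happens to be
    return line.split(":", 1)[1]


# Hypothetical continuation line WITHOUT the column padding
unpadded = "REMARK 350 AND CHAINS: J, K, L, M"

print(repr(chains_by_column(unpadded)))  # '' (the chain IDs are lost)
print(repr(chains_by_colon(unpadded)))   # ' J, K, L, M'
```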

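(2) The second problem is at the end of the first REMARK line. The trailing comma after "I" leaves a final fragment that contains only whitespace (trailing spaces and/or the newline). This fragment passes the "if chain != ' '" test because it is not exactly one space, and after strip() it is appended as an empty string. For example, running this snippet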

with open("dum.txt") as infile:
    for line in infile:
        try:
            # Everything after the colon
            important = line.split(":")[1]
        except IndexError:
            # No colon on this line
            print "did not process this line = ", line
            continue
        for chain in important.split(','):
            if chain != ' ':
                print 'chain != empty space = ', chain
            else:
                print 'chain == empty space = ', chain

on a file (dum.txt) containing the two REMARK 350 lines gives:

chain != empty space = A
chain != empty space = B
chain != empty space = C
chain != empty space = D
chain != empty space = E
chain != empty space = F
chain != empty space = G
chain != empty space = H
chain != empty space = I
chain != empty space =

chain != empty space = J
chain != empty space = K
chain != empty space = L
chain != empty space = M

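The last line of the first block shows the whitespace-only fragment passing the test.

Using this snippet instead, which removes all spaces before splitting and tests with isspace():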
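The test fails because it compares each fragment with exactly one space. A minimal sketch, assuming the first line carries trailing spaces and a newline as returned by readlines() (the count of 12 trailing spaces is an assumption):

```python
line = ("REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B, C, D, E, F, G, H, I,"
        + " " * 12 + "\n")

fragments = line.split(":")[1].split(",")
last = fragments[-1]  # twelve spaces followed by a newline

print(len(fragments))       # 10: nine chain IDs plus the trailing fragment
print(last != " ")          # True: the fragment is not a single space
print(repr(last.strip()))   # '' is what would be appended to subdivs
```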

with open("dum.txt") as infile:
    for line in infile:
        # Remove every space so only chain IDs, separators and the newline remain
        line = line.replace(' ', '')
        try:
            important = line.split(":")[1]
        except IndexError:
            print "did not process this line = ", line
            continue
        for chain in important.split(','):
            if not chain.isspace():
                print 'chain != empty space = ', chain
            else:
                print 'chain == empty space = ', chain

results in this output:

chain != empty space = A
chain != empty space = B
chain != empty space = C
chain != empty space = D
chain != empty space = E
chain != empty space = F
chain != empty space = G
chain != empty space = H
chain != empty space = I
chain == empty space =

chain != empty space = J
chain != empty space = K
chain != empty space = L
chain != empty space = M

The trailing fragment, now just the newline character, is correctly reported as whitespace.

I will test the following new code around line 748 of header_reader.py:
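With the spaces removed, the fragment left after the final comma is the bare newline, which str.isspace() reports as whitespace. A short sketch, again assuming trailing spaces and a newline on the first line; note in the last line that isspace() is False for an empty string, so the test alone would not catch an empty fragment.

```python
line = ("REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B, C, D, E, F, G, H, I,"
        + " " * 12 + "\n")

# Remove every space first, then split on the colon and on the commas
fragments = line.replace(" ", "").split(":")[1].split(",")

print(fragments[-1] == "\n")     # True: only the newline is left
print(fragments[-1].isspace())   # True: the newline counts as whitespace
print("".isspace())              # False: an empty fragment is not "whitespace"
```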

content = content.replace(' ', '')
for chain in content.split(":")[1].split(','):
    if not chain.isspace():
        biomt[bm_no]['subdivs'].append(chain.strip())
madscatt commented 6 years ago

The attempted fix leads to this new error:

Traceback (most recent call last):
  File "gui_mimic_pdbscan.py", line 39, in <module>
    scan.main(variables,txtQueue)
  File "/share/apps/local/anacondaz/lib/python2.7/site-packages/sassie/build/pdbscan/pdb_scan.py", line 59, in main
    self.run_scan()
  File "/share/apps/local/anacondaz/lib/python2.7/site-packages/sassie/build/pdbscan/pdb_scan.py", line 121, in run_scan
    mol.copy_biomt_segments()
  File "/share/apps/local/anacondaz/lib/python2.7/site-packages/sassie/build/pdbscan/pdbscan/scanner.py", line 2319, in copy_biomt_segments

KeyError: ''
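The source line at scanner.py line 2319 is not shown in the traceback. Assuming copy_biomt_segments looks up each entry of subdivs in a mapping keyed by chain ID (an assumption; the names below are hypothetical), a single empty entry in subdivs is enough to reproduce KeyError: ''.

```python
# Hypothetical stand-ins: subdivs with an empty entry, and a mapping from
# chain ID to segment data (the real structure in scanner.py may differ)
subdivs = ["A", "B", "C", ""]
segments_by_chain = {"A": "seg A", "B": "seg B", "C": "seg C"}

missing = None
try:
    copied = [segments_by_chain[chain] for chain in subdivs]
except KeyError as err:
    missing = err.args[0]  # the key that was not found

print("KeyError: %r" % (missing,))  # KeyError: ''
```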
madscatt commented 6 years ago

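Some files do not have any whitespace after the last comma; the line simply ends with the comma. In that case the final fragment is the empty string '', and because ''.isspace() returns False, it passed the test and was appended to subdivs, which is where the KeyError: '' comes from. I altered the code to read: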

content = content.split(":")[1]
content = content.replace(' ', '')
for chain in content.split(','):
    # Skip fragments that are empty ('') or whitespace only (e.g. '\n')
    if not chain.isspace() and len(chain) > 0:
        biomt[bm_no]['subdivs'].append(chain.strip())
        logging.debug('biomt[bm_no] ' + biomt[bm_no]['subdivs'][-1])
    else:
        logging.debug('chain == empty_space = ' + chain)

This allows file 7361.pdb to pass. (7361.pdb is smaller, with six chains, so it gives a faster test of code changes.)

The next step is to test the remaining failure cases.
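The same filtering can be pulled out into a small function and checked against three kinds of line: trailing spaces plus a newline, a bare trailing comma, and a padded continuation line. This is a minimal sketch, not the code in header_reader.py. The function name parse_chain_ids and the test lines are hypothetical, and it uses chain.strip() as the emptiness test, which keeps and drops the same fragments as the "not chain.isspace() and len(chain) > 0" condition.

```python
def parse_chain_ids(content):
    """Return chain IDs from one REMARK 350 chain-list line (hypothetical helper)."""
    chain_list = content.split(":", 1)[1].replace(" ", "")
    chain_ids = []
    for chain in chain_list.split(","):
        chain = chain.strip()      # also drops a trailing '\n'
        if chain:                  # skips '' and whitespace-only fragments
            chain_ids.append(chain)
    return chain_ids


with_padding = ("REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B, C, D, E, F, G, H, I,"
                + " " * 12 + "\n")
bare_comma = "REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B, C, D, E, F,"
continuation = "REMARK 350" + " " * 20 + "AND CHAINS: J, K, L, M"

print(parse_chain_ids(with_padding))   # ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I']
print(parse_chain_ids(bare_comma))     # ['A', 'B', 'C', 'D', 'E', 'F']
print(parse_chain_ids(continuation))   # ['J', 'K', 'L', 'M']
```

Keeping the parsing in a standalone function like this makes it easy to add one line from each remaining failing file as a regression check.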