haddocking / pdb-tools

A dependency-free cross-platform swiss army knife for PDB files.
https://haddocking.github.io/pdb-tools/
Apache License 2.0

Help with badly formatted PDB file #134

Closed: LilySnow closed this issue 2 years ago

LilySnow commented 2 years ago

Describe the bug

Running pdb_selchain.py on a PDB file that contains REMARK lines raises the following error:

File ".../pdb-tools/pdbtools/pdb_selchain.py", line 138, in run
    if line[21] not in chain_set:
IndexError: string index out of range
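For context, the error can be reproduced in isolation. The sketch below assumes a made-up, truncated TER line (not taken from the reporter's file) and a hypothetical `chain_set`; it only illustrates why indexing position 21 fails on a short line.

```python
# Hypothetical, truncated TER record: only the record name, no chain column.
short_line = "TER\n"
chain_set = {"A"}  # assumed selection, for illustration only

try:
    # Column 22 of a coordinate record (index 21) holds the chain ID.
    keep = short_line[21] in chain_set
except IndexError as err:
    message = str(err)

print(message)  # string index out of range
```

Any line shorter than 22 characters that reaches this check fails the same way, whichever record type it is.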

To fix

It can be fixed with this change:

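for line in fhandle:
    if line.startswith(records):
        # Skip records too short to hold a chain ID (index 21).
        if len(line) < 22:
            continue
        # Skip records whose chain ID was not selected.
        elif line[21] not in chain_set:
            continue
    yield line
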
joaomcteixeira commented 2 years ago

Hi @LilySnow

Your solution would work, but record lines in a valid PDB file should be 80 characters long. It seems your PDB file has a very short record line, shorter than 22 characters. We will not implement this change because it is not a correction: it would open the door to accepting incorrect PDB files.

Can you try to run pdb_tidy your.pdb | pdb_selchain to see if it works?

Otherwise, run pdb_validate to see which errors there are with your PDB.
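As a quick manual check alongside the tools mentioned above, the offending lines can also be located with a few lines of Python. This is only a sketch: `find_short_records` is a hypothetical helper, the `RECORDS` tuple is an assumption about which record types carry a chain ID, and the example lines are made up.

```python
# Assumed record types that carry a chain ID in column 22.
RECORDS = ("ATOM", "HETATM", "TER", "ANISOU")


def find_short_records(lines, min_len=22):
    """Return (line_number, text) for records too short to hold a chain ID."""
    short = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")  # measure the record without its newline
        if text.startswith(RECORDS) and len(text) < min_len:
            short.append((number, text))
    return short


# Made-up example: a REMARK, one well-formed ATOM line, one truncated TER.
example = [
    "REMARK made-up example\n",
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N\n",
    "TER\n",
]
print(find_short_records(example))  # [(3, 'TER')]
```

To scan a real file, pass an open file handle instead of the `example` list.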

If your PDB is not private, we are happy to help you sort out the problem if you share the file with us.

Cheers,

edit: Note that REMARK lines are given back unchanged by pdb_selchain.

JoaoRodrigues commented 2 years ago

Hi Li,

We don't parse REMARK lines, so the problem must be coming from another record.

As João suggested, try running pdb_tidy first to normalize line lengths. It's a fairly common problem with PDB files, often with TER records.
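To illustrate what normalizing line lengths means here, the sketch below pads a line with trailing spaces to 80 columns. It is not pdb_tidy's actual implementation; `pad_line` is a hypothetical helper and the TER line is made up.

```python
def pad_line(line, width=80):
    """Pad a line with trailing spaces to `width` columns, keeping one newline."""
    text = line.rstrip("\r\n")
    return text.ljust(width) + "\n"


padded = pad_line("TER\n")
print(len(padded.rstrip("\n")))  # 80
print(repr(padded[21]))          # ' '
```

Once padded, index 21 exists and holds a space, so a check like `line[21] not in chain_set` no longer raises an IndexError.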

joaomcteixeira commented 2 years ago

I think this is solved. @LilySnow feel free to reopen the issue if you need help. 👍