openmm / pdbfixer

PDBFixer fixes problems in PDB files

fixer.findMissingResidues() could not find the missing residues! #305

Open Jalil-Mahdizadeh opened 2 weeks ago

Jalil-Mahdizadeh commented 2 weeks ago

Hi, I ran into a strange problem: fixer.findMissingResidues() does not find the missing residues, even though the missing sequences are present in SEQRES. The PDB ID is 3LJ0, a homodimer with two short missing loops in each chain. The first missing loop runs from K759 to E771 (KNVSDENLKLQKE) and the second from N845 to G851 (NLNNPSG).

https://files.rcsb.org/view/3LJ0.pdb

Does anyone have a solution for that?

peastman commented 2 weeks ago

The ATOM and SEQRES records don't match each other. PDBFixer expects to be able to align them, with the ATOM records containing a subset of the residues present in SEQRES, and it offers to add the missing ones. But in this case the ATOM records have more residues, not fewer. For chain A they describe a chain of length 451 (residues 665 through 1115), but according to the SEQRES records there should only be 434 residues in the complete chain.

Since it can't align them, it doesn't know what to add.
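The mismatch described above can be checked before calling PDBFixer. The following is a minimal sketch, not part of PDBFixer: the function names `chain_summary` and `chains_exceeding_seqres` are made up for illustration, and the parsing assumes standard fixed-column PDB records. It counts the residues declared in SEQRES for each chain and compares that with the span of residue numbers in the ATOM records.

```python
from collections import defaultdict


def chain_summary(pdb_lines):
    """Return {chain_id: (seqres_count, first_resnum, last_resnum, span)}.

    Hypothetical helper for illustration; it reads only SEQRES and ATOM
    records from the first model and ignores insertion codes.
    """
    seqres_counts = defaultdict(int)
    atom_numbers = defaultdict(set)
    for line in pdb_lines:
        record = line[:6]
        if record == "SEQRES":
            # Column 12 is the chain ID; residue names start at column 20.
            seqres_counts[line[11]] += len(line[19:].split())
        elif record == "ATOM  ":
            # Column 22 is the chain ID; columns 23-26 hold the residue number.
            atom_numbers[line[21]].add(int(line[22:26]))
        elif record == "ENDMDL":
            break  # later models repeat the same residues
    summary = {}
    for chain, numbers in atom_numbers.items():
        first, last = min(numbers), max(numbers)
        summary[chain] = (seqres_counts.get(chain, 0), first, last, last - first + 1)
    return summary


def chains_exceeding_seqres(summary):
    """List chains whose ATOM numbering spans more residues than SEQRES declares."""
    return sorted(chain for chain, (declared, _, _, span) in summary.items()
                  if span > declared)
```

Calling `chain_summary(open("3LJ0.pdb"))` on a local copy of the file and passing the result to `chains_exceeding_seqres` would flag chains where the numbering span is larger than the SEQRES count, which is the situation described for chain A (a span of 451 against 434 declared residues). A flagged chain is only a hint that alignment may fail; numbering gaps and insertion codes can also produce one.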

Jalil-Mahdizadeh commented 2 weeks ago

Thank you, Peter. I found the issue. There is a 25-residue missing loop (868-893) with no corresponding sequence in the SEQRES :( Not sure how to handle all these inconsistencies in PDB files. Is there any way to supply an external sequence (FASTA), which should be more reliable than SEQRES?

peastman commented 2 weeks ago

Yeah, that's the reason that PDBFixer exists: it's really common for PDB files to be broken. :( Unfortunately, they sometimes are broken in ways it can't fix automatically.

If findMissingResidues() isn't able to identify which residues to add, you can tell it by setting the missingResidues field directly. See the manual for details.
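As a rough sketch of what setting the field directly can look like: `missingResidues` is a dictionary whose keys are `(chain index, residue index)` tuples, where the residue index is the position within the chain before which the new residues are inserted, and whose values are lists of three-letter residue names. The helper names `build_missing_residues` and `fix_with_manual_loops` are hypothetical, and the indices in the usage note are placeholders, not the real positions for 3LJ0; those would have to be read from the loaded topology.

```python
# One-letter to three-letter codes for the 20 standard amino acids.
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def build_missing_residues(loops):
    """Build a missingResidues-style dict (hypothetical helper).

    loops: iterable of (chain_index, insert_index, one_letter_sequence).
    """
    missing = {}
    for chain_index, insert_index, sequence in loops:
        names = [THREE_LETTER[aa] for aa in sequence.upper()]
        missing[(chain_index, insert_index)] = names
    return missing


def fix_with_manual_loops(pdb_path, out_path, loops):
    """Assign missingResidues by hand, then let PDBFixer build the atoms."""
    # Imported here so build_missing_residues works without OpenMM installed.
    from pdbfixer import PDBFixer
    from openmm.app import PDBFile

    fixer = PDBFixer(filename=pdb_path)
    # Skip findMissingResidues() and supply the loops ourselves.
    fixer.missingResidues = build_missing_residues(loops)
    fixer.findMissingAtoms()
    fixer.addMissingAtoms()
    with open(out_path, "w") as handle:
        PDBFile.writeFile(fixer.topology, fixer.positions, handle)
```

For example, `build_missing_residues([(0, 94, "KNVSDENLKLQKE")])` yields a single entry keyed by `(0, 94)` with thirteen residue names; the `94` is a made-up insertion index used only to show the shape of the data.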

Jalil-Mahdizadeh commented 2 weeks ago

Thanks. Honestly, I'm developing a fully automated protein preparation pipeline, so it's not feasible to set the missing residues manually.

peastman commented 2 weeks ago

I'd be very cautious about trying to fully automate your pipeline. Even when PDBFixer is nominally able to "fix" all problems on its own, you should always inspect the results. Protein structures can be messed up in a whole lot of ways. There's no substitute for an expert human looking it over to check for problems.