haddocking / pdb-tools

A dependency-free cross-platform swiss army knife for PDB files.
https://haddocking.github.io/pdb-tools/
Apache License 2.0

Handling CONECT entries #72

Closed joaomcteixeira closed 3 years ago

joaomcteixeira commented 3 years ago

A user suggested:

Remove obviously incorrect CONECT records.

We have discussed removing all CONECT lines outright, because assessing whether a record is correct requires knowing the chemical structure of the molecule, as you said in our discussions. On the other hand, for ATOM entries we could easily keep the CONECT records that describe S-S bonds.
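To make the S-S option concrete, here is a minimal sketch of what "keeping only disulfide CONECT records" could look like. It is not existing pdb-tools code: the function name `keep_disulfide_conect` is hypothetical, and it assumes fixed-width PDB columns, integer serial numbers, and that a disulfide is any CONECT between two `SG` atoms of `CYS` residues in ATOM records. It does no distance check, so it still trusts the input file.

```python
def keep_disulfide_conect(lines):
    """Yield PDB lines, keeping only CONECT records that link two CYS SG atoms.

    Hypothetical helper, not part of pdb-tools. Assumes fixed-width columns
    and integer atom serial numbers.
    """
    lines = list(lines)

    # Serial numbers of cysteine SG atoms found in ATOM records.
    sg_serials = {
        int(line[6:11])
        for line in lines
        if line.startswith("ATOM")
        and line[17:20] == "CYS"
        and line[12:16].strip() == "SG"
    }

    for line in lines:
        if line.startswith("CONECT"):
            # CONECT holds up to five serials in 5-character fields.
            fields = (line[i:i + 5].strip() for i in range(6, 31, 5))
            serials = [int(f) for f in fields if f]
            if not serials or serials[0] not in sg_serials:
                continue  # first atom is not a cysteine SG: drop the record
            partners = [s for s in serials[1:] if s in sg_serials]
            if not partners:
                continue  # no SG partner: not a disulfide record
            # Rewrite the record so that only the SG-SG bonds remain.
            kept = [serials[0]] + partners
            line = "CONECT" + "".join("{:>5}".format(s) for s in kept) + "\n"
        yield line
```

Everything that is not a CONECT record passes through untouched, so the sketch could sit at the end of a pipeline without affecting coordinates.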

For HETATM entries, there are two main problems identified, from https://www.wwpdb.org/documentation/file-format-content/format33/sect10.html:

Known Problems

CONECT records involving atoms for which the coordinates are not present in the entry (e.g., symmetry-generated) are not given.

CONECT records involving atoms for which the coordinates are missing due to disorder, are also not provided.

Some questions

@JoaoRodrigues @amjjbonvin @brianjimenez @mtrellet

  1. Should we remove all CONECT entries?
  2. Should we remove only those belonging to absent atom serials?
  3. Should we keep at least the S-S bonds for ATOM entries?
  4. Should pdb_tidy handle this, or a new tool instead?

brianjimenez commented 3 years ago

In my opinion, removing all CONECT statements in pdb_tidy would be enough.

JoaoRodrigues commented 3 years ago

CONECT statements need a topology. Some of the tools do some legwork to correct CONECT statements when the atom serial numbers change, but that's pretty much the only time we look at them. We could do something similar somewhere else (maybe pdb_tidy, but it's growing too much for my liking...) and remove CONECT records that reference unknown serial numbers. Alternatively, going a step further, we could apply simple chemistry rules, for example that a carbon cannot have more than 4 bonds, and issue warnings for statements that violate them.
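For reference, the two ideas above could be sketched roughly as follows. This is an illustration, not code from pdb-tools: the function names are hypothetical, serial numbers are assumed to be plain integers, and the element is read from columns 77-78, which some files leave blank. The valence check counts distinct bonded partners rather than bond orders, so it only flags carbons with more than four neighbours.

```python
from collections import defaultdict


def parse_conect(line):
    """Return the serial numbers on a CONECT line (five 5-character fields)."""
    fields = (line[i:i + 5].strip() for i in range(6, 31, 5))
    return [int(f) for f in fields if f]


def drop_dangling_conect(lines):
    """Yield lines, skipping CONECT records that cite a serial with no coordinates."""
    lines = list(lines)
    known = {
        int(line[6:11])
        for line in lines
        if line.startswith(("ATOM", "HETATM"))
    }
    for line in lines:
        if line.startswith("CONECT") and not set(parse_conect(line)) <= known:
            continue  # at least one serial is absent from the file
        yield line


def overbonded_carbons(lines, max_bonds=4):
    """Return sorted serials of carbon atoms with more than max_bonds partners."""
    lines = list(lines)

    # Map serial -> element symbol (columns 77-78 of ATOM/HETATM records).
    element = {
        int(line[6:11]): line[76:78].strip().upper()
        for line in lines
        if line.startswith(("ATOM", "HETATM"))
    }

    # Collect distinct partners per atom; records may span several lines.
    partners = defaultdict(set)
    for line in lines:
        if line.startswith("CONECT"):
            serials = parse_conect(line)
            if serials:
                partners[serials[0]].update(serials[1:])

    return sorted(
        serial
        for serial, bonded in partners.items()
        if element.get(serial) == "C" and len(bonded) > max_bonds
    )
```

Note that `drop_dangling_conect` discards the whole record as soon as one serial is unknown. Trimming only the unknown serial would be an equally defensible choice, and either way the result is only as good as the input.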

All things considered, my vote is to ignore CONECT statements altogether. Garbage in, garbage out.

amjjbonvin commented 3 years ago

I support that - CONECT statements are too tricky.

If people need them, they can use, for example, PyMOL to generate them, but even then there is probably no guarantee they are correct.

joaomcteixeira commented 3 years ago

Following your comments, I conclude the solution is to remove all CONECT records within pdb_tidy. :+1:
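As a rough illustration of that decision, stripping every CONECT record is a one-line filter. This is only a sketch of the idea, not the actual pdb_tidy implementation, and `strip_conect` is a hypothetical name:

```python
def strip_conect(lines):
    """Yield every line of a PDB file except CONECT records."""
    for line in lines:
        if not line.startswith("CONECT"):
            yield line
```

It works on any iterable of lines, such as an open file handle, and leaves all other records (including END) in their original order.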