joaomcteixeira closed this issue 4 years ago.
I favor the option of a different tool; it seems more in line with the tools' philosophy.
I do agree with you and @brianjimenez: a distinct tool would fit our philosophy better.
I don't understand the question. The user probably means that pdb_chain assigns chain IDs to an entire molecule. If so, that's exactly the point. I understand it's a pain when you have no chains but TER records. I see a few different solutions to this problem:
Create a pdb_splitTER to split a file solely by TER records and then let the user fix the chain IDs as they wish with pdb_chain. Note the capital TER, to avoid having a tool with the ambiguous name pdb_splitter.
Create a pdb_chainbow tool (I think I have one somewhere) that assigns chains whenever it finds a TER record. In addition, we could have a pdb_mkter to create TER records based on atom distances (on every chain break).
What would solve most use cases?
PS. Emoji explosion :)
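A minimal sketch of the pdb_chainbow behaviour described above, written in Python with only the standard library. It is not the code of any released tool: the function name assign_chains_by_ter and the pool of chain IDs (A-Z, then 0-9, then a-z) are assumptions. It expects records without trailing newlines (e.g. from str.splitlines()) and rewrites column 22, the chain ID field of ATOM/HETATM/ANISOU/TER records, advancing to the next ID after every TER.

```python
import string

# Assumed pool of single-character chain IDs, consumed in this order.
CHAIN_IDS = string.ascii_uppercase + string.digits + string.ascii_lowercase


def assign_chains_by_ter(lines):
    """Set sequential chain IDs on coordinate records, advancing after each TER.

    Hypothetical helper: `lines` are PDB records without trailing newlines.
    """
    out = []
    index = 0  # position in CHAIN_IDS for the current segment
    for line in lines:
        record = line[:6].strip()
        if record in ("ATOM", "HETATM", "ANISOU", "TER"):
            if index >= len(CHAIN_IDS):
                raise ValueError("more segments than single-character chain IDs")
            padded = line.ljust(22)  # make sure column 22 exists (short TER lines)
            line = padded[:21] + CHAIN_IDS[index] + padded[22:]
        out.append(line)
        if record == "TER":
            index += 1  # records after this TER belong to the next chain
    return out
```

In this sketch, anything after the last TER (waters, ligands) also receives a new chain ID, which is one reason such a tool needs careful documentation.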
I don't understand the question. The user probably means pdb_chain assigns chain IDs to an entire molecule. If so, that's exactly the point. I understand it's a pain when you have no chains but TER records. I see a few different solutions to this problem:
"Create a pdb_splitTER to split a file solely by TER records and then let the user fix the chain IDs as they wish with pdb_chain. Note the capital TER, to avoid having a tool with the ambiguous name pdb_splitter." Hmm… not convinced about this. If there are TER statements within the same chain, it is for a good reason.
"Create a pdb_chainbow tool (I think I have one somewhere) that assigns chains whenever it finds a TER record. In addition, we could have a pdb_mkter to create TER records based on atom distances (on every chain break)." That's dangerous for the reason above. Not in favour of it.
Following @amjjbonvin's and @JoaoRodrigues's comments:
I believe I recall seeing PDBs with TER lines at backbone breaks within the same chain, so TERs would separate segments within chains rather than chains (this is just from memory; I don't recall a specific example). However, using TER to separate segments goes against the PDB specification for TER records.
So, although an automatic chain adder on every TER may be dangerous, a pdb_splitTER could be safe, provided the user knows what they are doing. I find it useful. Good documentation, in CAPS, needs to be provided so that no one confuses pdb_splitTER with pdb_splitchain.
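One way the proposed pdb_splitTER could work, sketched under our own assumptions: coordinate records (ATOM, HETATM, ANISOU) are grouped into segments, each segment is closed by the TER that follows it, and all other records (headers, END) are ignored. The name split_by_ter is hypothetical; writing each segment to its own file, and fixing its chain IDs with pdb_chain, is left to the caller.

```python
def split_by_ter(lines):
    """Group coordinate records into segments, closing a segment at each TER.

    Hypothetical helper: non-coordinate records are dropped, and a TER with
    no preceding atoms is ignored.
    """
    segments, current = [], []
    for line in lines:
        record = line[:6].strip()
        if record in ("ATOM", "HETATM", "ANISOU"):
            current.append(line)
        elif record == "TER" and current:
            current.append(line)  # keep the TER at the end of its segment
            segments.append(current)
            current = []
    if current:  # trailing atoms without a closing TER form a final segment
        segments.append(current)
    return segments
```

Because it only looks at TER records, a TER placed at a backbone break would turn one chain into two segments, which is exactly the case the documentation must warn about.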
I don't think a tool that adds TERs based on atomic distances is appropriate. All pdb-tools functionalities deal only with formatting issues; adding one that calculates things goes slightly off the core design. There is also the burden of doing heavy calculations without relying on anything outside the standard library. But this is me playing on the conservative side. :wink:
Addition: I will wait for a consensus before creating a PR for this.
There's no perfect solution here :) We cannot distinguish between a broken chain and two separate molecules. The 'truth' is that TER records should only be present at the end of protein/nucleic acid chains:
- Every chain of ATOM/HETATM records presented on SEQRES records is terminated with a TER record.
- The TER records occur in the coordinate section of the entry, and indicate the last residue presented for each polypeptide and/or nucleic acid chain for which there are determined coordinates. For proteins, the residue defined on the TER record is the carboxy-terminal residue; for nucleic acids it is the 3'-terminal residue.
What about a pdb_addter tool that adds TER statements at chain ID changes and, optionally (with a -break option), at gaps as well? This is not "against" the rules; we already do some simple calculations in pdb_gap :-)
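A sketch of the proposed pdb_addter, assuming the input contains only ATOM records and no existing TER records. A TER is written after the last residue of each chain ID; with break_at_gaps=True (standing in for the proposed -break option) a TER is also written when the distance between the C atom of one residue and the N atom of the next exceeds a cutoff. The 2.0 Å cutoff, the C-N criterion, and all function names are our assumptions, not necessarily what pdb_gap does. TER fields follow the PDB v3.3 layout (serial in columns 7-11, residue name 18-20, chain ID 22, residue number 23-26, insertion code 27); atom serials are not renumbered afterwards.

```python
import math

GAP_CUTOFF = 2.0  # assumed C(i)-N(i+1) distance cutoff, in Angstrom


def make_ter(atom_line):
    """TER record for the residue of `atom_line` (PDB v3.3 column layout)."""
    serial = int(atom_line[6:11]) + 1
    return (f"TER   {serial:>5d}      {atom_line[17:20]} "
            f"{atom_line[21]}{atom_line[22:26]}{atom_line[26]}")


def _atom_coords(residue, name):
    """Coordinates of atom `name` in a residue, or None if it is absent."""
    for line in residue:
        if line[12:16].strip() == name:
            return float(line[30:38]), float(line[38:46]), float(line[46:54])
    return None


def add_ter(atom_lines, break_at_gaps=False, cutoff=GAP_CUTOFF):
    """Insert TER records at chain ID changes and, optionally, at backbone gaps."""
    # Group consecutive ATOM records into residues keyed by (chain, resSeq, iCode).
    residues = []
    for line in atom_lines:
        key = (line[21], line[22:26], line[26])
        if residues and residues[-1][0] == key:
            residues[-1][1].append(line)
        else:
            residues.append((key, [line]))

    out = []
    for i, (key, residue) in enumerate(residues):
        out.extend(residue)
        nxt = residues[i + 1] if i + 1 < len(residues) else None
        if nxt is None or nxt[0][0] != key[0]:
            out.append(make_ter(residue[-1]))  # last residue of this chain ID
        elif break_at_gaps:
            c = _atom_coords(residue, "C")
            n = _atom_coords(nxt[1], "N")
            if c is not None and n is not None and math.dist(c, n) > cutoff:
                out.append(make_ter(residue[-1]))  # backbone break within the chain
    return out
```

Residues lacking either backbone atom are never split, so truncated residues cannot trigger spurious TERs; the trade-off is that some real gaps go unnoticed.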
We could then add the pdb_chainbows tool and basically rely on the user to know what they are doing. To be honest, I have only encountered this scenario (no chains but TERs) once in the real world, when converting very complex systems for simulations. I would assume such use cases are similarly rare.
It's been a while since I last played with the TER records of a PDB file, but I do like @JoaoRodrigues's idea of the pdb_addter and pdb_chainbows tools.
With proper documentation, this should tackle most of the issues we've raised here. As for the potential break with our philosophy, as has been said above, we already do something very similar with pdb_gap.
Okay, I will try to address this in the following days. :+1:
Looking at this today, I realized that we might just need to add a flag to pdb_tidy to produce proper PDB files (no TER within chains) and then add pdb_chainbows to rename chains by TER statements. That's the minimal set of changes needed to provide this functionality.
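The pdb_tidy flag suggested here could, for example, drop TER records that sit inside a chain. The sketch below treats a TER as internal when the last ATOM/HETATM record before it and the next ATOM record after it share the same non-blank chain ID. HETATM records after a TER (ligands, waters) are not counted, so a ligand that keeps its polymer's chain ID does not cause the TER to be dropped. The function name and these rules are our assumptions, not pdb_tidy's actual behaviour. TERs next to blank chain IDs are kept, so files without chain IDs can still go through pdb_chainbows afterwards.

```python
def drop_internal_ters(lines):
    """Remove TER records that do not end a chain (hypothetical tidy option)."""
    # next_atom_chain[i]: chain ID of the first ATOM record at or after index i.
    next_atom_chain = [None] * (len(lines) + 1)
    for i in range(len(lines) - 1, -1, -1):
        if lines[i][:6].strip() == "ATOM":
            next_atom_chain[i] = lines[i][21]
        else:
            next_atom_chain[i] = next_atom_chain[i + 1]

    out = []
    last_chain = None  # chain ID of the most recent ATOM/HETATM record
    for i, line in enumerate(lines):
        record = line[:6].strip()
        if record in ("ATOM", "HETATM"):
            last_chain = line[21]
        elif (record == "TER" and last_chain not in (None, " ")
              and next_atom_chain[i + 1] == last_chain):
            continue  # the same chain carries on after this TER: drop it
        out.append(line)
    return out
```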
Dear developers,
A user has requested that sequential chain IDs be added to PDBs that have TER statements delimiting chains but lack chain IDs. My question is which tool should implement this. Initially, I thought about pdb_tidy, but now I think that would grant too much power to tidy. Should a new one be created, pdb_completechains? @JoaoRodrigues @brianjimenez @amjjbonvin @mtrellet