Open andrewsb8 opened 3 years ago
Another example, adds TER
before the start of the PDB.
python pdbtools/pdb_addter.py -4:: tests/data/dummy.pdb
REMARK CHAIN C DOES NOT HAVE ELEMENTS FOR LAST RESIDUE
REMARK CHAIN C CONTAINS RESIDUES OUT OF ORDER
REMARK CHAIN D IS NUCLEIC ACID AND LACKS A TER STATEMENT
TER
ATOM 1 N ARG B 4 36.898 42.175 -2.688 1.00 0.00 N
ATOM 2 H ARG B 4 37.673 41.570 -2.916 1.00 0.00 H
ATOM 3 H2 ARG B 4 35.929 41.470 -2.758 1.00 0.00 H
ATOM 3 H3 ARG B 4 37.099 42.392 -1.524 1.00 0.00 H
ATOM 3 CA ARG B 4 37.080 43.455 -3.421 1.00 0.00 C
ATOM 3 HA ARG B 4 38.102 43.960 -3.065 1.00 0.00 H
Another comment,
The PDB
you propose as a test case has about 47k lines. That is too big for the scope of this project. Please look for a minimum case scenario needed to test the new tool. If this PR were accepted, the 47k lines from your PDB would collapse the whole statistics of this repository and that wouldn't be fair. For example, dummy PDB has about 200 lines.
Cheers!
What about pdb_add_manual_ter? A bit longer, but at least more specific.
That could be a proper name. As @JoaoRodrigues said, addter
might be confusing because of pdb_tidy
which already does correct TER
addition.
Thanks for the feedback!
The name change makes sense. Having ability to manually place TER records will be helpful ultimately given the other scripts' purposes (based on my understanding). Ex: chainbows requires TER records, tidy requires Chain identifiers. Some tools don't always give Chain IDs (CHARMM-GUI, Gromacs if Chain IDs aren't provided and no TER records already in place, VMD in some cases, etc). So this tool could be complicated but would help complete the suite as some of the tools need one or the other to already exist.
I have some questions and comments as well:
Otherwise, I will get to those changes, write tests for a smaller pdb file and dummy.pdb (based on your feedback to the above suggestion), and make the names appropriate. Thanks for your time.
Brian
Hi,
you can stay on the same branch, no need to do a new branch,
I get an error using chainbows as well (on a normal pdb where I did not use pdb_addter.py first). I don't have the error message, but the last commit message for that script is "half way done". So I assumed that was not complete yet and didn't submit an issue.
i don't understand, can you rephrase? Can it be that chainbow
has a bug?
For tests with nonsequential residue numbers, it might be best to throw an error when that's the case here. Redirect the user to pdb_reres to avoid undesired results and potential confusion.
pdb-tools should not raise errors (I think). these are very simple tools that should do the job when the PDB follows (more or less) the right formatting rules. if the PDB is "crazy" or has "errors incompatible with the tool", I am not sure if it should be the responsibility of pdbtools
to warn about that. I think we don't do that in any other tool. @JoaoRodrigues ?
Yes please, use the smallest possible reproducible case for PDB tests.
Cheers!
I'll have to get back to you about pdb_chainbows
error as I cannot check today.
If not throwing an error, the function could mimic that of pdb_delres
which asserts that the residues are deleted regardless of sequence number. pdb_add_manual_ter
is written similarly so having it behave this way would be consistent with previous practices in the repository. Based on the test with dummy.pdb, there would still have to be changes made which shouldn't be too difficult (i.e. not considering the value of the residue number in the pdb file when determining if in the residue range to add a TER).
Sounds good so far. Give it a try on the review comments and your considerations. We will review it again after that. Best, and we are very happy to have you interacting with pdb-tools
:relaxed:
Hi all,
I've made some fairly large changes to the branch pdb_addter. I'll outline some of the largest here:
is_integer
is changed to return_integer,
which now also handles the error message for checking that inputs are correct as mentioned in a comment above. Also saves some lines in check_input
.make_TER
from pdb_tidy
to add full TER entries.Let me know what you think or if you run into any issues!
Hey all,
I know that it's end of term and holiday season. Just wanted to follow up because I had actually forgotten about this until a minute ago :sweat_smile:. Hope the end of the term goes well for you both and that you enjoy your holiday break!
Brian
Hi, @andrewsb8 sorry for the late reply. These have been busy days on all fronts. My apologies. I will review this PR today/tomorrow and let you know :relaxed: stay tuned. :+1: Thanks very much for all your efforts so far :muscle:
Happy new year! Thanks for the feedback. Have a couple of questions based on it.
add_manual_ter -4:7:1 dummy.pdb
, I would run into a potential issue because the residue numbering is 4,6,7,1,2,3,5,2. So, I would get a TER
entry after the fifth residue but that feels like weird behavior sinces it's not next to 4, 6, and 7 and out of order. Similarly, the option -1:3
would produce TER
entries after both residues identified as 2
in that pdb. The latter case could be particularly relevant if you are "combining" pdbs of two or more structures for whatever purpose as it would be likely that residue numbers would repeat (and would be a great use case in tandem with pdb_reres
). Both methods definitely have their downsides. What are your thoughts?ATOM 51 HB3 ALA X 5 ...... TER 52 ALA X 5 ...... ATOM 52 N ALA X 6
So the 52
is duplicated. I can either leave that number out (so only ATOM, HETATM, etc get numbers) or work to make sure the entry numbers are incremented for the number of added TER
entries preceding it. What do you all think? Edited to add: This is actually an interesting issue for dummy.pdb. As the method generating the TER
entry increments the atom number and there are many duplicate atom numbers already. An example from dummy.pdb is:
ATOM 3 O ARG B 4 ..... TER 4 ARG B 4 ...... ATOM 3 N GLU B 6
There the second entry is changed but everything after the TER
entry does not follow it. Let me know what you think! Thanks again for looking over my work and hope you enjoyed the holiday break!
Brian
Hi Brian, thanks for your cheerful words. We had great holidays, we hope you too :-)
All the points you are rising are definitively very tricky and I feel they are getting out of the PDB standard format because of the added complexity. The point you raise for 1) is a good one. Maybe it is best to leave it as "per residue" instead of the actual residue numbers of what you want. Sometimes I think what we if the tool should be add ters at specific residue/chains, for example -A:100
. But the question raises: what is the result of passing pdb_tidy
after pdb_add_manual_ter
? I am starting to have concerns if tools do things outside the PDB format. Maybe what you were looking for is some addition to pdb_tidy
? Or some pdb_split*
combined with pdb_chain
and pdb_tidy
that maybe we don't have?
Regarding 3), pdb_tidy
corrects for atom numbers. I think this tool should also.
I was rereading again the whole conversation here, definitively @JoaoRodrigues raise a good point. Could you point us to a real-world example (PDBID) and would be your perfect outcome so we can investigate further?
Large system pdb files (which I uploaded as a test file for this script, 692 residues) which require multiple TER entries (potentially at different intervals) prior to simulation can be made much more easily with a script.
Could so send us again these examples? (transfer.sh)
Cheers,
I will make it so the atom numbers are fixed, no problem!
As for examples of what I'm looking for, I'm rarely working with a single pdb id and sometimes not one at all. I have three examples of some current projects with large systems:
Before finding this project, I've been modifying these files largely manually which has been very time intensive.
Being able to organize these pdb files via some combination of pdb_reres
, pdb_merge
and/or pdb_tidy
has been greatly useful. TER
entries are particularly important in many simulation use cases for simulation of these systems. Each use case above requires a different number of TER
entries at different intervals (3 in first case, 161 in second, and 7 in the third) and chainbows won't help without preexisting TER
entries in most cases listed above. This seemed to be the only step I had to still performed manually when creating systems for simulation after beginning to use this package and motivated me to suggest the addition.
I have updated pdb_add_manual_ter
according to the suggestions, namely:
pdb_add_manual_ter
) Let me know what you think of my changes or of my prior comment.
Best,
Brian
Hi @andrewsb8 , Thanks so much for your clear explanatory messages. I will check the examples you gave us and your code, and let you know. Cheers,
Hi,
Wrote a script to add TER lines in user defined ways similar to pdb_delres.py. This would be a good functionality for multiple reasons:
I include pdbtools/pdb_addter.py, tests/test_pdb_addter.py, and tests/data/addter_test.pdb where all 13 unit tests in test_pdb_addter.py pass.
Let me know what you think,
Brian