Closed lohedges closed 3 years ago
I've been contacted again by the user who is experiencing this issue. Any input would be much appreciated.
This may be a silly question, but isn't this just tLEaP adding two atoms to each of the chains?
I think you're correct in stating that tLEaP is capping the chains. The issue is that the single-molecule, two-chain system has been converted into a two-molecule system, where each molecule is one of the chains from the original molecule. This makes it incompatible with the original molecule that was passed in.
Does anyone have any input on this? I've been contacted by the user multiple times and I'm still not able to provide them with an answer.
Although I'm not sure that the original input is properly prepped (e.g. hydrogens are missing), the main problem still holds: how do we go from a two-chain, single-molecule representation in a PDB file to a no-chain, two-molecule representation from tLEaP? This breaks our assumption that the original molecular representation is the true one, i.e. what's generated by tLEaP is incompatible with it (not just in terms of atom names, but topology).
Hi Lester,
Essentially, the problem is that we assume the input contains one molecule when it actually contains two. It is difficult to rely on chain identifiers, as many different molecules share the same chain label (often X for solvent, etc.) and chain labelling is not done consistently in PDB files.
A general solution could inspect the connectivity of every atom to work out which molecule it belongs to, but that would depend on being able to correctly infer connectivity from the information available.
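To make the connectivity idea concrete, here is a minimal sketch that groups atoms into molecules by finding connected components of the bond graph. It assumes the bonds are already known as pairs of atom indices; inferring them reliably from a PDB file is the hard part and is not addressed here. The function name `find_molecules` is hypothetical and not part of BioSimSpace or Sire.

```python
from collections import defaultdict


def find_molecules(num_atoms, bonds):
    """Group atom indices into molecules (connected components of the bond graph).

    `bonds` is an iterable of (i, j) atom-index pairs. This is an assumption:
    the connectivity must already have been inferred elsewhere.
    """
    parent = list(range(num_atoms))

    def find(i):
        # Walk up to the root of the set, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the sets of any two bonded atoms.
    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect atoms by the root of their set.
    groups = defaultdict(list)
    for atom in range(num_atoms):
        groups[find(atom)].append(atom)

    # Order the molecules by their lowest atom index.
    return sorted(groups.values(), key=lambda group: group[0])


# Toy example: atoms 0-1-2 form one molecule, atoms 3-4 another.
print(find_molecules(5, [(0, 1), (1, 2), (3, 4)]))
```

With this approach, two chains that share no bond come out as separate molecules regardless of how their chain identifiers are labelled.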
A fix for this use case may be to get the user to load each chain separately and combine the parameterised systems later.
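As a rough sketch of that workaround, the snippet below splits PDB text into one block per chain so that each chain could be written to its own file, loaded, and parameterised separately. It assumes the chain identifiers (column 22 of ATOM/HETATM records) are reliable for this particular input, which, as noted above, is not true in general. `split_pdb_by_chain` is a hypothetical helper, not a BioSimSpace function.

```python
def split_pdb_by_chain(pdb_text):
    """Return a dict mapping chain ID -> PDB text containing only that chain.

    Assumes fixed-width PDB records, where the chain ID is column 22
    (index 21) of each ATOM/HETATM line.
    """
    chains = {}
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chains.setdefault(line[21], []).append(line)

    # Terminate each per-chain block so it is a self-contained PDB file.
    return {
        chain_id: "\n".join(lines + ["TER", "END"]) + "\n"
        for chain_id, lines in chains.items()
    }


def atom_line(serial, name, resname, chain, resseq):
    """Build the leading columns of a PDB ATOM record (illustration only)."""
    return "ATOM  {:5d} {:<4s} {:3s} {:1s}{:4d}".format(
        serial, name, resname, chain, resseq
    )


# Toy two-chain input.
sample = "\n".join([
    atom_line(1, "N", "ALA", "A", 1),
    atom_line(2, "CA", "ALA", "A", 1),
    "TER",
    atom_line(3, "N", "GLY", "B", 1),
    "END",
])

for chain_id, text in split_pdb_by_chain(sample).items():
    print(chain_id, text.count("ATOM"))
```

Each resulting block could then be parameterised on its own and the parameterised molecules combined into a single system afterwards.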
Consider the following protein. This has a single molecule with 13932 atoms and two chains. The BioSimSpace PDB parser can read it just fine:
Now let's try to parameterise the molecule:
How do we preserve the chain information? I've looked at the code for `Sire.IO.AmberPrm` and I only see mention of treechains. Does the AMBER format have a concept of PDB-style chains in its topology? (I've found some discussion here regarding mapping back to a PDB file from an AMBER trajectory.)

Unfortunately, it's not as simple as stripping the TER records from the original PDB file, since the parser can detect separate chains by name too. I could probably modify the code to be able to run in a "no-chain" mode if needed, where it ignores the chain identifiers. I've tried doing this manually, but tLEaP still breaks the parameterised system into multiple molecules (albeit with different atom numbers), so it's obviously accounting for the missing information somehow with its template matching.
Instead, I could try to modify the algorithm that matches atoms from the tLEaP-generated output to those in the original molecule. However, this expects that you are comparing two molecules, rather than a molecule (the original) and a system of molecules (what tLEaP is producing). Would this even work when we are trying to map across properties that assume a single molecule, e.g. bonds, angles, etc.? I'm not sure that these can easily be split, or recombined if we need to write back to a different format.
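For illustration only, one way to match a system of molecules against a single original molecule is to flatten the system and match atoms on a shared key. The sketch below is not the existing BioSimSpace matching algorithm. It assumes each atom can be identified by a (residue number, atom name) pair that is unique and numbered consistently in both representations, which may well not hold if residues are renumbered during parameterisation. The name `map_system_to_original` is hypothetical.

```python
def map_system_to_original(original_atoms, system_molecules):
    """Map (molecule index, atom index) in a multi-molecule system to an
    atom index in the original single molecule.

    Atoms are matched on a (residue number, atom name) key. This is an
    assumption: the keys must be unique and comparable in both inputs.

    Returns (mapping, unmatched), where `unmatched` lists atoms that have
    no counterpart in the original, e.g. atoms added when capping chains.
    """
    lookup = {key: index for index, key in enumerate(original_atoms)}

    mapping = {}
    unmatched = []
    for mol_index, atoms in enumerate(system_molecules):
        for atom_index, key in enumerate(atoms):
            if key in lookup:
                mapping[(mol_index, atom_index)] = lookup[key]
            else:
                unmatched.append((mol_index, atom_index))
    return mapping, unmatched


# Toy example: the original is one molecule; the system has two molecules,
# the first of which has gained an extra terminal atom.
original = [(1, "N"), (1, "CA"), (2, "N"), (2, "CA")]
system = [
    [(1, "N"), (1, "CA"), (1, "OXT")],
    [(2, "N"), (2, "CA")],
]

mapping, unmatched = map_system_to_original(original, system)
print(mapping)
print(unmatched)
```

A mapping like this only covers per-atom properties. Bonds, angles, and other multi-atom terms would still need their indices translated, and the unmatched atoms have nowhere to go in the original molecule, which is the incompatibility described above.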