Suggestion: Ignore unmatched chains, or chains made up entirely of unknown residue identifiers, when modeling

benjbuch commented 3 months ago

I really like using IDPConformerGenerator to explore possible protein conformations. Some of my PDB structures contain DNA, which I remove from the PDB file beforehand in order to start the building process. I don't know how hard it would be to implement, but if these atoms could simply "exist" in 3D space in the PDB file during clash checking, without being aligned or modeled against the FASTA file, that might broaden the applicability of this tool.

One way I imagine this could work: if the FASTA header names match the chain names in the PDB file (or some identifier like chainA), those chains are matched for modeling. Anything not mentioned in the FASTA headers would not be modeled, but could still exist in the PDB file. This would also make it possible to specify which chain to model where, for example if two protein isoforms in the same complex differ in their unstructured C-terminal tail. (For backward compatibility: if there were no chain->header matches, the search algorithm that is already implemented could take over.)
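The header-to-chain matching proposed above could be sketched roughly as follows. This is a hypothetical illustration, not IDPConfGen code: the function names, the assumption that a header is either a bare chain ID (`A`) or `chain` followed by the ID (`chainA`), and returning `None` to signal a fallback to the existing search are all assumptions.

```python
"""Sketch of the proposed FASTA-header-to-chain matching (hypothetical, not part of IDPConfGen)."""

import re


def parse_fasta(text):
    """Return {header: sequence} for a multi-record FASTA string."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def pdb_chain_ids(pdb_text):
    """Collect chain IDs (column 22) from ATOM/HETATM records, in file order."""
    chains = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chain = line[21]
            if chain not in chains:
                chains.append(chain)
    return chains


def match_headers_to_chains(fasta_text, pdb_text):
    """Map chain ID -> sequence for headers that name a chain ("A" or "chainA").

    Returns (matches, static_chains). static_chains are present in the PDB
    but not named in the FASTA, so they would only take part in clash checks.
    Returns (None, all_chains) when nothing matches, signalling a fallback
    to the existing sequence-search behaviour.
    """
    chains = pdb_chain_ids(pdb_text)
    matches = {}
    for header, seq in parse_fasta(fasta_text).items():
        m = re.fullmatch(r"(?:chain)?([A-Za-z0-9])", header)
        if m and m.group(1) in chains:
            matches[m.group(1)] = seq
    if not matches:
        return None, chains
    static = [c for c in chains if c not in matches]
    return matches, static
```

With a FASTA naming `chainA` and `B` and a PDB containing chains A, B, and a DNA chain C, chains A and B would be modeled, while C would be returned as static and only used for clash checking.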

menoliu commented 3 months ago

Hi @benjbuch, as of v0.7.3, IDPConformerGenerator can technically recognise any heavy atoms and common ions, including DNA. I have modelled multi-chain protein systems with DNA and ions in the template, but by using IDPConformerGenerator as a Python library rather than directly through the client.

The workflow for this scripting approach is as follows:

1. Generate backbones (or IDRs with side chains) of the IDRs you would like to model.
2. Align them to the terminus for the C-IDR and N-IDR cases (the L-IDR case has to be worked through manually with the next_seeker protocol).
3. During step 2, clash-check against the template, including your DNA, heteroatoms, and other protein chains.
4. If you had side chains in step 1, I would change the clash_count parameters to be tighter, i.e. accepting num_clashes=10.
5. Use the psurgeon function, or customise it to your requirements, to perform the final stitching.
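The clash check against the full template (DNA, heteroatoms, other chains) described in the steps above could look roughly like this. It is a minimal sketch, assuming plain (x, y, z) tuples and a hypothetical 2.0 Å distance cutoff; it is not IDPConfGen's clash_count implementation, and the names `count_clashes`, `accept_candidate`, and `max_clashes` are made up for illustration.

```python
"""Sketch of a clash count between an aligned IDR and the full template.

Hypothetical helpers, not IDPConfGen's own clash-checking code. The template
coordinates may include DNA, ions, and other protein chains.
"""

import math


def count_clashes(idr_coords, template_coords, cutoff=2.0):
    """Count IDR/template atom pairs closer than `cutoff` (assumed Angstrom)."""
    clashes = 0
    for a in idr_coords:
        for b in template_coords:
            if math.dist(a, b) < cutoff:
                clashes += 1
    return clashes


def accept_candidate(idr_coords, template_coords, max_clashes=0, cutoff=2.0):
    """Accept an aligned IDR if it has at most `max_clashes` clashing pairs."""
    return count_clashes(idr_coords, template_coords, cutoff) <= max_clashes
```

A brute-force double loop keeps the sketch short; for large templates a spatial index such as a k-d tree would be much faster. A real check would also need to exclude atoms bonded across the junction where the IDR is stitched onto the folded region.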

I have attached a sample script I used for the multi-chain case, which included ions and DNA: Multichain_CIDR_NIDR.txt

Hope this helps for now!

menoliu commented 3 months ago

Since I am working on #269, I think I can push another update that adds a flag for templates with non-protein atoms (e.g. --not-only-protein). With it, IDPConfGen would generate a temporary template without DNA, ions, or ligands to obtain the IDR alignment coordinates, while keeping the original template for clash checking and stitching.
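The proposed --not-only-protein behaviour (a temporary protein-only template for alignment, plus the untouched original for clash checking and stitching) could be sketched like this. The helper name, the 20-residue whitelist, and keeping TER/END records are assumptions; the flag itself was only proposed in this thread.

```python
"""Sketch of the proposed --not-only-protein idea (hypothetical helper)."""

# Standard amino-acid residue names; anything else (DNA, ions, ligands) is treated as non-protein.
STANDARD_AA = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def split_template(pdb_text):
    """Return (protein_only_pdb, full_pdb).

    protein_only_pdb keeps ATOM/HETATM records whose residue name (columns
    18-20) is a standard amino acid, plus TER/END lines; it would be used
    only to compute IDR alignment coordinates. full_pdb is the untouched
    original, kept for clash checking and stitching.
    """
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            if line[17:20].strip() in STANDARD_AA:
                kept.append(line)
        elif line.startswith(("TER", "END")):
            kept.append(line)
    return "\n".join(kept) + "\n", pdb_text
```

Modified residues (for example selenomethionine, MSE) would be dropped by this whitelist, so a real implementation would probably need a broader residue list.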

benjbuch commented 3 months ago

Very helpful, I'll try it out as soon as possible! Many thanks!

julie-forman-kay-lab / IDPConformerGenerator, issue #270