Closed joaomcteixeira closed 3 years ago
We thought there could be four new tools to directly select protein, nucleic acid, carbohydrate, and ligand residues, regardless of the chain where they sit. We could use the residue names identified for HADDOCK https://wenmr.science.uu.nl/haddock2.4/library to perform the selection. For ligands, we could use the information in the Protein Data Bank; I already have files for that, and we could also select them by negation.
This is a bit tricky as both ligands and glycans are labelled as HETATM. So you would really have to select based on residue names (i.e. a pre-defined list).
Yes. That is exactly what we thought. Because piping this logic isn't straightforward with the current tools, we thought of implementing dedicated scripts based exactly on residue names for protein, nucleic, carbo, and ligands. Carbo and ligands may be tricky, as a carbo can be a ligand. But for the rest it could be a good solution.
Modified amino acids are also tricky: MSE (selenomethionine), for example, should not be filtered out as a ligand. Those scripts might become very HADDOCK-specific …
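To make the residue-name idea concrete, here is a minimal sketch of classifying PDB coordinate lines against pre-defined lists. The function name `classify` and all four name sets are hypothetical; the carbohydrate and modified-residue sets are small illustrative samples, not the HADDOCK library. Anything that matches no list falls into "other", which is where selection by negation would start.

```python
# Hypothetical residue-name tables; real lists would be much longer.
PROTEIN = frozenset({
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
})
MODIFIED_AA = frozenset({"MSE"})  # HETATM records that still belong to the protein
NUCLEIC = frozenset({"DA", "DC", "DG", "DT", "A", "C", "G", "U"})
CARBO = frozenset({"NAG", "MAN", "BMA", "GAL", "FUC"})  # illustrative sample only


def classify(line):
    """Return 'protein', 'nucleic', 'carbo', 'other', or None for non-coordinate lines."""
    if not line.startswith(("ATOM", "HETATM")):
        return None
    # Residue name sits in columns 18-20 of a PDB coordinate record.
    resname = line[17:20].strip()
    if resname in PROTEIN or resname in MODIFIED_AA:
        return "protein"
    if resname in NUCLEIC:
        return "nucleic"
    if resname in CARBO:
        return "carbo"
    # Waters, ions, ligands, and unknown names all end up here.
    return "other"
```

Note that the record type (ATOM vs HETATM) is deliberately ignored after the first check, so MSE is kept with the protein even though it is labelled HETATM.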
I have the same concern as Alex. Residue names are very much defined by the forcefield/software. For instance, you can have CYS or CYH, depending on whether they are bonded or not. Ligands are even worse. We could use the Ligand Expo table to screen for ligands, but that's a fairly large table (easily a few times larger than the entire pdb-tools codebase) and it's also not foolproof. The simplest would be a selprotein tool, but even that will have a lot of corner cases where it won't work. We could have one that satisfies, say, 99.99% of the use cases, but that's pretty much the only one where we can have such a good success rate without a lot of work.
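As a rough illustration of what such a selprotein tool might look like, the sketch below filters a stream of PDB lines on the 20 standard residue names and lets the caller add force-field-specific names. The names `select_protein`, `STANDARD_AA`, and `extra_names` are assumptions made for this example and do not correspond to an existing pdb-tools script.

```python
STANDARD_AA = frozenset({
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
})


def select_protein(lines, extra_names=()):
    """Yield lines, dropping ATOM/HETATM records whose residue is not a protein residue.

    extra_names lets the caller whitelist software-specific names (e.g. CYH).
    Non-coordinate lines are passed through unchanged in this sketch.
    """
    names = STANDARD_AA | frozenset(extra_names)
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            # Residue name occupies columns 18-20.
            if line[17:20].strip() not in names:
                continue
        yield line
```

Usage could look like `for line in select_protein(open("input.pdb"), extra_names=["CYH"]): print(line, end="")`. The corner cases raised above (modified residues, non-standard naming) are exactly the names that would have to go through `extra_names` or a longer default table.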
Such scripts could be made HADDOCK-specific eventually and put in the haddock-tools repo
Perfect. I will look forward to it.
Talking with @brianjimenez, we thought there could be four new tools to directly select protein, nucleic acid, carbohydrate, and ligand residues, regardless of the chain where they sit. We could use the residue names identified for HADDOCK to perform the selection. For ligands, we could use the information in the Protein Data Bank; I already have files for that, and we could also select them by negation.
These new tools could be named:
pdb_selprotein
pdb_selnucleic
pdb_selcarbo
pdb_selligands
Likewise, we could have the del version.
I am very mindful of the one-script-one-job philosophy, yet I think these scripts could enhance the user experience without breaking that philosophy.
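One way the sel and del variants could stay small is to share a single residue-name filter and differ only in a flag. This is a sketch under that assumption; `filter_residues`, `invert`, and the sample `CARBO` set are hypothetical names chosen for illustration.

```python
CARBO = frozenset({"NAG", "MAN", "BMA", "GAL", "FUC"})  # illustrative sample only


def filter_residues(lines, names, invert=False):
    """Keep coordinate records whose residue name is in names (sel behaviour).

    With invert=True, drop those records instead (del behaviour).
    Non-coordinate lines are always passed through.
    """
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            match = line[17:20].strip() in names
            # sel: skip non-matches; del: skip matches.
            if match == invert:
                continue
        yield line
```

With this shape, a sel script would call `filter_residues(lines, CARBO)` and its del counterpart `filter_residues(lines, CARBO, invert=True)`, so each script still does one job while the name table lives in one place.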
What are your thoughts?