julie-forman-kay-lab / IDPConformerGenerator

Build conformational representations of Intrinsically Disordered Proteins and Regions by a guided sampling of the protein torsion space
https://idpconformergenerator.readthedocs.io/
Apache License 2.0

Ignore certain PDBs during the building process #233

Open joaomcteixeira opened 1 year ago

joaomcteixeira commented 1 year ago

Some users are asking how not to consider certain PDBs' torsions when creating conformers. This is relevant when building proteins (or parts of proteins) already deposited in the PDB. It is a very good question we have discussed many times. There are two ways to achieve this:

  1. Unlist those PDBs from the initial PDB culled list when creating the database, or
  2. Remove those entries from the JSON final database file.
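
A minimal sketch of option 2, for illustration only. It assumes the database is a JSON object whose keys start with the four-character PDB ID (for example `1ABC_A_seg0`); that key layout and the function names `drop_pdb_entries` and `filter_database_file` are assumptions, not part of idpconfgen.

```python
import json
from pathlib import Path


def drop_pdb_entries(database, ignored_ids):
    """Return a copy of `database` without entries from the ignored PDB IDs.

    Assumes every key begins with a 4-character PDB ID (hypothetical layout).
    The comparison is case-insensitive.
    """
    ignored = {pdbid.upper() for pdbid in ignored_ids}
    return {
        key: value
        for key, value in database.items()
        if key[:4].upper() not in ignored
    }


def filter_database_file(source, destination, ignored_ids):
    """Read a JSON database, drop ignored entries, and write the result.

    Returns the number of entries removed.
    """
    database = json.loads(Path(source).read_text())
    filtered = drop_pdb_entries(database, ignored_ids)
    Path(destination).write_text(json.dumps(filtered))
    return len(database) - len(filtered)
```

Writing to a separate destination file keeps the original database intact, so the same source can be filtered differently for each protein being built.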

We need two things:

For example, to ignore PDBs 1ABC and 2DEF from the torsion sampling:

idpconfgen build [...] --ignore 1ABC 2DEF

menoliu commented 1 year ago

Good idea! I think the search sub-client would be handy to help solve this issue.

joaomcteixeira commented 1 year ago

I think "search" is a very broad word. Why another subclient? To edit the database? That could work. Can we draft some examples to prototype the interface before we start coding?

Cheers! :rocket:

menoliu commented 1 year ago

Oh, I meant for the tutorial page: users can use the existing cli_search to check which of the PDB IDs they need are already in the database, and during the build step they can specify which PDB IDs to ignore, e.g. -ign 2MX4 to ignore all chains/segments of 2MX4.

joaomcteixeira commented 1 year ago

Hi Nemo, -ign sounds good. We could also allow -ign to accept PDB IDs that are not in the database. That way, we (or the user) could provide (or keep) "the" database and ignore entries at will.
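
To prototype the interface discussed above, here is a rough sketch of how an -ign/--ignore option could be parsed and how IDs absent from the database could be tolerated. This is not the existing idpconfgen CLI: the parser name, the `split_ignored` helper, and the assumption that database keys start with a four-character PDB ID are all hypothetical.

```python
import argparse


def build_parser():
    """Build a toy parser exposing only the proposed ignore option."""
    parser = argparse.ArgumentParser(prog="idpconfgen-build-sketch")
    parser.add_argument(
        "-ign", "--ignore",
        nargs="*",
        default=[],
        metavar="PDBID",
        help="PDB IDs whose torsions are skipped during sampling.",
    )
    return parser


def split_ignored(database_keys, ignored_ids):
    """Partition requested IDs into (found in database, not in database).

    IDs missing from the database are reported rather than rejected,
    so one shared database can be reused with any ignore list.
    """
    present_in_db = {key[:4].upper() for key in database_keys}
    requested = {pdbid.upper() for pdbid in ignored_ids}
    return sorted(requested & present_in_db), sorted(requested - present_in_db)
```

With this shape, `-ign 2MX4 9ZZZ` would skip every 2MX4 entry and could simply log that 9ZZZ was not found instead of raising an error.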