salilab / pmi

Python Modeling Interface
https://integrativemodeling.org/nightly/doc/ref/namespaceIMP_1_1pmi.html

ValueError: The system contains at least one chain ID (AA) that is more than 1 character long #259

Closed almara3 closed 2 years ago

almara3 commented 2 years ago

Hi,

I used IMP 2.16 with the attached model below and got the following error. The model does not contain a chain ID that is more than 1 character long but only upper- and lowercase single-letter chain IDs. The same setup was working with IMP 2.15.

Thanks

Traceback (most recent call last):
  File "__/modeling.py", line 264, in <module>
    mc1.execute_macro()
  File "/home/__/anaconda3/envs/imp/lib/python3.9/site-packages/IMP/pmi/macros.py", line 434, in execute_macro
    output.init_pdb_best_scoring(
  File "/home/__/anaconda3/envs/imp/lib/python3.9/site-packages/IMP/pmi/output.py", line 511, in init_pdb_best_scoring
    self._init_dictchain(name, prot, mmcif=mmcif)
  File "/home/__/anaconda3/envs/imp/lib/python3.9/site-packages/IMP/pmi/output.py", line 252, in _init_dictchain
    raise ValueError(
ValueError: The system contains at least one chain ID (AA) that is more than 1 character long; this cannot be represented in PDB. Either write mmCIF files instead, or assign 1-character IDs to all chains (this can be done with the chain_ids argument to BuildSystem.add_state()).

atpmono.pdb.zip

benmwebb commented 2 years ago

> The model does not contain a chain ID that is more than 1 character long but only upper- and lowercase single-letter chain IDs

You may be thinking of your input structures here. The model chains are assigned sequentially starting from A, regardless of what your input structure chains are called. The problem arises when you have more than 26 chains. IMP 2.16 follows what PDB does these days, and at that point switches to using multiple characters (e.g. AA) for the chain ID. Multi-character chain IDs are fine for RMF or mmCIF output, but cannot be represented in PDB output (used for PDBs of best-scoring models).
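To make the numbering concrete, here is a minimal sketch of one way to generate sequential chain IDs of this kind (A through Z, then AA, AB, and so on). It is an illustration only, not IMP's actual code: the function names sequential_chain_id and fits_in_pdb are hypothetical, and the exact ordering beyond the first two-letter IDs is an assumption.

```python
import string


def sequential_chain_id(index):
    """Return a chain ID for a zero-based chain index.

    Hypothetical helper, not part of IMP: indices 0-25 map to A-Z,
    index 26 maps to AA, 27 to AB, and so on.
    """
    if index < 0:
        raise ValueError("chain index must be non-negative")
    letters = string.ascii_uppercase
    base = len(letters)
    chars = []
    # Peel off the least significant letter until one letter remains.
    while index >= base:
        chars.append(letters[index % base])
        index = index // base - 1
    chars.append(letters[index])
    return "".join(reversed(chars))


def fits_in_pdb(num_chains):
    """True if every one of num_chains sequential IDs is one character."""
    return all(len(sequential_chain_id(i)) == 1 for i in range(num_chains))


print([sequential_chain_id(i) for i in (0, 25, 26, 27)])
# ['A', 'Z', 'AA', 'AB']
print(fits_in_pdb(26), fits_in_pdb(27))
# True False
```

Under this scheme the 27th chain is the first one that legacy PDB output cannot represent, which matches the AA in the error message.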

> The same setup was working with IMP 2.15

Well, sort of. IMP 2.15 had a hard limit of 62 chains, since it always used a single-character chain ID (uppercase letters + lowercase letters + digits, i.e. 26 + 26 + 10). IMP 2.16 has no limit.

You have a number of solutions here:

  1. The cleanest solution would be to write the best-scoring models in mmCIF rather than legacy PDB output. Unfortunately IMP.pmi.macros.ReplicaExchange0 doesn't currently provide any way to do that (I'll fix this shortly).
  2. If you don't need best-scoring PDBs, just pass number_of_best_scoring_models=0 to the ReplicaExchange0 constructor (this may also speed up your sampling as PDB output takes a non-zero amount of time).
  3. If you have 62 chains or fewer, you can get the same behavior as in IMP 2.15 by doing what the error message suggests: passing an explicit list of single-character chain IDs to add_state().
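
For option 3, a rough sketch of building such a list is shown here. The helper single_char_chain_ids is hypothetical (not part of IMP), and the add_state() call is shown only as a comment because it needs a real IMP installation and topology reader; the variable names bs and reader are placeholders, and the exact call should be checked against the IMP.pmi documentation for your version.

```python
import string

# The 62 single-character IDs described above:
# uppercase letters, then lowercase letters, then digits.
SINGLE_CHAR_IDS = (string.ascii_uppercase
                   + string.ascii_lowercase
                   + string.digits)


def single_char_chain_ids(num_chains):
    """Return num_chains single-character chain IDs (hypothetical helper).

    Raises ValueError if more chains are requested than there are
    single-character IDs available (62).
    """
    if num_chains > len(SINGLE_CHAR_IDS):
        raise ValueError(
            "%d chains requested, but only %d single-character IDs exist; "
            "write mmCIF instead" % (num_chains, len(SINGLE_CHAR_IDS)))
    return list(SINGLE_CHAR_IDS[:num_chains])


chain_ids = single_char_chain_ids(30)

# Assumed usage with IMP.pmi (not executed here; bs and reader are
# placeholders for a BuildSystem object and its topology reader):
#   bs.add_state(reader, chain_ids=chain_ids)
```

With 30 chains, for example, the list runs from A through Z and continues with a through d, so every chain keeps a one-character ID that PDB output can hold.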
almara3 commented 2 years ago

Thanks for your quick response and your commit! Adding the list to add_state() worked perfectly. Beyond that, I'm looking forward to your next version!