sokrypton / ColabFold

Making Protein folding accessible to all!
MIT License

Amber works in AlphaFold2_mmseqs2 but not in AlphaFold2_batch #47

Open konstin opened 3 years ago

konstin commented 3 years ago

When I run the predefined example sequence (PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK) with templates, one model, and Amber relaxation, it works in AlphaFold2_mmseqs2 but fails in AlphaFold2_batch with the following error:

ValueError                                Traceback (most recent call last)

<ipython-input-3-a76dac23e0b1> in <module>()
    391                            Ls=[len(query_sequence)], crop_len=crop_len,
    392                            model_params=model_params, use_model=use_model,
--> 393                            do_relax=use_amber)
    394 
    395   # gather MSA info

<ipython-input-3-a76dac23e0b1> in predict_structure(prefix, feature_dict, Ls, crop_len, model_params, use_model, do_relax, random_seed)
    276                                               stiffness=10.0,exclude_residues=[],
    277                                               max_outer_iterations=20)      
--> 278         relaxed_pdb_str, _, _ = amber_relaxer.process(prot=unrelaxed_protein)
    279         relaxed_pdb_lines.append(relaxed_pdb_str)
    280 

/content/alphafold/relax/relax.py in process(self, prot)
     62         tolerance=self._tolerance, stiffness=self._stiffness,
     63         exclude_residues=self._exclude_residues,
---> 64         max_outer_iterations=self._max_outer_iterations)
     65     min_pos = out['pos']
     66     start_pos = out['posinit']

/content/alphafold/relax/amber_minimize.py in run_pipeline(prot, stiffness, max_outer_iterations, place_hydrogens_every_iteration, max_iterations, tolerance, restraint_set, max_attempts, checks, exclude_residues)
    459   # `protein.to_pdb` will strip any poorly-defined residues so we need to
    460   # perform this check before `clean_protein`.
--> 461   _check_residues_are_well_defined(prot)
    462   pdb_string = clean_protein(prot, checks=checks)
    463 

/content/alphafold/relax/amber_minimize.py in _check_residues_are_well_defined(prot)
    139   """Checks that all residues contain non-empty atom sets."""
    140   if (prot.atom_mask.sum(axis=-1) == 0).any():
--> 141     raise ValueError("Amber minimization can only be performed on proteins with"
    142                      " well-defined residues. This protein contains at least"
    143                      " one residue with no atoms.")

ValueError: Amber minimization can only be performed on proteins with well-defined residues. This protein contains at least one residue with no atoms.
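
For anyone debugging this, the check that raises here tests whether any row of prot.atom_mask sums to zero. A minimal sketch of the same test that also reports which residues are empty is shown below. The helper name residues_without_atoms and the toy mask are hypothetical and only for illustration; they are not part of AlphaFold or ColabFold.

```python
def residues_without_atoms(atom_mask):
    """Return the indices of residues whose atom mask is all zeros.

    Mirrors the condition in _check_residues_are_well_defined, but
    returns the offending indices instead of raising. atom_mask is
    any 2D sequence of shape (num_residues, num_atom_slots).
    """
    return [i for i, row in enumerate(atom_mask) if sum(row) == 0]


# Toy mask (hypothetical data): 3 residues x 4 atom slots.
# The middle residue has no atoms, so Amber relaxation would reject it.
toy_mask = [
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
]

print(residues_without_atoms(toy_mask))
```

Running the same helper on the unrelaxed protein's atom_mask in each notebook should show whether the batch notebook produces residues with no atoms.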

matthewnicotra commented 3 years ago

The same thing is happening to me with a custom MSA. Amber works in the af_mmseqs2 notebook, but fails in the af_batch notebook.

pankev-in commented 3 years ago

Error message when running batch inference:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/colabfold_env/bin/colabfold_batch", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/anaconda3/envs/colabfold_env/lib/python3.7/site-packages/colabfold/batch.py", line 856, in main
    recompile_all_models=args.recompile_all_models,
  File "/home/ubuntu/anaconda3/envs/colabfold_env/lib/python3.7/site-packages/colabfold/batch.py", line 703, in run
    stop_at_score=stop_at_score,
  File "/home/ubuntu/anaconda3/envs/colabfold_env/lib/python3.7/site-packages/colabfold/batch.py", line 236, in predict_structure
    relaxed_pdb_str, _, _ = amber_relaxer.process(prot=unrelaxed_protein)
  File "/home/ubuntu/anaconda3/envs/colabfold_env/lib/python3.7/site-packages/alphafold/relax/relax.py", line 62, in process
    max_outer_iterations=self._max_outer_iterations)
  File "/home/ubuntu/anaconda3/envs/colabfold_env/lib/python3.7/site-packages/alphafold/relax/amber_minimize.py", line 482, in run_pipeline
    ret.update(get_violation_metrics(prot))
  File "/home/ubuntu/anaconda3/envs/colabfold_env/lib/python3.7/site-packages/alphafold/relax/amber_minimize.py", line 356, in get_violation_metrics
    structural_violations, struct_metrics = find_violations(prot)
  File "/home/ubuntu/anaconda3/envs/colabfold_env/lib/python3.7/site-packages/alphafold/relax/amber_minimize.py", line 343, in find_violations
    "clash_overlap_tolerance": 1.5,  # Taken from model config.
  File "/home/ubuntu/anaconda3/envs/colabfold_env/lib/python3.7/site-packages/alphafold/model/folding.py", line 773, in find_structural_violations
    bond_length_tolerance_factor=config.violation_tolerance_factor)
  File "/home/ubuntu/anaconda3/envs/colabfold_env/lib/python3.7/site-packages/alphafold/common/residue_constants.py", line 861, in make_atom14_dists_bounds
    residue_bonds, residue_virtual_bonds, _ = load_stereo_chemical_props()
  File "/home/ubuntu/anaconda3/envs/colabfold_env/lib/python3.7/site-packages/alphafold/common/residue_constants.py", line 409, in load_stereo_chemical_props
    with open(stereo_chemical_props_path, 'rt') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'stereo_chemical_props.txt'

Got it to work by fixing three things:

  1. Install pdbfixer with conda install -c conda-forge pdbfixer
  2. Manually download stereo_chemical_props.txt with wget -q https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt --no-check-certificate, then point the code at https://github.com/sokrypton/ColabFold/blob/main/colabfold/batch.py#L215 to the file's absolute path
  3. Fix the bug at https://github.com/sokrypton/ColabFold/blob/main/colabfold/batch.py#L259: it should use relaxed_pdb_lines, not unrelaxed_pdb_lines
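
For step 2, the FileNotFoundError comes from opening the relative path 'stereo_chemical_props.txt', which only resolves if the file is in the current working directory. A minimal sketch of resolving the downloaded file to an absolute path before handing it to the code is shown below. The helper name resolve_stereo_props is hypothetical, and exactly how batch.py consumes the path is not shown here.

```python
import os


def resolve_stereo_props(path="stereo_chemical_props.txt"):
    """Return the absolute path of the downloaded file.

    Raises FileNotFoundError with the full path if the file is
    missing, so the failure shows up before Amber relaxation starts
    rather than deep inside residue_constants.py.
    """
    abs_path = os.path.abspath(path)
    if not os.path.isfile(abs_path):
        raise FileNotFoundError(
            f"stereo_chemical_props.txt not found at {abs_path}; "
            "download it first (see step 2)."
        )
    return abs_path
```

Calling resolve_stereo_props("/path/to/stereo_chemical_props.txt") after the wget in step 2 returns the absolute path to use at the linked line, and fails early with a clear message if the download did not succeed.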