sokrypton / ColabFold

Making Protein folding accessible to all!
MIT License

AlphaFold2_mmseqs2 crashed upon processing protein sequence with repeat #77

Open gundalav opened 3 years ago

gundalav commented 3 years ago

Hi,

I tried running AlphaFold2_mmseqs2 on these two files separately:

>HHAWx4
HHAWHHAWHHAWHHAW

And

>KKAWx4
KKAWKKAWKKAWKKAW

Although both sequences contain only valid amino acids, each run failed with the following error message:

running mmseqs2
  0%|          | 0/150 [elapsed: 00:00 remaining: ?]
Traceback (most recent call last):
  File "/home/ubuntu/storage1/colabfold/runner_af2advanced.py", line 163, in <module>
    hhfilter_loc="colabfold-conda/bin/hhfilter", precomputed=precomputed, TMP_DIR=output_dir)
  File "/home/ubuntu/storage1/colabfold/colabfold_alphafold.py", line 292, in prep_msa
    A3M_LINES = cf.run_mmseqs2(I["seqs"], prefix, use_filter=True, host_url=mmseqs_host_url)
  File "/home/ubuntu/storage1/colabfold/colabfold.py", line 150, in run_mmseqs2
    raise Exception(f'MMseqs2 API is giving errors. Please confirm your input is a valid protein sequence. If error persists, please try again an hour later.')
Exception: MMseqs2 API is giving errors. Please confirm your input is a valid protein sequence. If error persists, please try again an hour later.

How can I resolve the issue?

G.V.

martin-steinegger commented 3 years ago

I assume that no k-mer can be extracted and this causes issues. MMseqs2 has trouble searching short sequences (<20 residues).
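
One way to catch this before the query is submitted is a small pre-flight check on the FASTA input. The sketch below is illustrative only: `parse_fasta`, `preflight_warnings`, and `MIN_QUERY_LENGTH` are hypothetical names that are not part of ColabFold or MMseqs2, and the threshold of 20 is taken from the remark above rather than from any documented MMseqs2 limit.

```python
# Hypothetical pre-flight check; none of these names exist in ColabFold or MMseqs2.
# The threshold comes from the "<20 residues" remark above and is an assumption,
# not a documented MMseqs2 limit.
MIN_QUERY_LENGTH = 20

# The 20 standard amino acid one-letter codes.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def preflight_warnings(text, min_length=MIN_QUERY_LENGTH):
    """Collect human-readable warnings for sequences likely to fail the search."""
    warnings = []
    for header, seq in parse_fasta(text):
        invalid = sorted(set(seq) - VALID_RESIDUES)
        if invalid:
            warnings.append(f"{header}: non-standard residues {''.join(invalid)}")
        if len(seq) < min_length:
            warnings.append(
                f"{header}: {len(seq)} residues, shorter than {min_length}"
            )
    return warnings


if __name__ == "__main__":
    query = ">HHAWx4\nHHAWHHAWHHAWHHAW\n>KKAWx4\nKKAWKKAWKKAWKKAW\n"
    for message in preflight_warnings(query):
        print(message)
```

Both sequences from the report are 16 residues long, so each would be flagged as too short even though every residue is a valid amino acid. The check only warns; whether to skip the MSA step or abort for such queries is left to the caller.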