hpcaitech / FastFold

Optimizing AlphaFold Training and Inference on GPU Clusters
Apache License 2.0

KeyError: 'msa_mask' (Issue #179)

Status: Open. YoshitakaMo opened this issue 11 months ago.

YoshitakaMo commented 11 months ago

I tried to use FastFold to predict the structure of a large hetero-multimer protein complex, but encountered the following error:

2023-07-25 14:53:20,908 INFO worker.py:1518 -- Started a local Ray instance.
2023-07-25 14:53:22,253 INFO workflow_access.py:356 -- Initializing workflow manager...
2023-07-25 14:53:24,072 INFO api.py:203 -- Workflow job created. [id="fastfold_data_workflow Tue Jul 25 14:53:22 2023"].
(WorkflowManagementActor pid=1387) 2023-07-25 14:53:24,128 INFO workflow_executor.py:86 -- Workflow job [id=fastfold_data_workflow Tue Jul 25 14:53:22 2023] started.
...
<MSA generation finished successfully>
...
...
running in multimer mode...
Traceback (most recent call last):
  File "/foo/bar/FastFold/inference.py", line 556, in <module>
    main(args)
  File "/foo/bar/FastFold/inference.py", line 164, in main
    inference_multimer_model(args)
  File "/foo/bar/FastFold/inference.py", line 285, in inference_multimer_model
    processed_feature_dict = feature_processor.process_features(
  File "/foo/bar/FastFold/fastfold/data/feature_pipeline.py", line 124, in process_features
    return np_example_to_features(
  File "/foo/bar/FastFold/fastfold/data/feature_pipeline.py", line 106, in np_example_to_features
    features = input_pipeline_fn(tensor_dict, cfg.common, cfg[mode])
  File "/foo/bar/FastFold/fastfold/data/input_pipeline_multimer.py", line 107, in process_tensors_from_config
    tensors = compose(nonensembled)(tensors)
  File "/foo/bar/FastFold/fastfold/data/data_transforms.py", line 76, in <lambda>
    return lambda x: f(x, *args, **kwargs)
  File "/foo/bar/FastFold/fastfold/data/input_pipeline_multimer.py", line 124, in compose
    x = f(x)
  File "/foo/bar/FastFold/fastfold/data/data_transforms_multimer.py", line 298, in make_msa_profile
    batch['msa_mask'][..., None],
KeyError: 'msa_mask'

I know this issue is similar to https://github.com/hpcaitech/FastFold/issues/119, but I don't know how to resolve it with the latest FastFold version. Please let me know how to fix it.
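One way to narrow this down before changing any code is to check which MSA-related keys the feature dictionary contains right before `feature_processor.process_features(...)` is called (line 285 of `inference.py` in the traceback). The helper below is a hypothetical sketch, not part of FastFold; `msa_mask` is the key named in the `KeyError`, while `msa` in the default list is an assumption for illustration. Running it on the features for this input and on those for an input that works (for example, a smaller complex) may show whether the key is already absent before feature processing starts.

```python
from typing import Any, Iterable, List, Mapping, Tuple


def msa_feature_report(
    feature_dict: Mapping[str, Any],
    required: Iterable[str] = ("msa", "msa_mask"),  # assumed key names
) -> Tuple[List[str], List[str]]:
    """Hypothetical diagnostic helper (not a FastFold API).

    Returns (missing, present):
      missing -- keys from `required` that are absent from feature_dict
      present -- every key in feature_dict whose name contains "msa", sorted
    """
    missing = [key for key in required if key not in feature_dict]
    present = sorted(key for key in feature_dict if "msa" in key)
    return missing, present


# Usage with a toy dictionary standing in for the real features:
missing, present = msa_feature_report({"msa": [[0, 1]], "aatype": [0, 1]})
print("missing:", missing)   # missing: ['msa_mask']
print("present:", present)   # present: ['msa']
```

Checking key presence up front, rather than catching the `KeyError`, reports every missing key at once instead of stopping at whichever transform fails first.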

Computational environment

The input command was:

FASTAFILE="foo2.fasta"
OUTPUTDIR="./foo2"
DATE="2099-07-14"
DATABASEDIR=/foobar/alphafold/db-v2.3.2
# FASTFOLDDIR (set elsewhere, not shown) is the FastFold checkout directory.
python3.9 ${FASTFOLDDIR}/inference.py ${FASTAFILE} ${DATABASEDIR}/pdb_mmcif/mmcif_files/ \
    --output_dir ${OUTPUTDIR} \
    --gpus 4 \
    --model_preset multimer \
    --max_template_date ${DATE} \
    --relaxation \
    --use_precomputed_alignments ${OUTPUTDIR}/alignments \
    --save_prediction_result True \
    --uniref90_database_path=$DATABASEDIR/uniref90/uniref90.fasta \
    --mgnify_database_path=$DATABASEDIR/mgnify/mgy_clusters_2022_05.fa \
    --bfd_database_path=$DATABASEDIR/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --uniref30_database_path=$DATABASEDIR/uniref30/UniRef30_2021_03 \
    --obsolete_pdbs_path=$DATABASEDIR/pdb_mmcif/obsolete.dat \
    --uniprot_database_path=$DATABASEDIR/uniprot/uniprot.fasta \
    --pdb_seqres_database_path=$DATABASEDIR/pdb_seqres/pdb_seqres.txt \
    --jackhmmer_binary_path=$FASTFOLDDIR/fastfold-conda/bin/jackhmmer \
    --hhblits_binary_path=$FASTFOLDDIR/fastfold-conda/bin/hhblits \
    --hhsearch_binary_path=$FASTFOLDDIR/fastfold-conda/bin/hhsearch \
    --kalign_binary_path=$FASTFOLDDIR/fastfold-conda/bin/kalign