ku-cbd / PhageBoost

Rapid discovery of novel prophages using biological feature engineering and machine learning
GNU General Public License v3.0

KeyError: 'X' #20

Open linda5mith opened 2 years ago

linda5mith commented 2 years ago

Full traceback:

PhageBoost -f /data/san/data0/users/linda/databases/release202/fastani/database/GCA/000/007/325/GCA_000007325.1_genomic.fna.gz -o results

processing: GCA_000007325
time after genecalls: 1.3131694793701172
Traceback (most recent call last):
  File "/home/linda/programs/anaconda3/envs/PhageBoost-env/bin/PhageBoost", line 8, in <module>
    sys.exit(main())
  File "/home/linda/programs/anaconda3/envs/PhageBoost-env/lib/python3.7/site-packages/PhageBoost/main.py", line 219, in main
    df = calculate_features(genecalls)
  File "/home/linda/programs/anaconda3/envs/PhageBoost-env/lib/python3.7/site-packages/PhageBoost/main.py", line 59, in calculate_features
    df, _, _ = calc_features.df2AAandDNAfeatures(genecalls, name='header')
  File "/home/linda/programs/anaconda3/envs/PhageBoost-env/lib/python3.7/site-packages/PhageBoost/calc_features.py", line 215, in df2AAandDNAfeatures
    DF, DF_AA, DF_DNA = RunAAandDNA(dna_entries, aa_entries, locations)
  File "/home/linda/programs/anaconda3/envs/PhageBoost-env/lib/python3.7/site-packages/PhageBoost/calc_features.py", line 202, in RunAAandDNA
    DF_AA = RunAA(AA_entries, verbose = verbose)
  File "/home/linda/programs/anaconda3/envs/PhageBoost-env/lib/python3.7/site-packages/PhageBoost/calc_features.py", line 178, in RunAA
    df_biopython = biopython_proteinanalysis(entries, scaling=scaling)
  File "/home/linda/programs/anaconda3/envs/PhageBoost-env/lib/python3.7/site-packages/PhageBoost/calc_features.py", line 168, in biopython_proteinanalysis
    d = biopython_proteinanalysis_seq(seq, scaling=scaling)
  File "/home/linda/programs/anaconda3/envs/PhageBoost-env/lib/python3.7/site-packages/PhageBoost/calc_features.py", line 33, in biopython_proteinanalysis_seq
    flex = np.array(res.flexibility())
  File "/home/linda/programs/anaconda3/envs/PhageBoost-env/lib/python3.7/site-packages/Bio/SeqUtils/ProtParam.py", line 183, in flexibility
    score += (flexibilities[front] + flexibilities[back]) * weights[j]
KeyError: 'X'

linda5mith commented 2 years ago

It seems the code as is cannot handle the unknown amino acid that pyrodigal predicts, denoted as 'X'. Biopython's flexibility() lookup has no entry for 'X', hence the KeyError. I will try to look into this!
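
In the meantime, a minimal sketch of one possible workaround, assuming it is acceptable to drop unknown residues before the protein features are computed. The helper names (`STANDARD_AA`, `has_unknown_residues`, `strip_unknown_residues`) are hypothetical and are not part of PhageBoost or Biopython; the project's own fix may work differently.

```python
# Hypothetical helpers, not part of PhageBoost or Biopython.
# The 20 standard amino acids; anything else (e.g. 'X' or '*') counts as unknown.
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def has_unknown_residues(seq: str) -> bool:
    """Return True if the sequence contains any non-standard residue."""
    return any(aa not in STANDARD_AA for aa in seq.upper())


def strip_unknown_residues(seq: str) -> str:
    """Return the upper-cased sequence with non-standard residues removed."""
    return "".join(aa for aa in seq.upper() if aa in STANDARD_AA)


# Example with a made-up sequence containing one unknown residue.
raw = "MKTXAYIAKQR"
cleaned = strip_unknown_residues(raw)

# The cleaned string could then be handed to
# Bio.SeqUtils.ProtParam.ProteinAnalysis(cleaned) instead of the raw one
# (assumption: this happens before flexibility() is called).
```

Note that removing residues shortens the sequence, so window-based features such as flexibility will differ slightly from what the original positions would have given. Skipping or flagging affected genes with `has_unknown_residues` may be the safer choice.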

tsp-kucbd commented 2 years ago

Yes, this is fixed in the GitHub version. We will push it to pip later as well. For now, you can install it with pip install git+https://github.com/ku-cbd/PhageBoost