medema-group / BiG-SCAPE

Similarity networks of biosynthetic gene clusters
GNU Affero General Public License v3.0

IndexError when running --mix flag #13

Open: Sam-Will opened this issue 2 years ago

Sam-Will commented 2 years ago

I'm running BiG-SCAPE but getting an IndexError when I add the '--mix' flag.

Below is the submission script I used. It works fine without the --mix flag but produces the error message below when it's added. Any ideas what might be happening?

Thanks, Sam

#!/bin/bash

#SBATCH --job-name=BiG-SCAPE_fulltest_110522
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --time=10:00:00
#SBATCH --mem=10000M

# Change to the directory you submitted the job from
cd "${SLURM_SUBMIT_DIR}"

# What host, time and directory is the jobID running from
echo Running on host "$(hostname)"
echo Time is "$(date)"
echo Directory is "$(pwd)"
echo Slurm job ID is "${SLURM_JOBID}"
echo This job runs on the following machines:
echo "${SLURM_JOB_NODELIST}"

# Add miniconda
module add languages/miniconda/3.9.7

# Activate the BiG-SCAPE environment
source activate bigscape

# Run BiG-SCAPE
python ./BiG-SCAPE/bigscape.py -i ALL_BGC -o output_BGC --pfam_dir Pfam-A --mibig --mix

Mix (2314 BGCs)
  Calculating all pairwise distances
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 6 times)
Ignored unknown character Z (seen 3 times)
/user/home/sw17073/.conda/envs/bigscape/lib/python3.9/site-packages/sklearn/cluster/_affinity_propagation.py:250: ConvergenceWarning: Affinity propagation did not converge, this model will not have any cluster centers.
  warnings.warn(
generate_network took 534.260 seconds
   Removing 1693 non-relevant MIBiG BGCs
  Writing output files
  Calling Gene Cluster Families
  Cutoff: 0.3
Traceback (most recent call last):
  File "/mnt/storage/scratch/sw17073/bigscape/./BiG-SCAPE/bigscape.py", line 3065, in <module>
    family_data = clusterJsonBatch(mix_set, pathBase, "mix", reduced_network, pos_alignments,
  File "/mnt/storage/scratch/sw17073/bigscape/./BiG-SCAPE/bigscape.py", line 1771, in clusterJsonBatch
    clanLabels = [familyIdx[exemplarsClans[labelsClans[i]]] for i in range(len(familyIdx))]
  File "/mnt/storage/scratch/sw17073/bigscape/./BiG-SCAPE/bigscape.py", line 1771, in <listcomp>
    clanLabels = [familyIdx[exemplarsClans[labelsClans[i]]] for i in range(len(familyIdx))]
IndexError: list index out of range
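
The traceback is consistent with the ConvergenceWarning earlier in the log: when affinity propagation does not converge, recent scikit-learn releases return no cluster centers and (as far as I know) set every label to -1. Indexing an empty exemplar list with -1 then fails exactly as in bigscape.py line 1771. A minimal, simplified sketch of that indexing pattern, using plain lists instead of the real arrays; the helper `clan_labels` and the sample data are hypothetical:

```python
def clan_labels(family_idx, exemplars, labels):
    """Simplified mirror of the list comprehension at bigscape.py line 1771."""
    return [family_idx[exemplars[labels[i]]] for i in range(len(family_idx))]


family_idx = [10, 11, 12]

# Converged run: two exemplars (positions 0 and 2), every label points at one.
print(clan_labels(family_idx, exemplars=[0, 2], labels=[0, 0, 1]))  # [10, 10, 12]

# Non-converged run (assumed scikit-learn behaviour): no exemplars, labels all -1.
try:
    clan_labels(family_idx, exemplars=[], labels=[-1, -1, -1])
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range
```

This would also explain why --clans-off avoids the crash: the clan-level clustering step, where this comprehension lives, is skipped.
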

cmandreani commented 2 years ago

Hi @SamWilliamsUOB, I got the same error with Anaconda3: the network files were generated properly, but no tree was constructed.

With Docker it worked fine, though.

Sam-Will commented 2 years ago

I managed to get this script to work by adding the --clans-off flag.

jorgecnavarrom commented 2 years ago

It seems scikit-learn's affinity propagation is the cause. Which version do you have installed?
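
If changing the scikit-learn version is not an option, one way to avoid the crash is to check for the non-converged case before indexing. A minimal sketch, assuming that a non-converged run yields an empty exemplar list and/or labels of -1; the function `safe_labels` and its singleton fallback (each item becomes its own cluster) are hypothetical and are not BiG-SCAPE's actual behaviour:

```python
from typing import List, Sequence


def safe_labels(exemplars: Sequence[int], labels: Sequence[int]) -> List[int]:
    """Map each item to its exemplar, falling back to singleton clusters.

    Assumption: non-convergence is signalled by an empty exemplar list
    and/or negative labels, as in recent scikit-learn releases.
    """
    if len(exemplars) == 0 or any(label < 0 for label in labels):
        # Hypothetical fallback: every item is treated as its own exemplar.
        return list(range(len(labels)))
    return [exemplars[label] for label in labels]


print(safe_labels([0, 2], [0, 0, 1]))  # [0, 0, 2]
print(safe_labels([], [-1, -1, -1]))   # [0, 1, 2]
```

Singletons are just one conservative choice; skipping the step entirely, as --clans-off does for clans, is another.
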

Sam-Will commented 2 years ago

Hi Jorge,

It looks like I have sklearn 1.0.2:

>>> import sklearn
>>> print(sklearn.__version__)
1.0.2
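
To check from inside the environment whether the installed scikit-learn is newer than 0.19.2 (the version reported to work in this thread, while 0.24.2 and 1.0.2 both failed), a small standard-library sketch. The helper names are hypothetical, `importlib.metadata` needs Python 3.8+ (on the Python 3.6 environment in the later comment, `sklearn.__version__` works instead), and the parse ignores pre-release suffixes such as `rc1`:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

KNOWN_GOOD = (0, 19, 2)  # version reported to work in this thread


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.0.2' into (1, 0, 2); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def newer_than_known_good(text: str) -> bool:
    """True if the version string sorts after 0.19.2."""
    return parse_version(text) > KNOWN_GOOD


try:
    installed = version("scikit-learn")
    print(installed, "newer than 0.19.2:", newer_than_known_good(installed))
except PackageNotFoundError:
    print("scikit-learn is not installed in this environment")
```
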

Thanks, Sam

jorgecnavarrom commented 2 years ago

Could you try with scikit-learn v0.19.2?

zreitz commented 1 year ago

FYI, I had a similar error with --mix and --cutoffs above ~0.4.

/lustre/BIF/nobackup/reitz001/mambaforge/envs/bigscape/lib/python3.6/site-packages/sklearn/cluster/_affinity_propagation.py:247: ConvergenceWarning: Affinity propagation did not converge, this model will not have any cluster centers.
  "will not have any cluster centers.", ConvergenceWarning)
Traceback (most recent call last):
  File "/home/reitz001/lustre/software/BiG-SCAPE/bigscape.py", line 3042, in <module>
    clanCutoff=options.clan_cutoff, htmlFolder=network_html_folder)
  File "/home/reitz001/lustre/software/BiG-SCAPE/bigscape.py", line 1459, in clusterJsonBatch
    labels[bgcExt2Int[bgcSub2Ext_[i]]] = bgcExt2Int[bgcSub2Ext_[exemplarsSub[labelsSub[i]]]]
IndexError: list index out of range

Downgrading sklearn from v0.24.2 to v0.19.2 solved it.