medema-group / BiG-SCAPE

Similarity networks of biosynthetic gene clusters
GNU Affero General Public License v3.0

MIBiG t2pks BGCs not grouping into families #144

Closed jfoldi81 closed 3 months ago

jfoldi81 commented 5 months ago

I ran BiG-SCAPE on the following MIBiG clusters: BGC0000187, BGC0000194 through BGC0000198, BGC0000213, BGC0000220, BGC0000247, BGC0000269, BGC0000275, BGC0001851 and BGC0002045, all of which appear as type II polyketide synthases when searching MIBiG. I downloaded the cluster gbk file for each of them and ran them through BiG-SCAPE. Even though these are all type II PKS BGCs, the output did not group any of them into shared families.

I wanted to know whether there is a different input command I could have used, or another reason these clusters would not group together. I'm attaching the full run log in case it is helpful.

Thanks!

python3 bigscape.py -i mibig_gbks -o mibig_output

   - - Processing input files - -
 Output folder already exists
 Logs folder already exists
 Cache folder already exists
 BGC fastas folder already exists
 Domtable folder already exists
 Domains folder already exists
 pfs folder already exists
 pfd folder already exists
 Including files with one or more of the following strings in their filename: 'cluster', 'region'
 Skipping files with one or more of the following strings in their filename: 'final'

Importing GenBank files
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: Input set has files with no Biosynthetic Genes (affects alignment mode)
   See no_biosynthetic_genes_list.txt

 Starting with 13 files
 Files that had its sequence extracted: 13

Creating output directories
 SVG folder already exists
 Networks folder already exists

Trying threading on 16 cores

Predicting domains using hmmscan
 Predicting domains for 13 fasta files
 Finished generating domtable files.

Parsing hmmscan domtable files
 Processing 13 domtable files
 New domain sequences to be added; cleaning domains folder
 Finished generating pfs and pfd files.

Processing domains sequence files
 Adding sequences to corresponding domains file
 Reading the ordered list of domains from the pfs files
 Creating arrower-like figures for each BGC
  Parsing hmm file for domain information
    Done
  Found file with domains colors
  Reading BGC information and writing SVG
/home/jonF/miniconda3/envs/prokka/lib/python3.12/site-packages/Bio/SeqFeature.py:230: BiopythonDeprecationWarning: Please use .location.strand rather than .strand
  warnings.warn(
 Finished creating figures

   - - Calculating distance matrix - -
Performing multiple alignment of domain sequences

 Using hmmalign
launch_hmmalign took 3.448 seconds
 Trying to read domain alignments (*.algn files)

Generating distance network files with ALL available input files
   Writing the complete Annotations file for the complete set
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'

 Working for each BGC class
  Sorting the input BGCs

  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'

  Others (13 BGCs)
   Writing annotation files
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
   Calculating all pairwise distances
generate_network took 0.064 seconds
   Writing output files
  Calling Gene Cluster Families
  Cutoff: 0.3
/home/jonF/miniconda3/envs/prokka/lib/python3.12/site-packages/sklearn/cluster/_affinity_propagation.py:52: UserWarning: All samples have mutually equal similarities. Returning arbitrary cluster center(s).
  warnings.warn(
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
  Warning: unknown product 'unknown'
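The repeated "Warning: unknown product 'unknown'" lines and the final "Others (13 BGCs)" group in this log are consistent with each BGC's class being derived from a product annotation that these files lack. A minimal sketch of that fallback, assuming a hypothetical product-to-class table (`PRODUCT_TO_CLASS`, `bgc_class` and `class_counts` are illustrative names, not BiG-SCAPE's code or its real lookup table):

```python
# Hypothetical sketch: map an antiSMASH-style product string to a BiG-SCAPE-style
# class, falling back to "Others". The table is illustrative only; it is not
# BiG-SCAPE's actual lookup table.
from collections import Counter

PRODUCT_TO_CLASS = {
    "T1PKS": "PKSI",
    "T2PKS": "PKSother",
    "T3PKS": "PKSother",
    "NRPS": "NRPS",
    "terpene": "Terpene",
}


def bgc_class(product):
    """Return the class for a product, or "Others" with a log-style warning."""
    bgc_cls = PRODUCT_TO_CLASS.get(product)
    if bgc_cls is None:
        # Message format mirrors the run log above.
        print(f"  Warning: unknown product {product!r}")
        return "Others"
    return bgc_cls


def class_counts(products):
    """Count how many BGCs land in each class."""
    return Counter(bgc_class(p) for p in products)


# 13 files without an antiSMASH product annotation, as in the log:
print(class_counts(["unknown"] * 13))
```

In this toy table, a file annotated with product `T2PKS` would land in `PKSother` rather than `Others`; the real mapping used by BiG-SCAPE may differ.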

CatarinaCarolina commented 5 months ago

Hi!

It appears you are giving BiG-SCAPE MIBiG gbks that are not antiSMASH output. BiG-SCAPE requires antiSMASH-processed gbks as input.

To get these, open the relevant MIBiG page, click 'View antiSMASH-generated output', then click 'Download region GenBank file'.
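Before rerunning, one quick way to tell the two kinds of file apart is to look for antiSMASH annotations. The heuristic below is an assumption, not BiG-SCAPE's own check: it treats a GenBank file as antiSMASH-processed if its feature table contains a `region` (antiSMASH 5 and later) or `cluster` (antiSMASH 4) feature with a `/product` qualifier. It uses only the standard library and simple column-based parsing, so multi-line qualifier values are not handled.

```python
# Heuristic (assumption): a GenBank file looks antiSMASH-processed if it has a
# "region" (antiSMASH 5+) or "cluster" (antiSMASH 4) feature with /product.
import re

# Feature keys start in column 6 (5 leading spaces); qualifiers in column 22.
FEATURE_KEY = re.compile(r"^ {5}(\S+) +\S")
PRODUCT_QUALIFIER = re.compile(r'^ {21}/product="([^"]*)"')


def antismash_products(genbank_text):
    """Collect /product values of region/cluster features from GenBank text."""
    products = []
    in_features = False
    current_key = None
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
        elif line.startswith(("ORIGIN", "//")):
            in_features = False
        elif in_features:
            key_match = FEATURE_KEY.match(line)
            if key_match:
                current_key = key_match.group(1)
                continue
            product_match = PRODUCT_QUALIFIER.match(line)
            if product_match and current_key in ("region", "cluster"):
                products.append(product_match.group(1))
    return products


def looks_antismash_processed(genbank_text):
    """True if at least one region/cluster feature carries a product."""
    return bool(antismash_products(genbank_text))
```

For example, `antismash_products(Path("BGC0000187.gbk").read_text())` (hypothetical filename, with `from pathlib import Path`) returning an empty list would suggest the file is not antiSMASH output.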

jorgecnavarrom commented 5 months ago

Hi! Just to add to Catarina's comment: BiG-SCAPE will take any gbk, but, as you can see in your example, it uses the antiSMASH annotations to classify the input into biosynthetic classes; otherwise it simply puts them in the "Others" class. Also, since there is a lot of diversity within t2PKSs (as within any other class), they won't necessarily cluster together.
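On the diversity point: families are called from pairwise distances against a cutoff (0.3 in the log above), so divergent members of one class can end up as singletons. The sketch below is a simplified illustration, not BiG-SCAPE's algorithm: it links pairs whose distance is below the cutoff and returns connected components, whereas BiG-SCAPE also applies affinity propagation (the source of the scikit-learn warning in the log). The BGC names and distances are made up.

```python
# Simplified illustration (not BiG-SCAPE's algorithm): link BGC pairs whose
# distance is below the cutoff and return the connected components as "families".
# BiG-SCAPE additionally applies affinity propagation, which is omitted here.


def families(names, distances, cutoff=0.3):
    """names: list of BGC ids; distances: {(id1, id2): distance in [0, 1]}."""
    parent = {name: name for name in names}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (a, b), dist in distances.items():
        if dist < cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical distances between three BGCs of the same class.
names = ["bgc_a", "bgc_b", "bgc_c"]
distances = {("bgc_a", "bgc_b"): 0.25, ("bgc_a", "bgc_c"): 0.62, ("bgc_b", "bgc_c"): 0.71}
print(families(names, distances))              # [['bgc_a', 'bgc_b'], ['bgc_c']]
print(families(names, distances, cutoff=0.2))  # [['bgc_a'], ['bgc_b'], ['bgc_c']]
```

Even with all three BGCs in the same class, only the pair under the cutoff forms a family; the rest stay singletons.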

github-actions[bot] commented 4 months ago

This issue has not seen activity for 14 days and has been marked as stale. Please comment with additional information if this issue is still relevant.

github-actions[bot] commented 3 months ago

This issue has been stale for 14 days and has been closed. Please feel free to re-open this issue if necessary.