medema-group / BiG-SCAPE

Similarity networks of biosynthetic gene clusters
GNU Affero General Public License v3.0

fix counterintuitive bigscape family number assignments? #114

Closed by alpole23 10 months ago

alpole23 commented 10 months ago

I am running the latest bigscape conda package (v1.1.6)

Here is my command line: python /home/a-m/alexp2/.conda/envs/bigscape_update/lib/python3.7/site-packages/bigscape/bigscape.py --mix --no_classify --include_singletons --clans-off --cutoffs 0.5 --inputdir /home/a-m/alexp2/antismash_results/antismash7/test_directory/ --outputdir /home/a-m/alexp2/bigscape_results/test_directory/ --pfam_dir /home/a-m/alexp2/multismash/pfam

My question is about how bigscape defines the family numbers. In the 'mix_clustering_c0.50.tsv' file in the network_files folder, one of the family numbers is zero. The family numbers also jump from 0 -> 1 -> 7, where I would expect consecutive increments like 1 -> 2 -> 3. The current family numbering scheme does not seem intuitive.

- Is there a way to have bigscape start the numbering from one?
- Is there a particular reason for the large family number jumps? Or is it possible to get bigscape to assign them consecutively?

mix_clustering_c0.50.tsv example file contents:

BGC Name                       Family Number
CAHS01000016.1.region001       0
CP022725.1.region001           1
JANFMX010000007.1.region001    2
JANFMY010000003.1.region001    2
RHHM01000002.1.region001       7
RQRZ01000003.1.region001       7
RQSA01000004.1.region001       7
RQSB01000002.1.region001       7

adraismawur commented 10 months ago

Hello!

BiG-SCAPE 1.x assigns each family the index of the cluster that affinity propagation chooses as that family's centroid. This is why family numbers may start at 0, and why there are large jumps between them (a family of 7 clusters can cause a jump of anywhere from 1 to 7).
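A toy illustration of this labeling scheme may help. It is a minimal sketch, not BiG-SCAPE source code: it assumes clusters are indexed in the order they appear in the file posted above, and the exemplar (centroid) indices are hypothetical values picked so that the labels reproduce that file.

```python
# Toy illustration only; this is NOT taken from the BiG-SCAPE code base.
# Assumption: clusters are indexed 0..7 in the order shown in the posted file.
bgc_names = [
    "CAHS01000016.1.region001",     # index 0
    "CP022725.1.region001",         # index 1
    "JANFMX010000007.1.region001",  # index 2
    "JANFMY010000003.1.region001",  # index 3
    "RHHM01000002.1.region001",     # index 4
    "RQRZ01000003.1.region001",     # index 5
    "RQSA01000004.1.region001",     # index 6
    "RQSB01000002.1.region001",     # index 7
]

# Hypothetical result of a clustering step: for each cluster, the index of
# the cluster chosen as its family's centroid. These values are assumed so
# that the output matches the posted file.
exemplar_of = [0, 1, 2, 2, 7, 7, 7, 7]


def family_labels(names, exemplars):
    """Label each cluster with the index of its family's centroid."""
    return {name: exemplars[i] for i, name in enumerate(names)}


labels = family_labels(bgc_names, exemplar_of)
distinct = sorted(set(labels.values()))
print(distinct)  # [0, 1, 2, 7]
```

Because the label is an index into the full list of clusters rather than a running counter, the first family can be 0 and the last family here skips from 2 to 7.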

We are working on version 2.0, which assigns family numbers consecutively, starting at 1. Unfortunately, it is difficult to adapt version 1.x to do the same.
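In the meantime, one possible workaround is to renumber the families in a post-processing step. The sketch below is not part of BiG-SCAPE; the function names `renumber_families` and `renumber_tsv` are hypothetical, and it assumes the clustering file is tab-separated with a single header line and two columns, as in the example above.

```python
import csv
import io


def renumber_families(rows, start=1):
    """Map original family numbers to consecutive ones.

    New numbers are handed out in order of first appearance, beginning
    at `start`. `rows` is an iterable of (bgc_name, family) pairs.
    """
    mapping = {}
    renumbered = []
    for name, family in rows:
        if family not in mapping:
            mapping[family] = start + len(mapping)
        renumbered.append((name, mapping[family]))
    return renumbered


def renumber_tsv(text):
    """Renumber the family column of tab-separated clustering text.

    Assumes the first line is a header and every other non-empty line
    has the BGC name in column 1 and the family number in column 2.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    rows = [(r[0], r[1]) for r in reader if r]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(renumber_families(rows))
    return out.getvalue()


example = renumber_families(
    [("a", "0"), ("b", "1"), ("c", "2"), ("d", "2"), ("e", "7")]
)
print(example)
```

Note that this only rewrites the clustering table passed to it. Any other output that refers to the original family numbers would keep the old values.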

alpole23 commented 10 months ago

Hi adraismawur,

I understand. Thanks for the explanation!