medema-group / BiG-SCAPE

Similarity networks of biosynthetic gene clusters
GNU Affero General Public License v3.0

Affinity propagation did not converge #47

Closed yuezhiTang closed 1 year ago

yuezhiTang commented 1 year ago

I am using BiG-SCAPE to process a dataset of approximately 50,000 BGCs. The command line I am using is (nohup python3 /path/BiG-SCAPE-1.1.5/bigscape.py -i /path/output -o /path/res --mode auto --mibig &). The following error occurred, which I can neither understand nor resolve:

Working for each BGC class
Sorting the input BGCs

PKSI (14928 BGCs)
Writing annotation files
Calculating all pairwise distances
Ignored unknown character X (seen 2 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 3 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 3 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 4 times)
Ignored unknown character X (seen 4 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 4 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 1 times)
Ignored unknown character X (seen 1 times)
/home/XXX/miniconda3/envs/bigscape/lib/python3.11/site-packages/sklearn/cluster/_affinity_propagation.py:142: ConvergenceWarning: Affinity propagation did not converge, this model may return degenerate cluster centers and labels.
  warnings.warn(

adraismawur commented 1 year ago

Hi!

This is a warning from scikit-learn telling you that the affinity propagation could not settle on clusters and labels (it never converges to consistent clusters). This should just remain a warning, but some versions of scikit-learn end up returning invalid cluster centers and labels.
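To illustrate why a warning on its own should not halt a run, here is a minimal sketch using only Python's standard `warnings` module. The `ConvergenceWarning` class and `cluster_stub` function below are hypothetical stand-ins written for this example; they are not scikit-learn or BiG-SCAPE code, and the returned labels are placeholders rather than real clustering output.

```python
import warnings


class ConvergenceWarning(UserWarning):
    """Local stand-in for scikit-learn's ConvergenceWarning (assumption for this sketch)."""


def cluster_stub():
    # Hypothetical placeholder for a clustering call that fails to converge.
    warnings.warn("Affinity propagation did not converge", ConvergenceWarning)
    return [-1, -1, -1]  # placeholder labels, not real scikit-learn output


# Default behaviour: the warning is recorded and the call still returns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    labels = cluster_stub()

# Opt-in strict behaviour: the same warning is raised as an exception.
raised = False
with warnings.catch_warnings():
    warnings.simplefilter("error", category=ConvergenceWarning)
    try:
        cluster_stub()
    except ConvergenceWarning:
        raised = True

print(len(caught), labels, raised)
```

If a run stops right after this message, the cause is therefore more likely to be what the calling code does with the returned clusters than the warning itself.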

Does the rest of the process still complete? And which version of scikit-learn are you using?
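One way to report the installed version is sketched below, assuming it is run with the same Python interpreter (for example, the same conda environment) that runs bigscape.py. The helper name `installed_version` is made up for this example.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


sklearn_version = installed_version("scikit-learn")
print("scikit-learn:", sklearn_version or "not installed")
```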

yuezhiTang commented 1 year ago

Thank you for your reply. By the time this message appears in my nohup.out file, BiG-SCAPE has already stopped running and does not continue. I have tried deleting the file and restarting BiG-SCAPE with a new output directory, but it always stops at this step. The version of scikit-learn I am using is 1.3.0.

Additionally, I would like to ask how to visualize the results from other, smaller datasets that I have successfully processed with BiG-SCAPE. When I open the xx.html file generated by the run, the kind of images shown in the literature do not seem to be present, and the results I get from other visualization software do not meet my expectations.

adraismawur commented 1 year ago

Hello,

Could you try using scikit-learn version 0.19.2?

Regarding the visualizations, what images are you referring to exactly? BiG-SCAPE generates image files under [output_folder]/SVG for each BGC.

yuezhiTang commented 1 year ago

Thank you for your suggestion. I'll go back and try it out. The visualization I am referring to is of the gene cluster families produced by the BiG-SCAPE run, which would give a clearer view of which BGCs group together because of their similarity. From the literature I have seen on BiG-SCAPE, it seems that this visualization of the gene cluster clustering can be obtained directly from BiG-SCAPE?

CatarinaCarolina commented 1 year ago

Hi!

An SVG of each GCF tree can be downloaded from its GCF page in the HTML output, where it is generated on the fly. (The SVG for each individual BGC can be found under [output_folder]/SVG.) The BiG-SCAPE CORASON publication also features BGC trees generated by CORASON; perhaps that is what you are looking for?

Alternatively, if you are referring to the network clustering, BiG-SCAPE doesn't produce SVGs for this. Instead, load the network file ([output_folder]/network_files/run_id/class_id/file.network) into a tool such as Cytoscape to visualise, edit and export the network in an image file format of your choice.
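For a quick look at which BGCs are linked before opening Cytoscape, the network file can also be read with a short script. The sketch below assumes the file is tab-separated, starts with a header row, and names the two BGCs of each pair in its first two columns; please check those assumptions against your own file. The sample rows and the `connected_components` helper are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def connected_components(network_text):
    """Group BGC names into connected components from tab-separated edge rows."""
    reader = csv.reader(io.StringIO(network_text), delimiter="\t")
    next(reader, None)  # skip the assumed header row
    neighbours = defaultdict(set)
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        a, b = row[0], row[1]
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, components = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(neighbours[node] - members)
        seen |= members
        components.append(sorted(members))
    return components


# Made-up sample rows; real column names and values may differ.
sample = (
    "Clustername 1\tClustername 2\tRaw distance\n"
    "BGC_A\tBGC_B\t0.12\n"
    "BGC_B\tBGC_C\t0.25\n"
    "BGC_D\tBGC_E\t0.08\n"
)
components = connected_components(sample)
print(components)
```

To use it on a real run, read the file's contents with `open(path).read()` and pass the text in. Note that connected components are only a coarse grouping and are not the same thing as the GCF assignments BiG-SCAPE reports.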

yuezhiTang commented 1 year ago

Thank you for your help. I should be able to solve the problem now.

adraismawur commented 1 year ago

Closing this issue, but don't hesitate to re-open if you continue to have issues with the affinity propagation!