gtonkinhill / fastbaps

A fast approximation to a Dirichlet Process Mixture model (DPM) for clustering genetic data
MIT License

Level 1 and 2 cluster #25

Open tthye1 opened 8 months ago

tthye1 commented 8 months ago

Hi Gerry,

I am working on a project with about 200 Campylobacter isolates. I have created SNP-based and cgMLST-based distance matrices and phylogenies, and clustered the isolates with fastbaps and hierCC. My question is how the fastbaps level 1 and level 2 clusters compare to the levels of the multi-level clustering scheme of the hierCC software used by Enterobase. Are they comparable, and which software works best for which data set (SNPs or cgMLST alleles)?

Best,

Thorsten

gtonkinhill commented 8 months ago

Hi Thorsten,

I'm afraid I don't have much experience with hierCC, so I'm not sure exactly how it compares. I mostly use SNP data when running fastbaps, although it should work with cgMLST data as well.

In terms of running fastbaps, it's often worth considering the choice of prior used in the algorithm. This can range from conservative (fewer clusters) using the 'symmetric' prior to more sensitive (more clusters) using the 'optimised.baps' prior, which is tailored to each dataset. If you have a phylogeny you are confident in, you could also consider conditioning on the tree, which will ensure that all clusters are consistent with the phylogeny.

tthye1 commented 8 months ago

Hi Gerry,

Thanks for your input. I will run both programs on my datasets and compare the results.

Best,

Thorsten
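
One way to make such a comparison concrete is to score the agreement between a fastbaps partition and a hierCC partition of the same isolates. The sketch below is not part of fastbaps or hierCC; it computes the adjusted Rand index in plain Python, which is one common choice for this purpose. The function name and the example labels are hypothetical and stand in for cluster assignments exported from each tool, listed in the same isolate order.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same isolates.

    1.0 means the partitions are identical up to relabelling; values
    near 0 mean agreement is no better than expected by chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both label lists must cover the same isolates")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Count isolate pairs that share a cluster in both, in A, and in B.
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = in_a * in_b / comb(n, 2)  # agreement expected by chance
    maximum = (in_a + in_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both put everything in one cluster
    return (both - expected) / (maximum - expected)


# Hypothetical cluster labels for eight isolates (assumed, not real data).
fastbaps_level1 = [1, 1, 1, 2, 2, 3, 3, 3]
hiercc_level = ["a", "a", "a", "b", "b", "c", "c", "c"]
ari = adjusted_rand_index(fastbaps_level1, hiercc_level)
print(f"ARI = {ari:.2f}")
```

Running this for each pairing of a fastbaps level with a hierCC level would indicate which levels, if any, correspond most closely for a given dataset.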
