Hi Conor,
Could you check the value of nu inferred before and after removing sequences? This can be found in the file with suffix .em.txt. This value represents the level of polymorphism in recombined regions. It is possible that ClonalFrameML infers a value that is too low after you removed the most divergent sequences. You can stop this from happening by changing the prior on nu. For example, to force nu to be around 0.05 you could use -prior_mean "0.1 0.001 0.05 0.0001" -prior_sd "0.1 0.001 0.0001 0.0001".
Note that ClonalFrameML is mostly designed to analyse sequences that are part of the same species or even the same lineage within a species, so if your dataset contains multiple species, you might need to help ClonalFrameML a bit by specifying a strong prior on nu as described above.
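A minimal Python sketch of how the prior options above could be assembled when scripting ClonalFrameML runs. It assumes that -prior_mean and -prior_sd each take four space-separated values in the order R/theta, 1/delta, nu, M (so nu is the third entry, which matches the example above), and it reuses the non-nu values from that example. prior_args is a hypothetical helper, not part of ClonalFrameML.

```python
# Hypothetical helper for building ClonalFrameML prior options.
# ASSUMPTION: both options take four values in the order R/theta, 1/delta, nu, M.
PARAM_ORDER = ("R/theta", "1/delta", "nu", "M")

# Non-nu values copied from the example command in this thread.
BASE_MEAN = {"R/theta": 0.1, "1/delta": 0.001, "nu": 0.1, "M": 0.0001}
BASE_SD = {"R/theta": 0.1, "1/delta": 0.001, "nu": 0.1, "M": 0.0001}


def _join(values: dict[str, float]) -> str:
    """Format the four values as one space-separated string in PARAM_ORDER."""
    return " ".join(f"{values[p]:g}" for p in PARAM_ORDER)


def prior_args(nu_mean: float, nu_sd: float) -> list[str]:
    """Return prior options that change only the prior on nu."""
    means = dict(BASE_MEAN, nu=nu_mean)
    sds = dict(BASE_SD, nu=nu_sd)
    return ["-prior_mean", _join(means), "-prior_sd", _join(sds)]


if __name__ == "__main__":
    print(prior_args(nu_mean=0.05, nu_sd=0.0001))
    # ['-prior_mean', '0.1 0.001 0.05 0.0001', '-prior_sd', '0.1 0.001 0.0001 0.0001']
```

A small standard deviation (here 0.0001) is what makes the prior "strong": it keeps the nu estimate close to the chosen mean regardless of which sequences are removed.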
Best wishes, Xavier
Hi Xavier,
Thanks for the quick response. The nu before removal is:
Parameter  Posterior Mean  Posterior Variance  a_post       b_post
nu         0.0324815       1.00773e-09         1.04696e+06  3.22323e+07
After removal it is:
Parameter  Posterior Mean  Posterior Variance  a_post       b_post
nu         0.0452501       2.75491e-09         743243       1.64252e+07
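A quick way to sanity-check values like these is to recompute the posterior mean and variance from the a_post and b_post columns. The sketch below assumes each .em.txt row is the parameter name followed by posterior mean, posterior variance, a_post and b_post, and that the posterior is a gamma distribution with shape a_post and rate b_post (mean a/b, variance a/b²). That reading is an assumption, although the numbers quoted above agree with it.

```python
# ASSUMPTION: a row looks like "<name> <mean> <variance> <a_post> <b_post>"
# and the posterior is Gamma(shape=a_post, rate=b_post).


def parse_row(line: str) -> tuple[str, dict[str, float]]:
    """Split one parameter row into its name and its four numeric columns."""
    fields = line.split()
    keys = ("mean", "variance", "a_post", "b_post")
    return fields[0], dict(zip(keys, (float(x) for x in fields[-4:])))


def gamma_moments(a: float, b: float) -> tuple[float, float]:
    """Mean and variance of a gamma distribution with shape a and rate b."""
    return a / b, a / b**2


# The two nu rows quoted in this thread.
rows = {
    "before": "nu 0.0324815 1.00773e-09 1.04696e+06 3.22323e+07",
    "after": "nu 0.0452501 2.75491e-09 743243 1.64252e+07",
}

if __name__ == "__main__":
    for label, line in rows.items():
        name, col = parse_row(line)
        mean, var = gamma_moments(col["a_post"], col["b_post"])
        print(f"{label}: {name} mean={mean:.4f} (reported {col['mean']:.4f}), sd={var ** 0.5:.2e}")
```

Both rows reproduce the reported mean and variance to within rounding, and the posterior standard deviations (roughly 3e-5 and 5e-5) are small relative to the estimates themselves.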
So I don't think it is moving much, but perhaps I should set the prior as you suggest and see whether that affects the outcome? These are all the same species, but they may indeed be separate lineages.
Cheers, Conor
Yes, these values of nu look fine both before and after removal, so there is no issue here and no need to try changing the prior on nu. It's good to know that all genomes are from the same species. I guess having ~60% of the genome recombined on some branches is not impossible, or there could be mistakes in the alignment that look like recombination events. When you remove sequences to get rid of recombination events, you need to make sure that you remove all sequences affected by those events, i.e. all the sequences below the branch on which the recombination occurred. Don't hesitate to email me if you're still having problems with this, as I would need to see what the results look like.
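The "remove everything below the branch" rule can be sketched as a subtree walk. This is an illustration under stated assumptions: the tree is given as a plain parent-to-children mapping, and the node and sequence names (root, n1, n2, A to D) are made up; reading ClonalFrameML's actual tree output is not shown. tips_below and sequences_to_remove are hypothetical helpers.

```python
# Hypothetical sketch: collect every tip below the branch(es) carrying a
# recombination event, so that all affected sequences are removed together.
# The tree is a parent -> children mapping; all names are made up.


def tips_below(tree: dict[str, list[str]], node: str) -> list[str]:
    """Return the leaves of the subtree rooted at `node`, left to right."""
    tips, stack = [], [node]
    while stack:
        current = stack.pop()
        children = tree.get(current, [])
        if children:
            # Push in reverse so the leftmost child is visited first.
            stack.extend(reversed(children))
        else:
            tips.append(current)
    return tips


def sequences_to_remove(tree: dict[str, list[str]], flagged: list[str]) -> list[str]:
    """Union of the tips under every flagged branch, sorted by name."""
    return sorted({tip for node in flagged for tip in tips_below(tree, node)})


tree = {
    "root": ["A", "n1"],
    "n1": ["n2", "D"],
    "n2": ["B", "C"],
}

if __name__ == "__main__":
    print(tips_below(tree, "n2"))                  # ['B', 'C']
    print(sequences_to_remove(tree, ["n2", "D"]))  # ['B', 'C', 'D']
```

If an event sits on internal branch n2, removing only B would leave C in the alignment still carrying the same imported segment; the walk above makes that dependency explicit.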
Hi,
I have a dataset of some Mycobacteria and am trying to remove recombination from a core genome alignment. When I run CFML (ClonalFrameML), I get some sequences that have recombination signals throughout (>60% of the genome detected as recombinant, shown in dark blue in the PDF produced by the R script). If I remove these sequences and run everything again, a different set of sequences then shows up as having high recombination.
Is there any reason for this, and any way to stop it? I don't want to remove sequences that should be kept, but I need to ensure that a recombination-free alignment is created. Any help appreciated!
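To see which branches cross the >60% mark, one option is to sum the length of the recombined intervals per branch, merging overlaps so no site is counted twice. A minimal sketch, assuming the intervals have already been read into (branch, start, end) tuples with 1-based inclusive coordinates and that the alignment length is known; parsing ClonalFrameML's own output files is left out, and the branch names in the example are made up.

```python
from collections import defaultdict


def recombined_fraction(events, alignment_length):
    """Fraction of alignment sites covered by at least one event, per branch.

    events: iterable of (branch, start, end), 1-based inclusive coordinates.
    """
    by_branch = defaultdict(list)
    for branch, start, end in events:
        by_branch[branch].append((start, end))

    fractions = {}
    for branch, intervals in by_branch.items():
        intervals.sort()
        covered, cur_start, cur_end = 0, None, None
        for start, end in intervals:
            if cur_end is None or start > cur_end + 1:
                # Gap before this interval: close the current merged run.
                if cur_end is not None:
                    covered += cur_end - cur_start + 1
                cur_start, cur_end = start, end
            else:
                # Overlapping or adjacent: extend the current run.
                cur_end = max(cur_end, end)
        covered += cur_end - cur_start + 1
        fractions[branch] = covered / alignment_length
    return fractions


if __name__ == "__main__":
    events = [("seqA", 1, 400), ("seqA", 300, 700), ("seqB", 100, 199), ("n3", 1, 50), ("n3", 51, 100)]
    fractions = recombined_fraction(events, alignment_length=1000)
    print(sorted(b for b, f in fractions.items() if f > 0.6))  # ['seqA']
```

Merging first matters because overlapping events on one branch would otherwise push the fraction above its true value.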
Cheers, Conor