bloated phi_pool output

jaresoles commented 2 years ago

Hi,

I am really new to this and not really familiar with what's possible in bacterial genomes. phi_pool describes the recombinational divergence of pools of related sequences. Confusingly, I get a bloated phi_pool of 3075362166333290000 (3.08E18). Is this even possible? These are closely related bacteria collected from the same animal host and organ. Here is the fit that I got. I am assuming there are sequences that I need to get rid of before running mcorr?

apsteinberg commented 2 years ago

Hi there,

Thanks again for your interest in using mcorr! To answer your question, a phi_pool value of this magnitude is not physically realistic. Similarly to the other issue #16 where we've been corresponding, these data are also not well fit by the model with recombination. It looks like the correlation profile is too noisy to infer recombination parameters with confidence from this dataset.

Yes, I think trying to remove sequences could help. If you're working with XMFA files, you could also consider a few other options:

1) If you used reference-based alignment to generate the XMFA file, it's possible that some of your sequences are poorly aligned, and you could remove them using our tool "FilterGaps" (link here).

2) If you have genome assemblies, you could try making a reference-free alignment with our tool AssemblyAlignmentGenerator (link here), which might improve the quality of the alignments and include more genes, which in turn may improve the correlation profiles.

3) If you think particular sequence pairs are problematic, you could use our tool "mcorr-pair" (link here) to measure correlation profiles across individual sequence pairs, then fit these correlation profiles using "mcorr-fit" from the main mcorr repository.
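
As a rough illustration of the idea behind option 1, dropping poorly aligned sequences by their gap content could be sketched in Python as below. This is not how FilterGaps itself is implemented: the function names `gap_fraction` and `filter_gappy_sequences`, the `max_gap_fraction` threshold, and the in-memory dictionary of aligned sequences are all assumptions made for illustration.

```python
from typing import Dict


def gap_fraction(seq: str) -> float:
    """Return the fraction of positions in an aligned sequence that are gaps ('-')."""
    if not seq:
        return 1.0  # assumption: treat an empty sequence as entirely gapped
    return seq.count("-") / len(seq)


def filter_gappy_sequences(
    alignment: Dict[str, str], max_gap_fraction: float = 0.02
) -> Dict[str, str]:
    """Keep only sequences whose gap fraction is at or below the threshold.

    `alignment` maps a (hypothetical) sequence name to its aligned sequence.
    The default threshold is an arbitrary illustrative value, not a recommendation.
    """
    return {
        name: seq
        for name, seq in alignment.items()
        if gap_fraction(seq) <= max_gap_fraction
    }


if __name__ == "__main__":
    # Toy alignment block with made-up strain names.
    block = {
        "strainA": "ATGCATGCAT",
        "strainB": "ATGC--GCAT",
        "strainC": "ATG-------",
    }
    kept = filter_gappy_sequences(block, max_gap_fraction=0.25)
    print(sorted(kept))
```

In this toy example only the mostly gapped sequence is dropped. For real XMFA files you would still want to use the actual tool, since the sketch ignores file parsing and per-gene handling.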

Thanks and please let me know if this helps.

Best, Asher

jaresoles commented 2 years ago

Hi, Asher.

I am writing to let you know that your eLife paper has greatly helped me navigate the tool, as well as gain deep insights into the genome/HGT dynamics of my samples. I got much better fits and model parameter values by analyzing the core and accessory genomes separately. I haven't figured out why the WG alignments somehow resulted in messed-up fits (even after doing sample filtering with FilterGaps), but the analysis done so far actually tells a much better story. Will now close this. Best of luck in all your endeavors!

Many thanks, Aidan

apsteinberg commented 2 years ago

Hi Aidan,

This is very kind of you, and I'm so glad to hear it. Thanks, and best of luck to you as well!

Best, Asher

kussell-lab / mcorr

bloated phi_pool output #17