gtonkinhill / panaroo

An updated pipeline for pangenome investigation
MIT License

GWAS recombinogenic organism #133

Closed (smb20200615 closed 3 years ago)

smb20200615 commented 3 years ago

Hello,

I am interested in running a GWAS for a recombinogenic species. Does it make sense to remove recombination from the core tree before running downstream steps? Also, is running Panaroo with default parameters, as described in the manual, sufficient for any GWAS, or does it need to be tweaked depending on the exact problem? If so, which parameters should I pay the most attention to?

panaroo -i *gff -o output -t 10 --verbose -a core

Thank you!

gtonkinhill commented 3 years ago

Hi,

In general, you should not run recombination detection algorithms on the core alignments generated by Panaroo, Roary, etc., because doing so breaks some of the assumptions of algorithms like Gubbins and ClonalFrame.

If you just need the phylogeny to control for population structure in a GWAS, it should be sufficient to generate it straight from the core alignment. Most of the parameters in Panaroo focus on how it handles the accessory genome, so you should be okay running it in the default mode.
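For anyone reading later, here is a minimal sketch of building that tree straight from the core alignment. It assumes the Panaroo run above wrote to `output` with `-a core`, so the alignment is at `output/core_gene_alignment.aln`. The choice of IQ-TREE 2 (installed as `iqtree2`), the `GTR+G` model and the `core_tree` prefix are illustrative assumptions, not something recommended in this thread. The script only prints the command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: infer a tree from Panaroo's core gene alignment, for use as a
# population-structure control in a GWAS. No recombination removal is applied.
set -eu

DRY_RUN="${DRY_RUN:-1}"    # 1 = only print commands (default); 0 = execute them
OUT_DIR="output"           # same directory passed to panaroo with -o
CORE_ALN="${OUT_DIR}/core_gene_alignment.aln"   # written by Panaroo when -a core is used
THREADS=10

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Assumptions: IQ-TREE 2 is on PATH as `iqtree2`; GTR+G is only an example model.
run iqtree2 -s "$CORE_ALN" -m GTR+G -T "$THREADS" --prefix core_tree
```

With `DRY_RUN=0`, IQ-TREE should write the tree to `core_tree.treefile`, which can then be passed to the GWAS tool.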

In case it helps, I would normally run a GWAS on both the gene presence/absence matrix from Panaroo and a unitig presence/absence matrix generated using Pyseer.
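A rough sketch of what those two GWAS runs could look like with pyseer's linear mixed model follows. The file names `phenotypes.tsv`, `core_tree.treefile` and `unitigs.txt.gz` are hypothetical placeholders, and the unitig file is assumed to have been produced already in a separate step. The path `scripts/phylogeny_distance.py` assumes you are working from a checkout of the pyseer repository. Commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: pyseer LMM on Panaroo gene presence/absence and on unitigs.
set -eu

DRY_RUN="${DRY_RUN:-1}"                       # 1 = only print commands (default)
PHENO="phenotypes.tsv"                        # hypothetical: sample name and phenotype columns
PRES="output/gene_presence_absence.Rtab"      # gene presence/absence matrix from Panaroo
TREE="core_tree.treefile"                     # hypothetical: Newick tree from the core alignment
UNITIGS="unitigs.txt.gz"                      # hypothetical: unitigs called in a separate step

run_to() {
    # Usage: run_to OUTFILE CMD [ARGS...]
    # Prints the command; when DRY_RUN=0, runs it with stdout redirected to OUTFILE.
    _outfile="$1"; shift
    echo "+ $* > $_outfile"
    if [ "$DRY_RUN" = "0" ]; then "$@" > "$_outfile"; fi
}

# 1. Turn the tree into a similarity (kinship) matrix for the mixed model.
run_to phylogeny_K.tsv python scripts/phylogeny_distance.py --lmm "$TREE"

# 2. GWAS on gene presence/absence.
run_to gene_gwas.txt pyseer --lmm --phenotypes "$PHENO" \
    --pres "$PRES" --similarity phylogeny_K.tsv --cpu 8

# 3. GWAS on unitigs, reusing the same similarity matrix.
run_to unitig_gwas.txt pyseer --lmm --phenotypes "$PHENO" \
    --kmers "$UNITIGS" --similarity phylogeny_K.tsv --cpu 8
```

Reusing one similarity matrix for both runs keeps the population-structure correction identical, so the gene and unitig results are easier to compare.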

smb20200615 commented 3 years ago

Thank you so much for your thorough explanation. Does this also apply to building phylogenetic trees (not for use with GWAS)? I was following this pipeline https://github.com/bactopia/bactopia/blob/master/tools/roary/main.nf where recombination is removed from the core alignment and used to generate a tree. Many thanks in advance.

gtonkinhill commented 3 years ago

For building phylogenies, I would normally recommend aligning reads directly to a suitable reference genome. I usually use Snippy for this, as it can generate the multiple sequence alignment file required for running Gubbins/ClonalFrame and IQ-TREE/RAxML.
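As a hedged illustration of that route, the sketch below follows the general pattern in the Snippy README: per-sample `snippy`, then `snippy-core`, then Gubbins on the cleaned full alignment, then a tree. The sample names, the `reads/` layout, `reference.gbk`, and the use of `iqtree2` with `GTR+G` are all assumptions for the example. Commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: reference-based alignment with Snippy, recombination removal with
# Gubbins, then tree inference. All input names are hypothetical.
set -eu

DRY_RUN="${DRY_RUN:-1}"          # 1 = only print commands (default); 0 = execute them
REF="reference.gbk"              # hypothetical reference genome
SAMPLES="sampleA sampleB"        # hypothetical sample names
SNIPPY_DIRS=""

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

run_to() {
    # Like run, but redirects the command's stdout to the file given first.
    _outfile="$1"; shift
    echo "+ $* > $_outfile"
    if [ "$DRY_RUN" = "0" ]; then "$@" > "$_outfile"; fi
}

# 1. Call variants for each sample against the reference.
for s in $SAMPLES; do
    run snippy --cpus 8 --outdir "snippy/$s" --ref "$REF" \
        --R1 "reads/${s}_R1.fastq.gz" --R2 "reads/${s}_R2.fastq.gz"
    SNIPPY_DIRS="$SNIPPY_DIRS snippy/$s"
done

# 2. Combine samples into a whole-genome alignment (core.full.aln).
#    $SNIPPY_DIRS is left unquoted on purpose so each directory is one argument.
run snippy-core --ref "$REF" --prefix core $SNIPPY_DIRS

# 3. Clean the alignment, then detect and mask recombination with Gubbins.
run_to clean.full.aln snippy-clean_full_aln core.full.aln
run run_gubbins.py -p gubbins clean.full.aln

# 4. Extract the remaining polymorphic sites and build the tree.
run_to clean.core.aln snp-sites -c gubbins.filtered_polymorphic_sites.fasta
run iqtree2 -s clean.core.aln -m GTR+G --prefix snippy_tree
```

Unlike a concatenated core gene alignment, the input to Gubbins here is in reference genome coordinates.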