gtonkinhill / panaroo

An updated pipeline for pangenome investigation
MIT License

SNPs annotation from Panaroo core genes alignment #171

Closed by nicola-palmieri 2 years ago

nicola-palmieri commented 2 years ago

Dear author

I have produced a VCF file from the Panaroo output core_gene_alignment.aln and selected a few SNPs of interest using a GWAS approach. My question is: how can I find which genes those SNPs are located in, and possibly visualize them?

Thank you! Nicola

nzmacalasdair commented 2 years ago

Hi Nicola,

The coordinates of all the genes in the core genome alignment (core_gene_alignment.aln) are contained in the header file (core_alignment_header.embl). This file lists the Panaroo gene name of every gene in the alignment, ordered by position in the concatenated core genome. You should be able to work out which gene each SNP falls in by comparing its position with the start and stop positions of each gene.
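A minimal sketch of that lookup in Python, for illustration only. It assumes the header file uses EMBL feature-table lines of the form `FT   feature   start..end`, each followed by a `/label=` or `/locus_tag=` qualifier line; that layout has not been checked against a real Panaroo file, so adjust the patterns to match your own output. It also assumes the VCF `POS` values are 1-based columns of the concatenated alignment. The function names `parse_header` and `locate_snps` are hypothetical and are not part of Panaroo.

```python
import bisect
import re

# Assumed line shapes (not confirmed against a real Panaroo file):
#   FT   feature         1..300
#   FT                   /label=geneA
LOCATION = re.compile(r"^FT\s+\S+\s+(\d+)\.\.(\d+)\s*$")
LABEL = re.compile(r"^FT\s+/(?:label|locus_tag)=\"?([^\"\s]+)\"?\s*$")


def parse_header(lines):
    """Return a sorted list of (start, end, gene_name), 1-based inclusive."""
    genes = []
    pending = None  # location of the feature whose name we are waiting for
    for line in lines:
        loc = LOCATION.match(line)
        if loc:
            pending = (int(loc.group(1)), int(loc.group(2)))
            continue
        lab = LABEL.match(line)
        if lab and pending is not None:
            genes.append((pending[0], pending[1], lab.group(1)))
            pending = None  # ignore further qualifiers of the same feature
    return sorted(genes)


def locate_snps(genes, positions):
    """Map each 1-based alignment position to a gene name, or None."""
    starts = [start for start, _, _ in genes]
    result = {}
    for pos in positions:
        # Index of the last gene whose start is <= pos.
        i = bisect.bisect_right(starts, pos) - 1
        if i >= 0 and genes[i][0] <= pos <= genes[i][1]:
            result[pos] = genes[i][2]
        else:
            result[pos] = None
    return result
```

To use it, pass an open file handle for core_alignment_header.embl to `parse_header`, then call `locate_snps` with the `POS` values of the SNPs of interest. Positions that fall outside every listed interval are reported as `None`.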

The Panaroo gene names are sometimes not very descriptive, but you can find all the original annotations in the final graph file (final_graph.gml).

I'm not sure what kind of visualisations you might have in mind, but you should be able to make a Manhattan plot of the core genome with most standard GWAS visualisation tools.

Hope this helps. Let us know if you run into any further issues!

nicola-palmieri commented 2 years ago

Perfect! That's exactly what I needed.

Thank you! Nicola