Closed khodor14 closed 1 year ago
Hello Khodor,
Yes. I will use Fig 2 of the pandora paper as a reference for this comment:
First, you need to split your genome into several loci (Fig 2A), do a multiple sequence alignment and build the PRG (population reference graph) for each locus (Fig 2B). Once you have built the PRGs for all loci, you will have a Pangenome Reference Graph (PanRG, Fig 2C). Then you can use `pandora index` to index this PanRG, and `pandora compare` to call variants from sample reads against the indexed PanRG (Fig 2D).
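For the `pandora compare` step, the sample reads are, as far as I know, given as a tab-delimited file with one line per sample: a sample identifier followed by the path to that sample's reads. Here is a minimal Python sketch that formats such a file. The sample names and paths are hypothetical, and the expected format should be checked against `pandora compare --help` for the version you have installed.

```python
def format_read_index(samples: dict[str, str]) -> str:
    """Return one 'sample_id<TAB>reads_path' line per sample, sorted by sample id.

    The two-column layout is an assumption about what `pandora compare` expects;
    verify it against the help text of your pandora version.
    """
    lines = []
    for sample_id, reads_path in sorted(samples.items()):
        # A tab inside either field would corrupt the two-column layout.
        if "\t" in sample_id or "\t" in reads_path:
            raise ValueError(f"tab character in entry for sample {sample_id!r}")
        lines.append(f"{sample_id}\t{reads_path}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical samples; replace with your own identifiers and read files.
    samples = {
        "ecoli_sample_1": "reads/ecoli_sample_1.fastq.gz",
        "ecoli_sample_2": "reads/ecoli_sample_2.fastq.gz",
    }
    print(format_read_index(samples), end="")
```

Redirecting the printed text to a file (for example `read_index.tsv`, a name chosen here for illustration) gives something you can hand to the compare step.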
To split your genome into several loci (the first step), I would recommend panaroo to find genes and piggy to find intergenic regions. In a branch of a yet unmerged fork, you can find a containerized snakemake pipeline that will do all this for you: https://github.com/leoisl/make_prg/tree/assemblies_to_PanRG/scripts/assemblies_to_PanRG . You just need to specify a directory with your assemblies, and it will run everything you need and create MSAs split by genes and intergenic regions in the output directory (see `config.yaml` and the `Snakefile` in that dir). Then you need to run `make_prg` on these loci, and then `pandora`. We will soon have this script merged into this repo.
cheers
You also need to provide the `--genotype` option if you want a genotyped VCF (see here)
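To make the order of the steps concrete, here is a rough Python sketch that only assembles the three command lines (it does not run anything). The tool names, the `index` and `compare` sub-commands, and `--genotype` come from this thread. Everything else is an assumption on my part: the `from_msa` sub-command, the `-i`/`-o` flags, the argument order, and all file names are illustrative and have changed between releases, so check each tool's `--help` before using them.

```python
import shlex


def plan_commands(msa_dir, prg_prefix, panrg, read_index, outdir, genotype=True):
    """Build (but do not execute) the make_prg -> index -> compare command lines.

    Flags and positional layout are assumptions; only the tool and sub-command
    names `pandora index`, `pandora compare` and `--genotype` are from the thread.
    """
    # Step 1: build one PRG per locus MSA (assumed sub-command and flags).
    make_prg = ["make_prg", "from_msa", "-i", msa_dir, "-o", prg_prefix]

    # Step 2: index the resulting PanRG (assumed to be a single positional path).
    index = ["pandora", "index", panrg]

    # Step 3: call variants for all samples against the indexed PanRG.
    compare = ["pandora", "compare"]
    if genotype:
        # Needed if you want a genotyped VCF.
        compare.append("--genotype")
    compare += ["-o", outdir, panrg, read_index]

    return [make_prg, index, compare]


if __name__ == "__main__":
    # All paths below are hypothetical placeholders.
    commands = plan_commands(
        msa_dir="msas",
        prg_prefix="panrg/ecoli",
        panrg="panrg/ecoli.prg.fa",
        read_index="read_index.tsv",
        outdir="pandora_compare_out",
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

Printing the commands first, rather than running them directly, makes it easy to compare each one with the help text of the installed version.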
That applies when I want to map reads, or compare reads, against an already indexed PRG. What I want is different: I want to build a VCF for E. coli against a reference genome. I might also build different VCFs against different references.
I think this cannot be done with pandora. Anyway, thank you all for the detailed description.
Closing due to inactivity, feel free to reopen.
If I'm building a graph of n E. coli genomes, is there an option in pandora to get the VCF of these genomes?