pangenome / pggb

the pangenome graph builder
https://doi.org/10.1038/s41592-024-02430-3
MIT License

PGGB use case with hexaploidy genomes #238

Open brettChapman opened 2 years ago

brettChapman commented 2 years ago

Hi

I'll be starting a new pangenome project shortly and was wondering if anyone had advice on how to go about generating a pangenome graph from a hexaploid genome.

Previously, with a diploid genome, I ran PGGB on each chromosome separately, so no interchromosomal translocations would be identified. With the hexaploid genome I'll ideally be working with 3 genomes per chromosome. I'm concerned that if I include, say, chr1A, chr1B and chr1C for each sample (there are 27 hexaploid genomes), I'll be pushing the resources even harder. The alternative would be to work only with, say, all samples from chr1A...chr14A per genome graph, and miss out on identifying translocations between the genomes (A, B and C).

I'm aware that vg deconstruct lets you specify the ploidy, but I assume this applies when all of the genomes are included in a single genome graph. If I simply generated a separate graph for each of the A, B and C genomes, would the ploidy level for vg deconstruct be 2 instead of 6?
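To make the two layouts concrete, here is a minimal arithmetic sketch. The function name `graph_layout` and the default of 2 copies per subgenome are assumptions for illustration only (taken from the 6 vs 2 figures in the question); if the assemblies are collapsed to one sequence per subgenome, pass `copies_per_subgenome=1` instead.

```python
def graph_layout(samples, subgenomes_in_graph, copies_per_subgenome=2):
    """Return (haplotypes per sample, total haplotypes in one graph).

    copies_per_subgenome=2 is an assumption (each subgenome behaves as
    a diploid); it is not something the tools report.
    """
    per_sample = subgenomes_in_graph * copies_per_subgenome
    return per_sample, samples * per_sample


# One graph per chromosome group holding all three subgenomes (A, B, C).
combined = graph_layout(samples=27, subgenomes_in_graph=3)

# One separate graph per subgenome (e.g. only the A chromosomes).
split = graph_layout(samples=27, subgenomes_in_graph=1)

print(combined)  # (6, 162)
print(split)     # (2, 54)
```

Under these assumptions the per-sample count is 6 in the combined layout and 2 in the split layout, which is the difference the ploidy question is about; the total per graph is what drives the resource concern.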

I'll also be looking at using cactus-minigraph and have similar concerns about how to approach graph construction without exhausting my resources. I have my own cluster with 128GB RAM per node, and I have a shared server elsewhere which has 1.4TB RAM.

Thanks.

AndreaGuarracino commented 1 year ago

I don't have anything very clever to say. For each chromosome, I would put all haplotypes for all samples together in the same graph. How divergent are your genomes? If the sequence divergence and the number of contigs are not very high, PGGB should run smoothly without crazy resource requirements. You would then work with the expected ploidy (6).
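As a rough sketch of that grouping step, the snippet below sorts sequence names into one bucket per chromosome number, so that chr1A, chr1B and chr1C from every sample end up in the same input set. It assumes PanSN-style names (`sample#haplotype#contig`) and contigs named like `chr1A`; the sample names, the regular expression and the helper `group_by_chromosome` are all hypothetical and would need adapting to the real assemblies.

```python
import re
from collections import defaultdict

# Assumed contig naming: "chr" + chromosome number + one subgenome letter.
CONTIG_RE = re.compile(r"chr(\d+)([A-Za-z])$")


def group_by_chromosome(names):
    """Group PanSN-style sequence names by chromosome number."""
    groups = defaultdict(list)
    for name in names:
        contig = name.split("#")[-1]  # last PanSN field is the contig
        m = CONTIG_RE.match(contig)
        if m is None:
            raise ValueError(f"unexpected contig name: {contig!r}")
        groups[int(m.group(1))].append(name)
    return dict(groups)


# Hypothetical sequence names for two samples.
names = [
    "s1#1#chr1A", "s1#1#chr1B", "s1#1#chr1C",
    "s2#1#chr1A", "s1#1#chr2A",
]
groups = group_by_chromosome(names)
print(groups[1])  # ['s1#1#chr1A', 's1#1#chr1B', 's1#1#chr1C', 's2#1#chr1A']
print(groups[2])  # ['s1#1#chr2A']
```

Each bucket could then be written to its own FASTA file and used as the input for one graph build per chromosome.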

In the meantime, have you already tried this route?