vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

Construct a generation-level pan-genome #4295

Open jwli-code opened 1 month ago

jwli-code commented 1 month ago

In the future, I want to map resequencing data from different species onto my pan-genome. The resulting well-classified population VCF would then be used to run vg mpmap. I wonder if this line of thinking makes sense.

Thanks.

jeizenga commented 1 month ago

If the subgenomes contain chromosomes that are homologous to each other, it would probably make more sense to build the graphs for those chromosomes all together. Otherwise, you can create a lot of mapping ambiguity among sequences that are shared between the homologs.

jwli-code commented 1 month ago

I would like to ask about merging the VCF files produced by minigraph and then filtering for SNP variants (excluding variants larger than 50 bp). After that, I plan to genotype the population data with PanGenie and use the population VCF for the graph transcriptome workflow. I'm not sure if this is the correct approach. How effective is PanGenie at handling small variants such as SNPs? Are there any recommended short-read genotypers for SNPs?

jeizenga commented 1 month ago

@glennhickey might know better, but I believe PanGenie does not genotype any small variants. In humans, our collaborators have achieved very good SNP accuracy by projecting graph alignments to a reference with vg surject and then using DeepVariant. I'm unsure if the DeepVariant pipelines can handle non-diploid genomes though. We also have experimental features to call SNPs on non-reference sequences.
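A rough sketch of the surject-then-DeepVariant route mentioned above, written as a small Python helper that only assembles the command lines and does not run them. The file names, the thread count, and the `WGS` model type are placeholders, not values from this thread. It assumes a GAM from `vg map` or `vg giraffe` (not GAMP output from `vg mpmap`) and a graph file that `vg surject -x` accepts, and it assumes DeepVariant's `run_deepvariant` wrapper is on the path. Check the flags against your installed versions before using them.

```python
import shlex


def surject_commands(graph, gam, ref_fasta, prefix, threads=8):
    """Build shell command lines for: surject -> sort -> index -> DeepVariant.

    All paths are hypothetical placeholders supplied by the caller.
    Nothing is executed here; the function only returns strings.
    """
    q = shlex.quote
    bam = f"{prefix}.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    vcf = f"{prefix}.vcf.gz"
    return [
        # Project graph alignments onto the linear reference paths as BAM.
        f"vg surject -x {q(graph)} -b -t {threads} {q(gam)} > {q(bam)}",
        # DeepVariant expects a coordinate-sorted, indexed BAM.
        f"samtools sort -@ {threads} -o {q(sorted_bam)} {q(bam)}",
        f"samtools index {q(sorted_bam)}",
        # Model type WGS is an assumption; pick the one matching your data.
        f"run_deepvariant --model_type=WGS --ref={q(ref_fasta)} "
        f"--reads={q(sorted_bam)} --output_vcf={q(vcf)} --num_shards={threads}",
    ]
```

The reference FASTA passed to DeepVariant should carry the same contig names as the graph paths that the reads were surjected onto.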

glennhickey commented 1 month ago

PanGenie is pretty good for SNPs, but DeepVariant does much better on GIAB benchmarks. I don't think either works on non-diploid genomes, though.

For PanGenie, I think you're better off filtering SVs after genotyping instead of before.

Please avoid using #4113 for the time being.
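A minimal sketch of filtering out large variants after genotyping rather than before, assuming a plain-text VCF read line by line. The 50 bp cutoff comes from the question above. Treating "size" as the length of the longest allele, and dropping symbolic or breakend alleles, are assumptions made for this example, and the function name is hypothetical.

```python
def filter_small_variants(lines, max_len=50):
    """Yield header lines and records whose longest allele is <= max_len bp.

    Assumption: a record counts as "large" if REF or any ALT allele is
    longer than max_len. Symbolic (<DEL>) and breakend alleles are dropped.
    """
    for line in lines:
        if line.startswith("#"):
            yield line  # keep all header lines untouched
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        # Symbolic alleles and breakends have no explicit sequence length.
        if any(a.startswith("<") or "[" in a or "]" in a for a in alts):
            continue
        if max(len(ref), *(len(a) for a in alts)) <= max_len:
            yield line
```

Run on the genotyped VCF, this keeps the SVs available to the genotyper and only removes them from the final call set.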

jwli-code commented 1 month ago

@glennhickey Thanks. If a polyploid plant is an allopolyploid formed by the hybridization of two species, would it also face the same issues? The chromosome numbers of the two species are inconsistent, and only a small portion of their genetic material is homologous.