schneebergerlab / syri

Synteny and Rearrangement Identifier
https://schneebergerlab.github.io/syri/
MIT License

Expand SyRI to phased assembly #135

Closed baozg closed 2 years ago

baozg commented 2 years ago

Hi, @mnshgl0110

Can SyRI be extended to phased assemblies, so that it generates a phased VCF of variants from haplotype-resolved assemblies?

Thanks Zhigui

mnshgl0110 commented 2 years ago

Hi Zhigui,

My interpretation is that genomic differences between haplotype-resolved assemblies would be phased by definition, as the assemblies themselves are phased.

However, merging that information into a VCF is tricky, as it involves a linear representation of non-linear rearrangements. We are developing methods for that, but they are not yet available.

Best Manish

baozg commented 2 years ago

Hi Manish,

Yes, that is exactly what I am looking for. Hopefully, it can support both diploid and autopolyploid genomes. Looking forward to your development.

mnshgl0110 commented 2 years ago

Hi @baozg,

I have a question and would like to know your opinion. Let's say that you are comparing two diploid genomes, one of which you consider the reference and the other the query. What would you want in the VCF? In this case, four pairwise genome comparisons would happen. Do you expect all of them to be in one VCF? The VCF is based on the genomic coordinates of the reference genome, but the reference is diploid, so either we create two VCFs (one for each reference haplotype) or we consider one reference haplotype as the "true" reference genome and then compare all other haplotypes to it. It would be great if you could share your opinion on which would make more sense to you and why.
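To make the two options concrete, here is a minimal Python sketch that simply enumerates the comparisons each option implies. The haplotype labels (`ref_hap1`, `qry_hap1`, and so on) and the function names are hypothetical and only serve as illustration; nothing here is part of SyRI.

```python
from itertools import product

# Hypothetical labels for the haplotypes of two diploid genomes.
ref_haps = ["ref_hap1", "ref_hap2"]
qry_haps = ["qry_hap1", "qry_hap2"]


def per_reference_haplotype(ref_haplotypes, query_haplotypes):
    """Option 1: one VCF per reference haplotype.

    Returns a dict mapping each reference haplotype to the
    (reference, query) comparisons that would feed its VCF.
    """
    return {
        ref: [(ref, qry) for qry in query_haplotypes]
        for ref in ref_haplotypes
    }


def single_anchor(ref_haplotypes, query_haplotypes, anchor):
    """Option 2: one haplotype is treated as the "true" reference.

    Every other haplotype, including the remaining reference
    haplotype, is compared to the anchor.
    """
    others = [h for h in ref_haplotypes + query_haplotypes if h != anchor]
    return [(anchor, other) for other in others]


# All reference-by-query comparisons: 2 x 2 = 4 pairs.
all_pairs = list(product(ref_haps, qry_haps))

two_vcfs = per_reference_haplotype(ref_haps, qry_haps)
one_vcf = single_anchor(ref_haps, qry_haps, anchor="ref_hap1")
```

Under option 1 the four comparisons are split into two groups of two, each in its own coordinate system. Under option 2 there are three comparisons, all in the coordinates of the anchor haplotype.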

Best Manish

baozg commented 2 years ago

Hi Manish

Actually, for human genomes there are already some pipelines for phased assemblies (Dipcall, SVIM-asm, https://github.com/EichlerLab/pav). Typically, in plants, I set a haploid assembly as the reference (a doubled haploid or an inbred line), then use the two haplotypes of a diploid individual (haplotype-resolved assembly) to call a VCF with phased genotypes (1|0 / 0|1 / 1|1 / 0|0), and then use bcftools merge if I have more than one individual.
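The genotype logic in this workflow can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each haplotype has already been compared to the haploid reference, variants are keyed by a hypothetical `(chrom, pos, ref, alt)` tuple, and multi-allelic or overlapping variants are ignored. It is not how Dipcall, SVIM-asm, or PAV are implemented.

```python
def phased_genotype(in_hap1: bool, in_hap2: bool) -> str:
    """Return a phased GT string: left allele is haplotype 1, right is haplotype 2."""
    return f"{int(in_hap1)}|{int(in_hap2)}"


def merge_haplotype_calls(hap1_calls, hap2_calls):
    """Combine per-haplotype variant sets into phased genotypes.

    Both arguments are sets of (chrom, pos, ref, alt) tuples, assumed to be
    called against the same haploid reference. Sites absent from both
    haplotypes (0|0) are not emitted.
    """
    sites = sorted(set(hap1_calls) | set(hap2_calls))
    return {
        site: phased_genotype(site in hap1_calls, site in hap2_calls)
        for site in sites
    }


# Made-up example calls for one diploid individual.
hap1 = {("chr1", 100, "A", "T"), ("chr1", 250, "G", "GA")}
hap2 = {("chr1", 100, "A", "T"), ("chr1", 400, "C", "G")}

genotypes = merge_haplotype_calls(hap1, hap2)
for (chrom, pos, ref, alt), gt in genotypes.items():
    print(chrom, pos, ref, alt, gt)
# chr1 100 A T 1|1
# chr1 250 G GA 1|0
# chr1 400 C G 0|1
```

Per-individual VCFs written this way share the reference coordinates, which is what makes a later bcftools merge across individuals possible.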

For population-level assemblies, we need to do all-to-all alignments and then call variants from all of the alignments. I prefer to use a graph pangenome to call variants (vg deconstruct).

Thanks Zhigui

mnshgl0110 commented 2 years ago

Thanks Zhigui for sharing your ideas. This is very helpful.