popgenmethods / pyrho

Fast inference of fine-scale recombination rates based on fused-LASSO
MIT License

Low recombination rates #31

JosephLalli opened this issue 11 months ago (status: open)

JosephLalli commented 11 months ago

Hi there,

I've gotten markedly different results when phasing with pyrho maps compared with HapMap's lifted-over maps. While looking for an explanation, I noticed that the GRCh38 maps you've released have lower overall recombination rates than those reported by other methods. For example, here is the length of chr1 in cM, averaged across all reported populations:

pyrho: 180 cM
HapMap: 286 cM
Fledel-Alon 2011: ~280 cM
Coop 2012: 206 cM
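For reference, a chromosome's length in cM is the integral of its rate map: for a piecewise-constant map, sum rate × interval width over all intervals to get Morgans, then multiply by 100. A minimal sketch, assuming a hypothetical in-memory layout of `(start_bp, end_bp, rate_per_bp_per_generation)` tuples rather than any specific pyrho or HapMap file format; the toy map is made up, and the last line just computes the ratio of the HapMap and pyrho chr1 lengths quoted above.

```python
from typing import Iterable, Tuple

# Hypothetical layout: (start_bp, end_bp, rate in Morgans per bp per generation).
Interval = Tuple[int, int, float]


def map_length_cm(intervals: Iterable[Interval]) -> float:
    """Total genetic length in centiMorgans of a piecewise-constant rate map."""
    morgans = sum((end - start) * rate for start, end, rate in intervals)
    return 100.0 * morgans  # 1 Morgan = 100 cM


# Toy map (not real data): 1 Mb at 1e-8 per bp, then 2 Mb at 2e-8 per bp.
toy_map = [(0, 1_000_000, 1e-8), (1_000_000, 3_000_000, 2e-8)]
print(round(map_length_cm(toy_map), 6))  # 5.0

# Ratio of the quoted chr1 lengths (HapMap / pyrho).
print(round(286 / 180, 2))  # 1.59
```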

True, these are based on different samples, and recombination rates can vary a great deal between individuals. But HapMap's map, like yours, was based on 1KGP genomes. Why does pyrho infer fewer recombination events than LDhat (HapMap)?

jeffspence commented 11 months ago

Hi @JosephLalli,

Thanks for this. pyrho infers a population-scaled recombination rate and then converts it to a per-generation recombination rate, using the mutation rate to determine the effective population size. This generally gets the scaling roughly right, but it is not perfect. Some other methods instead scale the total map length they infer to match the expected number of crossovers from pedigree studies. That approach also has issues (e.g., total map length can be dominated by how hot the hotspots are inferred to be, and very high recombination rates are generally difficult to estimate precisely). Given all of this, I would take the overall scaling with a grain of salt; for your particular application, it might make sense to rescale (e.g., multiply all the rates by 1.5 or so).
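A simplified illustration of that conversion, assuming a single constant effective population size (pyrho itself works with an inferred demographic history, so this is not its actual code path): with rho = 4 * Ne * r and theta = 4 * Ne * mu, estimating Ne from theta and a per-generation mutation rate mu gives r = rho * mu / theta. Because r is linear in mu, an assumed mutation rate that is off by some factor shifts the whole map's scale by that same factor. The uniform rescaling mentioned above is then a single multiplication. The function names and numeric values here are hypothetical.

```python
def ne_from_theta(theta_per_bp: float, mu_per_bp: float) -> float:
    """Effective population size implied by theta = 4 * Ne * mu."""
    return theta_per_bp / (4.0 * mu_per_bp)


def per_generation_rate(rho_per_bp: float, ne: float) -> float:
    """Per-generation recombination rate r implied by rho = 4 * Ne * r."""
    return rho_per_bp / (4.0 * ne)


def rescale(rates: list[float], factor: float) -> list[float]:
    """Multiply every rate by the same constant (uniform map rescaling)."""
    return [r * factor for r in rates]


# Hypothetical per-bp values, for illustration only.
ne = ne_from_theta(theta_per_bp=1.2e-3, mu_per_bp=1.25e-8)
r = per_generation_rate(rho_per_bp=4e-4, ne=ne)
print(round(ne))  # 24000
print(rescale([r], 1.5))
```

Rescaling by a constant keeps the relative shape of the map (where the hotspots are and how hot they are relative to each other) and only changes its total length.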

Hope this helps, Jeff

JosephLalli commented 11 months ago

We are working on producing a phased reference panel of 1KGP haplotypes for the new T2T reference genome, and we have been using SHAPEIT5 to phase our T2T-aligned callset. Initially, we saw decent performance (as measured by switch error rate) when simply providing SHAPEIT5 with maps generated by lifting over the original NCBI Build 36 (hg18) HapMap (HM) maps. Ideally, though, we would use genetic maps generated from our own 1KGP variant dataset, which would let information from the telomeres and centromeres be incorporated into the map. (We are also, of course, inherently interested in patterns of recombination in these regions.)
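Phasing tools generally consume a genetic map as cumulative genetic position (cM) at a series of base-pair positions, so a per-bp rate map has to be integrated first. A minimal sketch, assuming the same hypothetical `(start_bp, end_bp, rate_per_bp)` layout with sorted, contiguous intervals; it produces a plain list of `(position, cM)` pairs, not SHAPEIT5's map file format, whose exact columns should be checked against the SHAPEIT5 documentation.

```python
from typing import List, Tuple

# Hypothetical layout: (start_bp, end_bp, rate in Morgans per bp per generation).
# Assumes intervals are sorted by position and contiguous (no gaps or overlaps).
Interval = Tuple[int, int, float]


def cumulative_cm(intervals: List[Interval]) -> List[Tuple[int, float]]:
    """Return (bp position, cumulative cM) at each interval boundary."""
    if not intervals:
        return []
    points = [(intervals[0][0], 0.0)]  # map starts at 0 cM
    total_cm = 0.0
    for start, end, rate in intervals:
        total_cm += 100.0 * (end - start) * rate  # Morgans -> cM
        points.append((end, total_cm))
    return points


toy_map = [(0, 1_000_000, 1e-8), (1_000_000, 3_000_000, 2e-8)]  # not real data
for pos, cm in cumulative_cm(toy_map):
    print(pos, round(cm, 6))
```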

Our initial effort to phase the 1KGP dataset using pyrho-generated maps produced panels with higher switch error rates than the lifted-over map did. I suspect the scaling issue is causing our problem. I will rescale our maps so that the total cM per chromosome matches the original HM map, and see whether that improves the observed switch error rate. These phasing tools were tuned using the HM maps, so it makes sense that very different chromosome lengths (in cM) would affect their performance.
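Rescaling each chromosome to a target length amounts to one multiplicative factor (target length divided by current length) applied to every rate in that chromosome, which leaves the relative shape of the map unchanged. A minimal sketch, again assuming the hypothetical `(start_bp, end_bp, rate_per_bp)` layout; the 286 cM target is the averaged chr1 HapMap figure quoted earlier in this thread, used only as an example, and the toy map is made up.

```python
from typing import List, Tuple

# Hypothetical layout: (start_bp, end_bp, rate in Morgans per bp per generation).
Interval = Tuple[int, int, float]


def map_length_cm(intervals: List[Interval]) -> float:
    """Total genetic length in cM of a piecewise-constant rate map."""
    return 100.0 * sum((end - start) * rate for start, end, rate in intervals)


def rescale_to_length(
    intervals: List[Interval], target_cm: float
) -> Tuple[List[Interval], float]:
    """Scale all rates by one factor so the map's total length equals target_cm."""
    current_cm = map_length_cm(intervals)
    if current_cm <= 0:
        raise ValueError("map has zero length; cannot rescale")
    factor = target_cm / current_cm
    return [(s, e, r * factor) for s, e, r in intervals], factor


toy_map = [(0, 1_000_000, 1e-8), (1_000_000, 3_000_000, 2e-8)]  # 5 cM, not real data
scaled, factor = rescale_to_length(toy_map, target_cm=286.0)
print(round(factor, 6), round(map_length_cm(scaled), 6))
```

Applied per chromosome, this matches total lengths to the HM map while keeping pyrho's fine-scale rate pattern, including any signal in regions the lifted-over map does not cover.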

In the meantime, I would be happy to arrange a video chat with you if you want to discuss the problem further. I am sure my colleague Andrew Bortvin would also be interested in attending. You can reach me at lalli at wisc.edu.

Again, thank you for all your help. You've been incredibly generous with your time. -Joe Lalli