szpiech / selscan

Haplotype based scans for selection
GNU General Public License v3.0

Recommendations about using a physical map to compute iHS values. #96

Open m-huertasp opened 1 year ago

m-huertasp commented 1 year ago

Hi @szpiech!

I have a question about using a physical map instead of a genetic map when computing iHS. I am using GTEx data to analyze some regions that could be subject to positive selection in different tissues. GTEx data is built on the GRCh38 assembly, and I have not found any genetic maps based on this assembly. The only one I could find is from the 1000 Genomes Project, which is based on GRCh37 (or at least its genetic map is). I was wondering which is the best option:

  1. Use recombination rates from 1000G (based on GRCh37), although my variants are based on GRCh38.
  2. Use the physical map of the variants that I am analyzing.

Thank you very much for this great tool. Looking forward to hearing from you!

Marta.

szpiech commented 1 year ago

Hello,

Well, probably the simplest thing to do would be to use the hg38 map here: https://bochet.gcc.biostat.washington.edu/beagle/genetic_maps/. You may need to slightly reformat it for use with selscan. I believe this map is a liftOver of a map inferred on an earlier build.
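The reformatting mentioned above might look roughly like the following. This is only a sketch, under several assumptions: that the downloaded map is in PLINK format (four whitespace-separated columns: chromosome, marker id, cM, bp), that selscan expects one map row per variant in the same four-column layout, and that linear interpolation between flanking map points is acceptable. The function names are hypothetical and are not part of selscan.

```python
from bisect import bisect_right


def read_plink_map(lines):
    """Parse PLINK-style map lines (chrom, id, cM, bp) into sorted bp and cM lists."""
    points = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        points.append((int(fields[3]), float(fields[2])))
    points.sort()
    return [bp for bp, _ in points], [cm for _, cm in points]


def interpolate_cm(map_bp, map_cm, pos):
    """Linearly interpolate the genetic position (cM) at physical position `pos`.

    Positions outside the map are clamped to the first or last map point.
    This is an assumption of the sketch, not a recommendation.
    """
    if pos <= map_bp[0]:
        return map_cm[0]
    if pos >= map_bp[-1]:
        return map_cm[-1]
    i = bisect_right(map_bp, pos)  # map_bp[i - 1] <= pos < map_bp[i]
    left_bp, right_bp = map_bp[i - 1], map_bp[i]
    frac = (pos - left_bp) / (right_bp - left_bp)
    return map_cm[i - 1] + frac * (map_cm[i] - map_cm[i - 1])


def selscan_map_rows(chrom, variant_positions, map_bp, map_cm):
    """Build one 'chr id cM bp' row per variant, in the order given."""
    rows = []
    for pos in variant_positions:
        cm = interpolate_cm(map_bp, map_cm, pos)
        rows.append(f"{chrom} {chrom}:{pos} {cm:.6f} {pos}")
    return rows
```

Variants that fall outside the range covered by the map all receive the same clamped genetic position here, so it may be safer to drop them before running the scan.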

-Zachary

m-huertasp commented 1 year ago

Hello @szpiech!

Thank you very much for your suggestion!

I used the hg38 map as you suggested, but we encountered an irregularity when analysing our iHS values.

We are using data from the Genotype-Tissue Expression (GTEx) project, but with the 1000 Genomes Project recombination map. When comparing our iHS values with the ones published in Pybus M et al., Nucleic Acids Res. 2014, we observed no correlation at all (although we are analysing the same populations).

We are not sure whether this lack of correlation is due to using the recombination map from the 1000 Genomes Project rather than one from GTEx, since some GTEx positions are not covered by the recombination map and vice versa. Another possibility is that the values do not correlate because of the change in the way iHS is computed (iHH1 and iHH0 are swapped in selscan, whereas Pybus et al. use the original formula), but the differences in iHS seem too large to be explained by this change alone.

I would be extremely grateful for any assistance you could provide.

Sincerely, Marta.

szpiech commented 1 year ago

Well, I'm not sure precisely why this happened. I'm assuming you normalized the scores in frequency bins, as described in Voight et al. 2006 and as implemented in the norm program. If you haven't, this would almost surely be the problem.
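To illustrate what normalizing in frequency bins means, here is a minimal sketch: scores are grouped by allele frequency, and each group is standardized to mean 0 and standard deviation 1. It is not the norm program's actual implementation. The function name, the equal-width bins, the default of 100 bins, and the use of the population standard deviation are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_in_bins(freqs, scores, n_bins=100):
    """Standardize scores to mean 0, SD 1 within equal-width allele-frequency bins."""
    bins = defaultdict(list)
    for idx, f in enumerate(freqs):
        # Bin index for this frequency; f == 1.0 is placed in the last bin.
        b = min(int(f * n_bins), n_bins - 1)
        bins[b].append(idx)
    out = [None] * len(scores)
    for members in bins.values():
        vals = [scores[i] for i in members]
        mu, sd = mean(vals), pstdev(vals)
        for i in members:
            # A bin with no spread cannot be standardized; leave it as None.
            out[i] = (scores[i] - mu) / sd if sd > 0 else None
    return out
```

Unstandardized scores depend strongly on allele frequency, so comparing raw values against a published, normalized set would not be expected to show a clean relationship.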

I doubt the difference in genetic maps is the full cause, although I suppose it would contribute to it. You might multiply your scores by -1 just to check, but you would expect to see a strong negative correlation if this was the only issue.
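The sign check suggested above can be sketched as follows: compute the Pearson correlation between the two score sets on their shared loci, then again after multiplying one set by -1. The function names and the dictionary layout are hypothetical. Loci are matched by a shared key such as an rsID, which is an assumption; matching by position only works if both score sets use the same assembly.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def compare_scores(mine, published):
    """Compare two dicts mapping locus key -> normalized iHS.

    Returns (number of shared loci, r, r after flipping the sign of `mine`).
    """
    shared = sorted(set(mine) & set(published))
    xs = [mine[k] for k in shared]
    ys = [published[k] for k in shared]
    r = pearson(xs, ys)
    r_flipped = pearson([-x for x in xs], ys)
    return len(shared), r, r_flipped
```

Flipping the sign only negates the correlation, so if r is near zero before the flip it stays near zero afterwards, which would point to a cause other than the swapped iHH1/iHH0 convention. The number of shared loci is also worth inspecting, since a small overlap would itself be a warning sign.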

-Zachary
