szpiech / selscan

Haplotype based scans for selection
GNU General Public License v3.0

Physical vs genetic map #10

Closed by ondrej77 9 years ago

ondrej77 commented 9 years ago

This may be a trivial problem for seasoned veterans, but I keep having trouble obtaining a genetic map for my VCF file. In your manuscript, you hinted that you ran selscan on the CEU22 data with only a physical map. From the command-line parameters, it doesn't look like that is possible. Is it? Also, where or how can I get a genetic map for my VCF files (from the 1000 Genomes Project)? The 1000 Genomes Project provides an OMNI recombination rate file, but it doesn't contain a rate or genetic distance for every SNP. How can I interpolate those values for each SNP in my VCF file? Any advice would be much appreciated. Surprisingly, there is little forum discussion of this.

szpiech commented 9 years ago

At the moment, when using VCF files you'll have to provide a PLINK-formatted map file as well (there is an example at https://github.com/szpiech/selscan/blob/master/example/example.map). If you would like to use a physical map in place of a genetic map, you'll have to duplicate the physical positions column so that your file is formatted as

<chr#> <snpID> <phys pos> <phys pos>

instead of

<chr#> <snpID> <genetic pos> <phys pos>

If you are on a Linux or OS X machine, you can do this for the example file with the following bash command:

cat example.map | awk '{print $1, $2, $4, $4}' > new.map
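
For anyone not on Linux or OS X, the same column duplication can be sketched in Python. This is only an illustration of the awk command above, not part of selscan; the function name use_physical_as_genetic is hypothetical, and it assumes a whitespace-delimited map file with at least four columns per line.

```python
def use_physical_as_genetic(lines):
    """Rewrite PLINK-style map lines so the genetic position column
    (column 3) is replaced by the physical position (column 4).

    Mirrors: awk '{print $1, $2, $4, $4}'
    Assumption: fields are whitespace-delimited; lines with fewer than
    four fields (e.g. blank lines) are skipped.
    """
    out = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, snp_id, _genetic, phys = fields[:4]
        out.append(f"{chrom} {snp_id} {phys} {phys}")
    return out


# Example usage (file names taken from the command above):
#   with open("example.map") as src, open("new.map", "w") as dst:
#       dst.write("\n".join(use_physical_as_genetic(src)) + "\n")
```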

In the near future I will be adding a command line option to directly use the physical positions so that one won't have to jump through those hoops. I also plan to add a map interpolation option directly into selscan, but for the time being you can consider using the predictGMAP program I've written (https://github.com/szpiech/predictGMAP). If you are unable to compile the program, please open a ticket over there with your target OS, and I'll see what I can do to get a binary working for you.
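
To make the interpolation question concrete, here is a minimal Python sketch of plain linear interpolation between reference map points. It is not how predictGMAP or selscan works, only one simple approach. It assumes the reference map has already been read into two parallel lists, physical positions in bp (sorted ascending) and cumulative genetic positions in cM; the function names and the choice to clamp SNPs outside the reference range to the nearest endpoint are assumptions made for illustration.

```python
from bisect import bisect_left


def interpolate_cm(pos, ref_bp, ref_cm):
    """Linearly interpolate a genetic position (cM) for physical position `pos`.

    ref_bp: reference physical positions, sorted ascending (assumed).
    ref_cm: cumulative genetic positions in cM, same length as ref_bp.
    Positions outside the reference range are clamped to the nearest
    endpoint (an assumption; extrapolating is another possible choice).
    """
    if pos <= ref_bp[0]:
        return ref_cm[0]
    if pos >= ref_bp[-1]:
        return ref_cm[-1]
    i = bisect_left(ref_bp, pos)  # first index with ref_bp[i] >= pos
    if ref_bp[i] == pos:
        return ref_cm[i]  # exact hit on a reference point
    x0, x1 = ref_bp[i - 1], ref_bp[i]
    y0, y1 = ref_cm[i - 1], ref_cm[i]
    return y0 + (y1 - y0) * (pos - x0) / (x1 - x0)


def build_map_lines(chrom, snps, ref_bp, ref_cm):
    """Return map lines in the order <chr#> <snpID> <genetic pos> <phys pos>.

    snps: list of (snp_id, physical_position) pairs, e.g. taken from a VCF.
    """
    return [
        f"{chrom} {snp_id} {interpolate_cm(pos, ref_bp, ref_cm):.6f} {pos}"
        for snp_id, pos in snps
    ]


# Toy reference map (made-up numbers, for illustration only).
ref_bp = [1000, 2000, 4000]
ref_cm = [0.0, 0.1, 0.5]
lines = build_map_lines("22", [("rs1", 1500), ("rs2", 3000)], ref_bp, ref_cm)
```

With the toy values above, a SNP at 1500 bp falls halfway between the first two reference points and is assigned 0.05 cM.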

ondrej77 commented 9 years ago

Thanks for your prompt reply. I did not expect that I could simply supply the physical positions in both columns. That's great.