naumanjaved / fingerprint_maps

Create "haplotype maps" of variants in order to use with Picardtools Crosscheck_Files utility, which allows for robust genotyping of functional genomic data.
24 stars 9 forks source link

Any pointers for files to use in GRCh38 context ? #3

Closed af8 closed 6 years ago

af8 commented 6 years ago

Hi, Is it possible to apply your script with hg38 and do you know if required files are available somewhere ?

Thank you, Anthony

naumanjaved commented 6 years ago

Hi Anthony,

The hg38 VCFs are available here.

Unfortunately I do not know genetic recombination maps are available for hg38. You may be able to use the hg19 maps and convert them to hg38 using the UCSC liftover tool with an approach similar to what is described here.

If liftover is unsuccesful, then you can modify the script to use kb coordinates instead of cM as follows:

  1. In create_PLINK_binary() on line 173 of map_building_functions.py, change

plink_command = "plink --vcf " \

plink_command = "plink --vcf " \
                    + int_directory + chrom + ".recode_u.vcf" \
                    + " --make-bed" \
                    + " --out " + int_directory + chrom
  1. In LD_score() on line 180 of map_building_functions.py, change
    LD_command = "python " + LD_script \
                 + " --bfile " + int_directory + chrom \
                 + " --ld-wind-cm " + str(window_cm) \
                 + " --out " + int_directory + "LD-" + chrom \
                 + " --l2 --yes-really"

    to

LD_command = "python " + LD_script \
                 + " --bfile " + int_directory + chrom \
                 + " --ld-wind-kb " + str(window_cm) \
                 + " --out " + int_directory + "LD-" + chrom \
                 + " --l2 --yes-really" 

This modified script will just ignore the hg19 recombination map and calculate LDScore over a window with size specified in kb(just set --window_cm to a kilobase equivalent of whatever your original window size was, i.e. 1.0 cm ~ 1000 kb).

Hope this helps, Nauman