Open aheritas opened 1 year ago
Hey,
Apologies, I never saw this originally.
I think the data above is OK, especially for a genome-wide test, but I would consider newer data, like the 1000 Genomes Project NYGC re-sequencing effort https://www.internationalgenome.org/data-portal/data-collection/30x-grch38
I actually imputed some samples for a colleague here in Oxford using that resource, and put some scripts here: https://github.com/rwdavies/QUILT-wrap They are written using snakemake. I'm not sure how easily they generalize, but hopefully you can read enough of the "main.smk" file to get a sense of what it's doing.
For regionStart, regionEnd and buffer, I would recommend imputing in regions of ~5 Mbp for a panel of this size in humans, with a buffer of maybe 500000 bp. So e.g. you might do
--chr=chr1 --regionStart=1 --regionEnd=5000000 --buffer=500000
--chr=chr1 --regionStart=5000001 --regionEnd=10000000 --buffer=500000
etc.
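If it helps, here is a minimal Python sketch of the windowing described above: it walks along a chromosome in fixed-size, non-overlapping windows and prints one set of region arguments per window. The function name `make_regions` and the 12 Mbp example length are made up for illustration; only the four flags come from the suggestion above. You would need to supply the real chromosome length yourself (for example from your reference's index file), and the last window is simply truncated at the chromosome end.

```python
def make_regions(chrom, chrom_length, window=5_000_000, buffer=500_000):
    """Return one argument string per window covering 1..chrom_length.

    Windows are 1-based, inclusive and non-overlapping; the buffer value is
    passed through unchanged for the imputation tool to apply on each side.
    """
    regions = []
    start = 1
    while start <= chrom_length:
        # Truncate the final window at the end of the chromosome.
        end = min(start + window - 1, chrom_length)
        regions.append(
            f"--chr={chrom} --regionStart={start} --regionEnd={end} --buffer={buffer}"
        )
        start = end + 1
    return regions


if __name__ == "__main__":
    # Hypothetical 12 Mbp chromosome, purely for illustration.
    for args in make_regions("chr1", 12_000_000):
        print(args)
```

Each printed line can then be appended to whatever command you are running per region, or used to drive a loop or a workflow rule.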
Again, sorry for the slow reply; every once in a while I miss these, especially during term.
Best wishes, Robbie
Hi Robbie, this is my first time using QUILT, so I apologise if these are naive questions. I am on the first step, preparing and reformatting the reference panel, which I would like to do for each chromosome. I have downloaded the reference haplotype, legend and genetic map files from: https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3.html (I would also appreciate your view on whether these are OK for an initial genome-wide test)
However, I am unsure what I should be inputting for these parameters: --regionStart= --regionEnd= . I would think that they correspond to the start (1) and end (length) of each chromosome, but I am not sure. I am also not sure whether leaving these parameters blank, which is the default, would make QUILT read the entire chromosome.
Thank you in advance for your time and for developing this tool.