rwdavies / QUILT

GNU General Public License v3.0

question about regionStart - regionEnd #20

Open aheritas opened 1 year ago

aheritas commented 1 year ago

Hi Robbie, this is my first time using QUILT, so I apologise if these are naive questions. I am at the first step, preparing and reformatting the reference panel, and I would like to do that for each chromosome. I have downloaded the reference haplotypes, legend and genetic maps from: https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3.html (I would also appreciate your view on whether these are OK for an initial genome-wide test.)

However, I am unsure what I should be inputting for these parameters: --regionStart= and --regionEnd= . I would assume they correspond to the start (1) and end (chromosome length) of each chromosome, but I am not sure. I am also not sure whether leaving these parameters blank (the default) would make QUILT read the entire chromosome.

Thank you in advance for your time and for developing this tool.

rwdavies commented 1 year ago

Hey,

Apologies I never saw this originally.

I think the data above is OK, especially for a genome-wide test, but I would consider newer data, like the 1000 Genomes Project NYGC re-sequencing effort https://www.internationalgenome.org/data-portal/data-collection/30x-grch38

I actually imputed some samples for a colleague here in Oxford using that resource, and put some scripts here: https://github.com/rwdavies/QUILT-wrap It's written using Snakemake. I'm not sure it's easily generalizable, but hopefully you can read enough of the "main.smk" file to get a sense of what it's doing.

For regionStart, regionEnd and buffer, I would recommend imputing in regions of ~5 Mbp for a human panel of this size, with a buffer of maybe 500000 bp. So, for example, you might run one job with --chr=chr1 --regionStart=1 --regionEnd=5000000 --buffer=500000, then another with --chr=chr1 --regionStart=5000001 --regionEnd=10000000 --buffer=500000, and so on along the chromosome.
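The tiling described above can be sketched in Python. This is a minimal illustration under stated assumptions, not part of QUILT or QUILT-wrap: the helper names `make_regions` and `quilt_region_args` are hypothetical, the regions are 1-based and inclusive as in the example commands, and the chromosome length has to come from your reference build (for instance, chr1 is 248,956,422 bp in GRCh38). It only builds the per-job argument strings; launching the jobs (directly or via a Snakemake rule) is left out.

```python
from typing import List, Tuple


def make_regions(chrom_length: int, region_size: int = 5_000_000) -> List[Tuple[int, int]]:
    """Tile positions 1..chrom_length into consecutive, non-overlapping
    (regionStart, regionEnd) pairs, 1-based and inclusive."""
    if chrom_length < 1 or region_size < 1:
        raise ValueError("chrom_length and region_size must be positive")
    regions = []
    start = 1
    while start <= chrom_length:
        # The last region is truncated at the chromosome end.
        end = min(start + region_size - 1, chrom_length)
        regions.append((start, end))
        start = end + 1
    return regions


def quilt_region_args(
    chrom: str,
    chrom_length: int,
    region_size: int = 5_000_000,
    buffer: int = 500_000,
) -> List[str]:
    """Hypothetical helper: one argument string per job. The buffer is passed
    as its own flag rather than added to the region bounds, so the regions
    themselves do not overlap (matching the example commands above)."""
    return [
        f"--chr={chrom} --regionStart={s} --regionEnd={e} --buffer={buffer}"
        for s, e in make_regions(chrom_length, region_size)
    ]


if __name__ == "__main__":
    # Illustrative 12 Mbp chromosome; substitute the real length for your build.
    for args in quilt_region_args("chr1", 12_000_000):
        print(args)
```

With the illustrative 12 Mbp length, the first two strings reproduce the two example jobs above, and a third job covers 10000001 to 12000000.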

Again, sorry for the slow reply; every once in a while I miss these, especially during term.

Best wishes,
Robbie