rwdavies / STITCH

STITCH - Sequencing To Imputation Through Constructing Haplotypes
http://www.nature.com/ng/journal/v48/n8/abs/ng.3594.html
GNU General Public License v3.0

Some questions about making input files #73

Status: Open. RADIOMUMM opened this issue 1 year ago.

RADIOMUMM commented 1 year ago

Hi Robbie, I'm a little confused about pos files. What is this file generated from? Is it derived from a reference panel, or from a VCF file after variant calling? And if it is generated from a VCF file after variant calling, should a site such as "3 42331 A G,T" be removed?

Best, jennis

rwdavies commented 1 year ago

Hi,

pos contains the list of sites you want to impute. One possible source is your initial variant calling and filtering. If you have a VCF of sites, you can make the pos file with something like the following. It skips the VCF header lines and keeps the CHROM, POS, REF and ALT columns. Check the result against the pos file format described in the STITCH documentation:

```shell
gunzip -c sites.vcf.gz | grep -v '^#' | cut -f1,2,4,5 > pos.txt
```

STITCH will not accept the site "3 42331 A G,T", because it can currently only impute bi-allelic variants. You could use "3 42331 A G" or "3 42331 A T", but not both G and T.
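If you would rather drop multi-allelic sites than pick one alternate allele by hand, here is a minimal sketch. It assumes a gzipped VCF passed as the first argument. It also keeps only single-base A/C/G/T REF and ALT alleles, on the assumption that you want bi-allelic SNPs. `vcf_to_biallelic_pos` is a hypothetical helper name, not part of STITCH.

```shell
# Hypothetical helper (not part of STITCH): turn a gzipped VCF of called sites
# into a pos-style file (CHROM, POS, REF, ALT; tab-separated; no header).
# Assumption: only bi-allelic SNPs with single-base A/C/G/T alleles are wanted.
vcf_to_biallelic_pos() {
  local vcf="$1"
  gunzip -c "$vcf" |
    awk -F'\t' -v OFS='\t' '
      /^#/ { next }                                 # skip VCF meta-information and header lines
      $4 ~ /^[ACGT]$/ && $5 ~ /^[ACGT]$/ {          # multi-allelic ALTs like "G,T" and indels fail this test
        print $1, $2, $4, $5
      }
    '
}
```

For example, `vcf_to_biallelic_pos sites.vcf.gz > pos.txt`. This sketch silently drops a site like "3 42331 A G,T" rather than keeping one of its alternate alleles.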

Best, Robbie

RADIOMUMM commented 1 year ago

Hi,

Thanks for your answer. I have another question: if I do not call variants chromosome by chromosome, the processed pos file will contain all bi-allelic SNPs on chromosomes 1-12, but STITCH imputes one chromosome at a time. Do you have any suggestions?

Best, jennis

suhuan0327 commented 1 year ago

Hi,

I have the same issue. How did you solve it?

Thanks, Su

rwdavies commented 11 months ago

In this case, I would just split the file into one file per chromosome and impute each chromosome separately.

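Something like the following, which keeps only the records whose CHROM column matches each chromosome:

```shell
# chromosomes 1-12; adjust if your VCF uses names such as chr1
for CHR in $(seq 1 12)
do
  # drop header lines, keep records on ${CHR}, then take CHROM, POS, REF, ALT
  gunzip -c sites.vcf.gz | awk -F'\t' -v chr="${CHR}" '!/^#/ && $1 == chr' | cut -f1,2,4,5 > "pos.${CHR}.txt"
  # impute chromosome ${CHR} here using pos.${CHR}.txt
done
```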
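The loop above decompresses the VCF once per chromosome, which is fine for a dozen chromosomes. As a hedged alternative, the sketch below reads the VCF once and writes every `pos.<CHROM>.txt` file in a single awk pass. `split_pos_by_chr` is a hypothetical helper, and the SNP-only filter is an assumption you may want to relax.

```shell
# Hypothetical helper (not part of STITCH): split a gzipped VCF of sites into
# one pos-style file per chromosome, <outdir>/pos.<CHROM>.txt, in one pass.
# Assumption: only bi-allelic SNPs with single-base A/C/G/T alleles are kept.
split_pos_by_chr() {
  local vcf="$1" outdir="${2:-.}"
  gunzip -c "$vcf" |
    awk -F'\t' -v OFS='\t' -v outdir="$outdir" '
      /^#/ { next }                                 # skip VCF header lines
      $4 ~ /^[ACGT]$/ && $5 ~ /^[ACGT]$/ {          # bi-allelic SNPs only
        out = outdir "/pos." $1 ".txt"
        print $1, $2, $4, $5 > out                  # first write truncates, later writes append
      }
    '
}
```

For example, `split_pos_by_chr sites.vcf.gz .` writes `pos.1.txt` through `pos.12.txt` for the chromosomes in the question. awk keeps one output file open per chromosome, so a VCF with thousands of contigs may run into the open-file limit.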
suhuan0327 commented 11 months ago

Hi,

Thank you for your reply. I handled my files the same way.

Best, Su