Closed chunxuan-hs closed 10 months ago
Hi,
The map files are not meant to be concatenated or unzipped. Just pass the correct map file to the software according to the chromosome you are phasing.
Simone
@srubinacci Thanks for the info!
Another related question, in my original VCFs there are many other data, like QUAL, DP, INFO, etc. However, in the phased VCF all those info were removed. Did I make a mistake with shapeit5? Is there a way to update my orginal VCF with phased haplotype info?
Thanks!
The commands:
for CHR in {1..22}; do
MAP=chr${CHR}.b38.gmap.gz
OUT=Element_DBS_chr${CHR}.bcf
LOG=Element_DBS_chr${CHR}.log
phase_common --input $path_vcf \
--map $MAP \
--region $CHR \
--pedigree $path_fam \
--reference $path_ref \
--thread 10 \
--log $LOG \
--output $OUT
done
An example. The input record,
1 61987 chr1_61987_A_G;rs76735897 A G 33.0 . AF=0.5;AQ=33;AN=6;AC=3;ANN=G|upstream_gene_variant|MODIFIER|OR4G11P|ENSG00000240361|transcript|ENST00000492842.2|transcribed_unprocessed_pseudogene||n.-962A>G|||||962|,G|upstream_gene_variant|MODIFIER|OR4F5|ENSG00000186092|transcript|ENST00000641515.2|protein_coding||c.-3492A>G|||||3432|,G|intron_variant|MODIFIER|OR4G11P|ENSG00000240361|transcript|ENST00000642116.1|processed_transcript|2/2|n.214-929A>G||||||;COMMON;FREQ=1000Genomes:0.6042,0.3958|GnomAD:0.5273,0.4727|KOREAN:0.8738,0.1262|Korea1K:0.8633,0.1367|NorthernSweden:0.4826,0.5174|Qatari:0.5146,0.4854|SGDP_PRJ:0.4007,0.5993|Siberian:0.35,0.65|TOMMO:0.8642,0.1358|dbGaP_PopFreq:0.4882,0.5118;GNO;RS=76735897;SSR=0;VC=SNV;dbSNPBuildID=131;SNP;VARTYPE=SNP;Cases=0,1,1;Controls=1,0,2;CC_TREND=1.000e+00;CC_GENO=2.231e-01;CC_ALL=8.000e-01;CC_DOM=6.667e-01;CC_REC=6.667e-01 GT:DP:AD:GQ:PL:RNC 0/1:23:22,1:7:6,0,24:.. 1/1:26:0,26:27:33,28,0:.. 0/0:36:35,1:30:0,30,40:..
And after phasing, the record:
1 61987 chr1_61987_A_G;rs76735897 A G . . AC=3;AN=5102 GT 1|0 1|1 0|0
You did not do anything wrong. We remove these tags because we would need to store them in memory the whole time otherwise. If you need them back, you can use bcftools annotate to reannotate your file.
Simone
Thanks for the quick reply, bcftools worked well!
I am trying to phase a trios dataset (only three samples, offspring, father, mather). I dowloaded the tar.gz file of hg38 genetic maps in the resource folder, unzipped it and concatenated all files (execpt X) to a single file.
However, I glot the following error,
ERROR: Wrong order in your genetic map file 248924793bp / 286.27923cM > 12994bp / 0.00000cM
It turns out the position locates at the end of chr1.
How to specify the genetic files correctly, and is a concatenated file including all chr allowed?
My codes:
Many thanks!