Open x811zou opened 4 years ago
Hi Scarlett, I answered you in the other thread, but just to clarify for other people who may see this. It looks like you are using VCFs and BAMs with discordant chromosome naming. Please make sure that both the VCF and BAM have the same naming (e.g. "1" vs "chr1"). Once you have consistent naming between the two, make sure to use the appropriate annotation files with phASER (with or without "chr"), which can be downloaded from here: https://stephanecastel.wordpress.com/2017/02/15/how-to-generate-ase-data-with-phaser/
@secastel Thank you! I just figured it out! But my result gene_ae.txt columns (totalCount log2_aFC n_variants variants gw_phased) are all zeros. I am wondering whether it is because my input data is unphased, which only have 0/0,1/1,0/1,1/0 in VCF. Would it explain the "0" variants below? Does phASER work with unphased data? Thanks!
#7. Outputting phased VCF...
GT field is not being updated with phASER genome wide phase. This can be changed using the --gw_phase_vcf argument.
Compressing and tabix indexing output VCF...
COMPLETED using 1264249 reads in 492 seconds using 4 threads
PHASED 10111 of 30870 all variants (= 0.327535) with at least one other variant
GENOME WIDE PHASED 0 of 34022 unphased variants (= 0.000000)
GENOME WIDE PHASE CORRECTED 0 of 30870 variants (= 0.000000)
Global maximum memory usage: 1262.21 (mb)
phASER should work with an unphased VCF, although it will work much, much better if you start with a phased VCF. If you are working with human samples, I would strongly recommend phasing your VCF using a tool like the Sanger Imputation Server (https://imputation.sanger.ac.uk).
As for why you are seeing all zeros, I expect it may have something to do with a contig naming problem. When running phaser_gene_ae you need to make sure to use the right features file to match your chromosome naming. For example (assuming hg19), if your chromosomes are named “chr1” then you need to use this file: https://www.dropbox.com/s/am09zwpjhs01k8u/gencode.v19.GRCh37.genes.chr.bed.gz?dl=0 otherwise if they are named “1” you need to use this file: https://www.dropbox.com/s/1u9zo1kx61zx6ca/gencode.v19.GRCh37.genes.bed.gz?dl=0
Please let me know if this helps.
@secastel Thank you! I just have another question regarding to the "unphased" genotype from the output phased VCF file from phASER, I do not see the phased genotype for each variant. I am attaching the first line from the input VCF and output VCF below:
##### Input VCF
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 20
chr1 14542 . A G 22.44 PASS AC=2;AF=1.00;AN=2;DP=1;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;QD=22.44;SOR=1.609 GT:AD:DP:GQ:PL 1/1:0,1:1:3:32,3,0
###### Output VCF from phASER
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 20
chr1 14542 . A G 22.44 PASS AC=2;AF=1.00;AN=2;DP=1;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;QD=22.44;SOR=1.609 GT:AD:DP:GQ:PL:PG:PB:PI:PW:PC:PM 1/1:0,1:1:3:32,3,0:1/1:.:.:1/1:.:.
I just checked haplotypic_counts.txt, and I found that instead of outputting the haplotype for each variant, the output files have listed the alleles for each (A/B) haplotype at each chromosome-position range. Please let me know if I miss anything! Thank you!
The phase will be updated in the "GT" field of the outputted VCF if you have included the argument "--gw_phase_vcf 1". In the case of the first line that you've shown, first, that variant is not covered by any reads, thus phaser will not phase it, and second, it is a homozygous variant, so there is no phase to speak of. phASER runs best when your input VCF has been phased using population phasing. If you are working with human data, I would suggest using a tool like the Sanger Imputation Server.
Hi! I aligned HG00096 from 1000GP with STAR and Tophat separately. And then I ran phASER using the same parameter you gave in the tutorial.
The error message for running phASER with the BAM aligned with STAR is:
The problem for using phASER with the BAM aligned with Tophat is that it stuck at first step forever ...
Does phASER work with Tophat aligned bam file? And what parameters do you specify for STAR alignment? I wonder whether that is the point where causing these errors. Could you help me with this? I'd appreciate your help!
Thanks, Scarlett