Hi,
I'm doing an allele-specific analysis, the pipeline went well until the final ice-normalization step. Tracing backwards I found the raw matrics were empty, and markAllelicStatus aberrantly assigned the reads. See .allelstat file as follows:
## /usr/local/bin//HiC-Pro_3.1.0/scripts/markAllelicStatus.py
## ibam=bowtie_results/bwt2/E9-5-PLA-rep1/E9-5-PLA-rep1-H3K27ac_S15_L002_cut_mm10.bwt2pairs.bam
## snpFile=/home3/gyang/2.ProjectET_HiCUT_Preprocessing/build/snps_C57b6_PWK_PhJ.vcf
## tag=XA
## output=bowtie_results/bwt2/E9-5-PLA-rep1/E9-5-PLA-rep1-H3K27ac_S15_L002_cut_mm10.bwt2pairs_allspe.bam
## verbose=True
## =========================
Total number of snps loaded 17046154.0
## =========================
Total number of reads 91295716 100
Number of reads with at least one 'N' 85300346 93.433
Number of reads assigned to ref genome 24111100 26.41
Number of reads assigned to alt genome 0 0.0
Number of conflicting reads 0 0.0
Number of unassigned reads 67184616 73.59
The weird thing is that when I fed the bam (E9-5-PLA-rep1-H3K27ac_S15_L002_cut_mm10.bwt2pairs.bam) to SNPsplit to verify the data, it went pretty well:
Allele-tagging report
=====================
Processed 91295716 read alignments in total
Reads were unaligned and hence skipped: 0 (0.00%)
46949809 reads were unassignable (51.43%)
26001038 reads were specific for genome 1 (28.48%)
18286300 reads were specific for genome 2 (20.03%)
46452 reads did not contain one of the expected bases at known SNP positions (0.05%)
58569 contained conflicting allele-specific SNPs (0.06%)
Thus this problem was not caused by the data. For VCF file generation, I stringently followed the utils introduction:
$HICPRO_PATH/bin/utils/extract_snps.py -i mgp.v5.merged.snps_all.dbSNP142.vcf -a PWK_PhJ > snps_C57b6_PWK_PhJ.vcf
and I've checked the vcf file by hand, everything seems normal:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT REF-PWK_PhJ-F1
chr1 3000185 rs585444580 G T 999 PASS DP=930;DP4=283,108,361,178;CSQ=T||||intergenic_variant|||||||| GT
0|1
One clue is that I got a HUGE markAllelicStatus.log file (>2G), full of warnings (more lines omitted below):
/home/gyang/anaconda3/bin/python /usr/local/bin//HiC-Pro_3.1.0/scripts/markAllelicStatus.py -s /home3/gyang/2.ProjectET_HiCUT_Preprocessing/build/snps_C57b6_PW
K_PhJ.vcf -v -r -i bowtie_results/bwt2/E9-5-PLA-rep1/E9-5-PLA-rep1-H3K27ac_S15_L002_cut_mm10.bwt2pairs.bam -o bowtie_results/bwt2/E9-5-PLA-rep1/E9-5-PLA-rep1-H
3K27ac_S15_L002_cut_mm10.bwt2pairs_allspe.bam
[E::idx_find_and_load] Could not retrieve index file for 'bowtie_results/bwt2/E9-5-PLA-rep1/E9-5-PLA-rep1-H3K27ac_S15_L002_cut_mm10.bwt2pairs.bam'
Warning : no SNPs found at position chr8 : 11389358. N ignored
Warning : no SNPs found at position chr8 : 11389358. N ignored
Warning : no SNPs found at position chr14 : 49493363. N ignored
Warning : no SNPs found at position chr14 : 49493363. N ignored
Warning : no SNPs found at position chr5 : 73611472. N ignored
Warning : no SNPs found at position chr5 : 73611511. N ignored
Your assistance is highly appreciated! And currently, I have to use SNPsplit to tag the reads and sed command for tag substitutions (XX --> XA). Hope that works. LOL
Hi, I'm doing an allele-specific analysis, the pipeline went well until the final ice-normalization step. Tracing backwards I found the raw matrics were empty, and markAllelicStatus aberrantly assigned the reads. See .allelstat file as follows:
The weird thing is that when I fed the bam (E9-5-PLA-rep1-H3K27ac_S15_L002_cut_mm10.bwt2pairs.bam) to SNPsplit to verify the data, it went pretty well:
Thus this problem was not caused by the data. For VCF file generation, I stringently followed the utils introduction:
$HICPRO_PATH/bin/utils/extract_snps.py -i mgp.v5.merged.snps_all.dbSNP142.vcf -a PWK_PhJ > snps_C57b6_PWK_PhJ.vcf
and I've checked the vcf file by hand, everything seems normal:
One clue is that I got a HUGE markAllelicStatus.log file (>2G), full of warnings (more lines omitted below):
Your assistance is highly appreciated! And currently, I have to use SNPsplit to tag the reads and sed command for tag substitutions (XX --> XA). Hope that works. LOL