The dataset I am working with consists of 100 bp reads, and presumably the callability mask will be different from the one for 35 bp reads. I am attempting to generate my own mask and just want to verify that I am doing this correctly. Is the workflow below OK? I am using BWA 0.7.10, SAMtools 1.1, and the 1000 Genomes Phase 2 reference.
./splitfa hs37d5.fa 100 > hs37d5.fa.split
bwa aln -R 1000000 -O 3 -E 3 -t 50 hs37d5.fa hs37d5.fa.split > hs37d5.fa.split.sai
bwa samse hs37d5.fa hs37d5.fa.split.sai hs37d5.fa.split | samtools view -b - > hs37d5.fa.split.bam
samtools view hs37d5.fa.split.bam | ./gen_raw_mask.pl > rawMask.hs37d5.fa
./gen_mask -l 100 -r 0.5 rawMask.hs37d5.fa > mask.hs37d5.fa
Once I have the mask, do I need to split it up by chromosome, or can I run on the whole genome?
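In case splitting turns out to be necessary, this is a minimal sketch of how I would do it. It assumes the mask is an ordinary multi-FASTA file with one record per chromosome and that the first word of each ">" header is the sequence name. I have not confirmed that gen_mask keeps the reference headers unchanged. The function name and the output file naming (mask.NAME.fa) are my own invention, not part of the mask tools.

```shell
#!/bin/sh
# Split a multi-FASTA mask into one file per sequence.
# Assumption: every record starts with a ">" header whose first word is the
# sequence name (e.g. ">1" or ">MT"). Output names are hypothetical.
split_mask_by_chrom() {
    mask="$1"      # input mask, e.g. mask.hs37d5.fa
    outdir="$2"    # directory that receives the per-chromosome files
    mkdir -p "$outdir"
    awk -v dir="$outdir" '
        /^>/ {
            if (out != "") close(out)      # keep only one output file open
            name = substr($1, 2)           # first header word without ">"
            out = dir "/mask." name ".fa"
        }
        out != "" { print > out }          # skip anything before the first header
    ' "$mask"
}
```

I would call it as: split_mask_by_chrom mask.hs37d5.fa mask_by_chrom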