Generate mask for 100 bp reads

jlrf98 commented 9 years ago

The dataset I am working with consists of 100 bp reads, so presumably the callability mask will differ from the one built for 35 bp reads. I am generating my own mask and want to verify that I am doing this correctly. Below is the workflow I am using; is it OK? I am using BWA 0.7.10, samtools 1.1, and the 1000 Genomes Phase 2 reference (hs37d5).

./splitfa hs37d5.fa 100 > hs37d5.fa.split
bwa aln -R 1000000 -O 3 -E 3 -t 50 hs37d5.fa hs37d5.fa.split > hs37d5.fa.split.sai
bwa samse hs37d5.fa hs37d5.fa.split.sai hs37d5.fa.split | samtools view -b - > hs37d5.fa.split.bam
samtools view hs37d5.fa.split.bam | ./gen_raw_mask.pl > rawMask.hs37d5.fa
./gen_mask -l 100 -r 0.5 rawMask.hs37d5.fa > mask.hs37d5.fa
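The idea behind the first step is that the reference itself is cut into overlapping pseudo-reads of the chosen read length, which are then mapped back to the reference to see where they align uniquely. The Python sketch below illustrates that splitting idea only; it is not the actual `splitfa` implementation, and the function names and the read-naming scheme `<name>_<1-based start>` are hypothetical choices for illustration.

```python
from typing import Iterable, Iterator, Tuple


def split_into_kmers(name: str, seq: str, k: int = 100) -> Iterator[Tuple[str, str]]:
    """Yield every overlapping k-mer of `seq` as a (read_name, read_seq) pair.

    One pseudo-read starts at each reference position (step size 1), so with
    k = 100 every read covers 100 consecutive reference bases.
    The "<name>_<1-based start>" naming is a hypothetical convention.
    """
    for start in range(len(seq) - k + 1):
        yield f"{name}_{start + 1}", seq[start:start + k]


def to_fasta(reads: Iterable[Tuple[str, str]]) -> str:
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{rid}\n{s}\n" for rid, s in reads)


if __name__ == "__main__":
    # Toy 10 bp "chromosome" split into 4-mers instead of 100-mers.
    reads = list(split_into_kmers("chrToy", "ACGTACGTAC", k=4))
    print(len(reads))
    print(to_fasta(reads[:2]), end="")
```

A sequence of length L yields L - k + 1 pseudo-reads, so the 100 bp setting produces slightly fewer, but much longer, reads than the 35 bp one. In the workflow above, the length passed to `splitfa` and to `gen_mask -l` are both 100, which keeps the two steps consistent.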

Once I have the mask, do I need to split it up by chromosome, or can I run on the whole genome?
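If per-chromosome files turn out to be needed downstream, splitting is just a matter of grouping by FASTA record. The sketch below assumes (not confirmed in this thread) that the `gen_mask` output is a FASTA file with one record per chromosome and that the character `3` marks callable positions, as in Heng Li's SNPable mask format; it emits 0-based, half-open BED intervals for each chromosome. All function names are hypothetical.

```python
from itertools import groupby
from typing import Dict, Iterable, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {record_name: sequence}; the name is the first word of the header."""
    records: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def callable_intervals(mask: str, callable_char: str = "3") -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) runs where the mask equals callable_char.

    ASSUMPTION: '3' marks callable positions in the mask.
    """
    intervals: List[Tuple[int, int]] = []
    pos = 0
    for char, run in groupby(mask):
        length = sum(1 for _ in run)
        if char == callable_char:
            intervals.append((pos, pos + length))
        pos += length
    return intervals


def mask_to_bed(lines: Iterable[str], callable_char: str = "3") -> Dict[str, List[str]]:
    """Split a genome-wide mask into per-chromosome lists of BED lines."""
    return {
        chrom: [f"{chrom}\t{s}\t{e}" for s, e in callable_intervals(seq, callable_char)]
        for chrom, seq in read_fasta(lines).items()
    }
```

Each dictionary entry could then be written to its own file per chromosome, or all entries concatenated if a single genome-wide file is acceptable to the downstream tool.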

houzhe1991 commented 7 years ago

Could you tell me what the 100 bp means? Thanks a lot!

stschiff commented 5 years ago

Hi, sorry, I can't support this step; it's been ages since I last built a mappability mask.

stschiff / msmc-tools, issue #5