Motivation
We want to mask repetitive regions of the genome from analysis by preventing reads from mapping there, without shifting the coordinates of the rest of the genome. Masked regions should also be excluded from coverage and normalization calculations. A motivating example in bacterial data is excluding ribosomal RNA operons, whose high sequence similarity makes read placement ambiguous.
Roadmap
Input will be a .bed file of locations to exclude.
[x] Change config file to accept a masking file
The final output reference FASTA will have the bases at the given locations replaced with Ns, so sequence lengths and coordinates stay the same.
[x] Change combine_fasta.py script to allow editing of the final FASTA
[x] Make sure effective genome size is calculated after edits
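A minimal sketch of the masking step described above, not the actual contents of combine_fasta.py. The function names `parse_bed`, `mask_sequence`, and `effective_genome_size` are hypothetical, and the sketch assumes the effective genome size is simply the count of non-N bases left after masking.

```python
def parse_bed(lines):
    """Parse BED lines into {contig: [(start, end), ...]}.

    BED intervals are 0-based and half-open: [start, end).
    """
    regions = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        contig, start, end = fields[0], int(fields[1]), int(fields[2])
        regions.setdefault(contig, []).append((start, end))
    return regions


def mask_sequence(seq, intervals):
    """Replace each [start, end) interval with Ns; the length is unchanged."""
    masked = list(seq)
    for start, end in intervals:
        # Clamp to the sequence bounds so a bad interval cannot extend it.
        start = max(start, 0)
        end = min(end, len(masked))
        for i in range(start, end):
            masked[i] = "N"
    return "".join(masked)


def effective_genome_size(seqs):
    """Count non-N bases across all sequences (assumed definition)."""
    return sum(len(s) - s.upper().count("N") for s in seqs)


# Hypothetical toy genome and mask, for illustration only.
genome = {"chr1": "ACGTACGTAC"}
mask = parse_bed(["chr1\t2\t5"])
masked_genome = {
    name: mask_sequence(seq, mask.get(name, []))
    for name, seq in genome.items()
}
size_after_masking = effective_genome_size(masked_genome.values())
```

Because masking substitutes characters rather than deleting them, every downstream coordinate still refers to the same position as in the unmasked reference. Computing the size from the masked sequences is what keeps the masked bases out of normalization.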
All deeptools calls can exclude locations using the --blackListFileName option.
[x] Update all deeptools calls to take the --blackListFileName argument
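As a hedged illustration of passing the mask to deeptools, the sketch below builds a `bamCoverage` argument list. Which deeptools commands the pipeline actually calls is not stated here, so `bamCoverage`, RPGC normalization, the helper name `bam_coverage_cmd`, and the file names are all assumptions.

```python
def bam_coverage_cmd(bam, out_bw, effective_size, blacklist=None):
    """Build a bamCoverage argument list (hypothetical helper).

    The masking BED file is added only when one is configured.
    """
    cmd = [
        "bamCoverage",
        "-b", bam,
        "-o", out_bw,
        "--normalizeUsing", "RPGC",
        # Effective genome size computed after masking (assumed input).
        "--effectiveGenomeSize", str(effective_size),
    ]
    if blacklist:
        cmd += ["--blackListFileName", blacklist]
    return cmd


# Hypothetical file names, for illustration only.
with_mask = bam_coverage_cmd("sample.bam", "sample.bw", 4500000, "mask.bed")
without_mask = bam_coverage_cmd("sample.bam", "sample.bw", 4600000)
```

Keeping the option conditional means runs without a masking file behave exactly as before.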