mikewolfe / ChIPseq_pipeline

A general ChIP-seq pipeline to reproducibly process many samples at once.

Allow for a "masked" region of the genome when performing alignments/coverage calculations #11

Closed: mikewolfe closed this 3 years ago

mikewolfe commented 3 years ago

Motivation

We want to be able to mask repetitive regions of the genome from analysis by preventing reads from mapping there, without disrupting coordinate locations. Additionally, masked regions should be excluded from coverage and normalization calculations. A motivating example in bacterial data is excluding ribosomal RNA operons from analysis because of their high sequence similarity to one another.

Roadmap

Input will be a .bed file of regions to exclude.

In the final output reference fasta, the bases in those regions will be replaced with Ns. Sequence lengths stay the same, so all downstream coordinates are preserved.
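A minimal sketch of the masking step, assuming the reference is already loaded into a dict of sequence name to sequence string and the .bed file uses standard 0-based, half-open coordinates. The function names `read_bed` and `mask_sequences` are hypothetical and not taken from the pipeline. Bases are overwritten rather than deleted, which is what keeps coordinates intact. (`bedtools maskfasta` performs the same operation on FASTA files directly.)

```python
from typing import Dict, Iterable, List, Tuple

# (chrom, start, end) in BED convention: 0-based start, exclusive end
Interval = Tuple[str, int, int]


def read_bed(lines: Iterable[str]) -> List[Interval]:
    """Parse the first three columns of BED lines, skipping headers and blanks."""
    intervals: List[Interval] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


def mask_sequences(seqs: Dict[str, str], intervals: List[Interval]) -> Dict[str, str]:
    """Return copies of seqs with each interval replaced by Ns (length preserved)."""
    masked = {name: list(seq) for name, seq in seqs.items()}
    for chrom, start, end in intervals:
        if chrom not in masked:
            continue  # hypothetical choice: silently ignore unknown contigs
        bases = masked[chrom]
        start, end = max(0, start), min(end, len(bases))  # clamp to contig bounds
        if start < end:
            bases[start:end] = "N" * (end - start)  # same-length slice assignment
    return {name: "".join(bases) for name, bases in masked.items()}
```

For example, masking `chr1\t2\t5` in `ACGTACGTAC` yields `ACNNNCGTAC`, which is still 10 bases long.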

All deeptools calls can exclude the same regions using the --blackListFileName option.
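A sketch of how the same .bed file might feed into normalization, under the assumption that reads are scaled by genome size (deeptools RPGC normalization, which needs `--effectiveGenomeSize`). The idea is that masked bases cannot receive reads, so they are subtracted from the genome length after merging overlapping intervals so that no base is counted twice. The helper names and the choice to use unmasked length as the effective size are illustrative assumptions, not the pipeline's confirmed behavior. The command is only built as a list, not run.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # BED-style (chrom, start, end), 0-based half-open


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def unmasked_size(chrom_sizes: Dict[str, int], masked: List[Interval]) -> int:
    """Total genome length minus the bases covered by masked intervals."""
    # Clamp intervals to contig bounds and drop empty or unknown ones.
    clamped = [
        (c, max(0, s), min(e, chrom_sizes[c]))
        for c, s, e in masked
        if c in chrom_sizes
    ]
    clamped = [iv for iv in clamped if iv[1] < iv[2]]
    masked_bp = sum(e - s for _, s, e in merge_intervals(clamped))
    return sum(chrom_sizes.values()) - masked_bp


def bam_coverage_args(bam: str, out: str, blacklist_bed: str, genome_size: int) -> List[str]:
    """Build (but do not run) a deeptools bamCoverage call that skips masked regions."""
    return [
        "bamCoverage",
        "--bam", bam,
        "--outFileName", out,
        "--blackListFileName", blacklist_bed,
        "--normalizeUsing", "RPGC",
        "--effectiveGenomeSize", str(genome_size),
    ]
```

Merging before summing matters because rRNA operon annotations may overlap; without merging, shared bases would be subtracted twice.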