Stages to filter alignments

Hi all,

I am in the process of adding two new stages to filter (remove) either:

1. entire contigs (FilterChr);
2. specific regions of the genome (FilterContaminants).

Use cases

Sometimes it's advisable to map to the full genome (assembled chrs + fragmented contigs) to remove ambiguous alignments (ref), but for the data analysis itself these extra contigs can usually be removed, since they add no extra information and processing all contigs can be slow and bothersome. For the analysis of ATAC-seq data, for which I am creating a pipeline, chrM is a known contaminant and all reads mapping to it should be removed as a matter of course. Or one might want to analyse a single chromosome for testing purposes.
For some experiments, some regions have consistently high coverage and should be removed. These are taken care of in ChIP-seq (removed from the peak list), but it is sometimes useful to remove them before BigWig creation (non-structural sRNA read coverage), and there might be other use cases.

I tried to make both stages quite flexible: (1) uses a file with a list of chromosomes, and (2) uses a bed/gtf/gff file, so that they can be included in a pipeline without much hassle.
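To make the intended behaviour of the two stages concrete, here is a minimal Python sketch of the filtering logic only. It is not the actual stage implementation (the real stages presumably wrap existing command-line tools on BAM files). All names here (`Alignment`, `filter_chr`, `filter_contaminants`, the example contig names) are hypothetical, and I am assuming the chromosome list names the chromosomes to keep and that regions are given in BED format (0-based, half-open).

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """Minimal stand-in for an aligned read (0-based, half-open coordinates)."""
    chrom: str
    start: int
    end: int


def read_chromosome_list(lines):
    """Parse one chromosome name per line, skipping blank lines and '#' comments."""
    return {ln.strip() for ln in lines if ln.strip() and not ln.startswith("#")}


def read_bed_regions(lines):
    """Parse BED lines into {chrom: [(start, end), ...]} using the first three columns."""
    regions = {}
    for ln in lines:
        if not ln.strip() or ln.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = ln.split()[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions


def filter_chr(alignments, chroms, keep=True):
    """Stage (1): keep (or, with keep=False, drop) alignments on the listed chromosomes."""
    return [a for a in alignments if (a.chrom in chroms) == keep]


def filter_contaminants(alignments, regions):
    """Stage (2): drop alignments overlapping any listed region by at least one base."""
    def overlaps(a):
        return any(a.start < end and start < a.end
                   for start, end in regions.get(a.chrom, []))
    return [a for a in alignments if not overlaps(a)]


# Illustrative input: two reads on chr1, one on chrM, one on an unplaced contig.
reads = [
    Alignment("chr1", 100, 150),
    Alignment("chrM", 10, 60),
    Alignment("chr1", 5000, 5050),
    Alignment("GL000220.1", 1, 50),
]
chroms_to_keep = read_chromosome_list(["# assembled chromosomes", "chr1", "chr2"])
blacklist = read_bed_regions(["chr1\t4900\t5010\thigh_coverage_region"])

kept = filter_contaminants(filter_chr(reads, chroms_to_keep), blacklist)
print(kept)
```

With `keep=False` the same chromosome list acts as a removal list, which would cover the chrM case without having to enumerate every chromosome to retain.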

Question:

Is there enough interest to include the SLURM settings for these stages in the bpipe.config of all the pipelines, or should I keep it simple and just include them in my small RNA-seq and the new ATAC-seq pipelines?

imbforge / NGSpipe2go

Stages to filter alignments #46