Open simojoe opened 4 years ago
The purpose of the filtering is to reduce computation on genes for which little data is available. I don't think it would make a difference if we included overlapping reads in the threshold (or at least I haven't seen examples where it would).
Using the data given in the example folder, I get the following results:
For the 5363 genes in the annotation file:
Note that the two filters overlap fully: all genes in the coverage filter are also present in the peak filter. Even with the max peak filter reduced to 1, we still have 1937 genes to filter out, which means that we currently allow peaks made of a single read.
For cases like helicases that move along a gene, we would expect to see some non-overlapping reads that should still be included. I would therefore be hesitant to rely completely on overlap-based filtering. We could, however, add an optional argument that overrides the default parameter if needed. Would that be sufficient for your application?
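A rough sketch of what such an optional argument could look like is given below. The flag names `--min-coverage` and `--min-overlap`, their defaults, and the `keep_gene` helper are all hypothetical and are not taken from the tool's actual interface. The default of 1 for the overlap threshold is an assumption meant to leave the current behaviour (single-read peaks accepted) unchanged.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with hypothetical filtering options (names are assumptions)."""
    parser = argparse.ArgumentParser(description="Hypothetical gene-filtering options")
    # Assumed default: any non-zero total coverage keeps the gene.
    parser.add_argument("--min-coverage", type=int, default=1,
                        help="minimum summed coverage required to keep a gene")
    # Assumed default of 1: a peak made of a single read is still accepted.
    parser.add_argument("--min-overlap", type=int, default=1,
                        help="minimum number of overlapping reads at the highest peak")
    return parser


def keep_gene(total_coverage: int, peak_depth: int,
              min_coverage: int, min_overlap: int) -> bool:
    """Keep a gene only if it passes both thresholds."""
    return total_coverage >= min_coverage and peak_depth >= min_overlap


# A gene with three non-overlapping reads: summed coverage 150, peak depth 1.
default_args = build_parser().parse_args([])
strict_args = build_parser().parse_args(["--min-overlap", "2"])

kept_by_default = keep_gene(150, 1, default_args.min_coverage, default_args.min_overlap)
kept_when_strict = keep_gene(150, 1, strict_args.min_coverage, strict_args.min_overlap)
```

With the defaults the gene is kept, so helicase-like cases are unaffected. Passing `--min-overlap 2` filters it out for users who want overlap-based filtering.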
In the "Removing genes without CLIP coverage" step, genes are filtered according to the sum of coverage along their entire genomic coordinates. This filtering is therefore dependent on gene (and intron) length, and it accepts genes whose peaks consist of single reads without any overlap. Should the metric be changed to take overlapping CLIP reads into account? If so, what is the minimum number of reads that should be required to overlap?
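To make the difference between the two metrics concrete, here is a minimal sketch, not the tool's actual implementation. It assumes reads are given as half-open `(start, end)` intervals on a single gene. The function names `coverage_sum` and `max_depth` are made up for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), an assumption of this sketch


def coverage_sum(reads: List[Interval]) -> int:
    """Sum of per-base coverage over the gene: total number of read bases."""
    return sum(end - start for start, end in reads)


def max_depth(reads: List[Interval]) -> int:
    """Largest number of reads overlapping at any single position."""
    events = []
    for start, end in reads:
        events.append((start, 1))   # a read begins
        events.append((end, -1))    # a read ends
    # At equal positions, (pos, -1) sorts before (pos, 1), so reads that
    # merely touch end-to-start are not counted as overlapping.
    events.sort()
    depth = best = 0
    for _, delta in events:
        depth += delta
        best = max(best, depth)
    return best


# Long gene: three isolated 50-nt reads scattered along it.
scattered = [(100, 150), (5000, 5050), (20000, 20050)]
# Short gene: three 30-nt reads piled up on one site.
stacked = [(10, 40), (20, 50), (30, 60)]

scattered_sum, scattered_peak = coverage_sum(scattered), max_depth(scattered)
stacked_sum, stacked_peak = coverage_sum(stacked), max_depth(stacked)
```

In this toy case the scattered reads give a larger coverage sum (150 versus 90) but a peak depth of only 1, whereas the stacked reads reach a depth of 3. A threshold on the sum alone would rank the long gene higher even though none of its reads overlap.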