Removing genes without CLIP coverage - minimum peak height

philippdre / omniCLIP

omniCLIP is a CLIP-Seq peak caller

GNU General Public License v3.0


Removing genes without CLIP coverage - minimum peak height #16

Open simojoe opened 4 years ago

simojoe commented 4 years ago

In the "Removing genes without CLIP coverage" step, genes are filtered according to the sum of coverage along their entire genomic coordinates. This filtering therefore depends on gene (and intron) length, and it accepts genes whose peaks consist of single reads, without any overlap.

Should the metric be changed to take overlapping CLIP reads into account? If so, what is the minimum number of reads that should be required to overlap?
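
To make the distinction concrete, here is a minimal sketch comparing the two metrics on toy data. It is not omniCLIP's code: `coverage_profile`, the read coordinates, and the gene length are all made up for illustration, and reads are assumed to be half-open `(start, end)` intervals.

```python
def coverage_profile(gene_start, gene_end, reads):
    """Per-base read depth over the half-open gene interval [gene_start, gene_end).

    Hypothetical helper for illustration; reads are (start, end) tuples.
    """
    depth = [0] * (gene_end - gene_start)
    for read_start, read_end in reads:
        # Clip each read to the gene boundaries, then shift to gene-relative offsets.
        lo = max(read_start, gene_start) - gene_start
        hi = min(read_end, gene_end) - gene_start
        for pos in range(lo, hi):
            depth[pos] += 1
    return depth


# Toy gene of 1000 nt with three 40-nt reads in two arrangements (made-up data).
scattered = [(100, 140), (500, 540), (900, 940)]  # no two reads overlap
stacked = [(100, 140), (110, 150), (120, 160)]    # all three overlap at 120..139

scattered_depth = coverage_profile(0, 1000, scattered)
stacked_depth = coverage_profile(0, 1000, stacked)

# Summed coverage is identical, so a total-coverage threshold cannot tell them apart.
print(sum(scattered_depth), sum(stacked_depth))  # 120 120
# Maximum peak height differs: 1 for isolated reads, 3 for the stacked reads.
print(max(scattered_depth), max(stacked_depth))  # 1 3
```

With a total-coverage threshold of 100, both toy genes would pass, although only the second one contains overlapping reads.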

philippdre commented 4 years ago

The purpose of the filtering is to reduce computation on genes where little data is available. I don't think it would make a difference if we included overlapping reads in the threshold (or at least I haven't seen examples where this was the case).

simojoe commented 4 years ago

Using the data given in the example folder, I get the following results:

For the 5363 genes in the annotation file:

1658 are filtered out by the current filter (total coverage >= 100)
2412 are filtered out by a minimum max peak filter (max peak >= 5)

Note that the two filters overlap fully: every gene removed by the coverage filter is also removed by the peak filter. Even when the max peak threshold is reduced to 1, we still have 1937 genes to filter out, meaning that we currently allow peaks made of a single read.
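
A comparison like the one above can be sketched with plain sets. The per-gene values below are invented for illustration (they are not the example-folder data), and `filtered_out` is a hypothetical helper that assumes a gene is removed when its metric falls below the threshold.

```python
def filtered_out(metrics, key, threshold):
    """Return the set of genes whose metric `key` is below `threshold`."""
    return {gene for gene, values in metrics.items() if values[key] < threshold}


# Made-up per-gene metrics: summed coverage and maximum peak height.
metrics = {
    "geneA": {"total": 250, "max_peak": 12},
    "geneB": {"total": 120, "max_peak": 1},  # many isolated reads, no overlap
    "geneC": {"total": 40, "max_peak": 2},
    "geneD": {"total": 0, "max_peak": 0},
}

by_total = filtered_out(metrics, "total", 100)  # thresholds taken from the thread
by_peak = filtered_out(metrics, "max_peak", 5)

# "Full overlap" means the coverage-filtered set is a subset of the peak-filtered set.
print(by_total <= by_peak)         # True
# Genes that only the peak filter removes.
print(sorted(by_peak - by_total))  # ['geneB']
```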

philippdre commented 4 years ago

For cases like helicases that move along a gene, we would expect to see some non-overlapping reads that should still be included. Therefore, I would be hesitant to rely completely on the overlap-based filtering. We could, however, include an additional argument that overrides the default parameter if needed. Would that be sufficient for your application?
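
One way such an optional override could look is sketched below with `argparse`. The flag names `--min-coverage` and `--min-peak` and the `keep_gene` helper are hypothetical, not existing omniCLIP options. The default of 100 is the total-coverage threshold quoted earlier in the thread, and the peak filter stays disabled unless requested, so non-overlapping reads (the helicase case) are still kept by default.

```python
import argparse


def build_parser():
    """Build a parser with hypothetical filtering options (not omniCLIP's real CLI)."""
    parser = argparse.ArgumentParser(description="Toy gene-filter options")
    parser.add_argument("--min-coverage", type=int, default=100,
                        help="minimum summed coverage per gene")
    parser.add_argument("--min-peak", type=int, default=None,
                        help="optional minimum peak height; off by default")
    return parser


def keep_gene(total_coverage, max_peak, args):
    """Apply the coverage filter, plus the peak filter only if it was requested."""
    if total_coverage < args.min_coverage:
        return False
    if args.min_peak is not None and max_peak < args.min_peak:
        return False
    return True


default_args = build_parser().parse_args([])
strict_args = build_parser().parse_args(["--min-peak", "5"])

# A gene with 120 summed coverage but only isolated reads (max peak of 1):
print(keep_gene(120, 1, default_args))  # True: default behaviour is unchanged
print(keep_gene(120, 1, strict_args))   # False: removed once --min-peak is set
```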