`EpiCompare::compute_consensus_peaks()` currently has some limitations:
- [ ] Only accepts peak files as input, which potentially loses information compared to using the raw files (bedGraph, bigWig, BAM). We should do some benchmarking to see how accurate consensus peak calling is with each of these input levels.
- [ ] Does not retain the key columns used for computing percentiles (e.g. `c("total_signal", "qValue", "Peak Score")`). As a result, the consensus peaks can't be used for precision-recall curves or correlations. It would be good to integrate some sort of aggregation procedure for these metrics.
- [ ] Assumes all peak files were called with the same methodology, but in reality they can come from SEACR (stringent or relaxed) or MACS2 (narrow or broad), run with different hyperparameters. It would be useful to have a method to harmonize all these kinds of peaks, though this may be beyond the scope of this particular function.
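One possible aggregation procedure for the second item, sketched in Python for brevity: for each consensus region, collect the metric values of every input peak that overlaps it and summarise them (mean by default, but any summary such as `max` could be passed). The function names and the plain-tuple peak representation are hypothetical and not part of EpiCompare; in R, the overlap step would more naturally come from `GenomicRanges::findOverlaps()`.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical minimal representations (not EpiCompare's data structures)
Region = Tuple[str, int, int]                  # (chrom, start, end), half-open as in BED
Peak = Tuple[str, int, int, Dict[str, float]]  # region plus its metric columns


def overlaps(a: Region, b: Region) -> bool:
    """True if two half-open intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def aggregate_metrics(
    consensus: List[Region],
    peak_sets: List[List[Peak]],
    columns: List[str],
    how: Callable[[List[float]], float] = mean,
) -> List[Dict[str, Optional[float]]]:
    """Attach summarised metric values from overlapping input peaks to each consensus region."""
    rows = []
    for region in consensus:
        row = {"chrom": region[0], "start": region[1], "end": region[2]}
        for col in columns:
            # Gather this column from every overlapping peak in every input file
            values = [
                metrics[col]
                for peaks in peak_sets
                for (chrom, start, end, metrics) in peaks
                if col in metrics and overlaps(region, (chrom, start, end))
            ]
            # None marks a region with no overlapping value for this column
            row[col] = how(values) if values else None
        rows.append(row)
    return rows
```

For example, a consensus region `("chr1", 100, 200)` overlapped by peaks with `qValue` 10 and 20 in two different files gets `qValue` 15 with the default mean, or 20 with `how=max`. Which summary is most appropriate for precision-recall curves is exactly the kind of question the benchmarking above would need to answer.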
Note: nf-core/cutandrun seems to use a very simplistic consensus peak calling strategy with bedtools, which I think just looks for overlap between peak files, perhaps analogous to `compute_consensus_peaks(method="granges")`. We may want to suggest some more robust alternative consensus peak calling strategies to the cutandrun pipeline authors.
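For reference, an overlap-only strategy of this kind can be sketched in a few lines (Python for brevity): pool the peaks from all files, merge overlapping intervals, and keep merged regions supported by at least `min_samples` distinct files. This illustrates the general idea only; it is not nf-core/cutandrun's or EpiCompare's actual code, and `overlap_consensus` and `min_samples` are hypothetical names.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open as in BED


def overlap_consensus(
    peak_sets: List[List[Interval]], min_samples: int = 2
) -> List[Tuple[str, int, int, int]]:
    """Merge overlapping peaks across files; keep regions seen in >= min_samples files."""
    # Pool every peak, tagging it with the index of the file it came from
    pooled = sorted(
        (chrom, start, end, sample)
        for sample, peaks in enumerate(peak_sets)
        for chrom, start, end in peaks
    )
    merged = []  # each entry: [chrom, start, end, set of supporting files]
    for chrom, start, end, sample in pooled:
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            # Overlaps the current merged region: extend it and record the file
            merged[-1][2] = max(merged[-1][2], end)
            merged[-1][3].add(sample)
        else:
            merged.append([chrom, start, end, {sample}])
    # Report each surviving region with its number of supporting files
    return [(c, s, e, len(files)) for c, s, e, files in merged if len(files) >= min_samples]
```

Part of what makes this simplistic is visible in the code: no signal or significance values are used, and chains of partially overlapping peaks can merge into one wide region that no single sample actually called.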
@Al-Murphy mentioned that MACS2 may have functionality like this. @SarahMarzi @KittyMurphy, have you used this in the past?
Note: I can't seem to find any reference to consensus peak calling in the SEACR paper or GitHub repo:
https://github.com/FredHutch/SEACR/search?q=consensus
So unless they're using different terminology, consensus peak calling doesn't seem to be part of SEACR's functionality.