jsh58 / Genrich

Detecting sites of genomic enrichment
MIT License
182 stars 27 forks source link

Calibration of the 'Minimum AUC for a peak' for Atac-seq #73

Closed lilian18100 closed 3 years ago

lilian18100 commented 3 years ago

Hello,

I chose to analyse my Atac-seq Data with genrich, I use as input bam file generated by STAR reporting uniquely mapped read only.

I have 3 treatments and 3 replicates per treatment at 2 time points.

I am currently peak calling them separately to run the differential analysis with Diffbind, with this command

./Genrich -t treatmt1_replict1_timept1.bam -o treatmt1_replict1_timept1.narrowPeak -k treatmt1_replict1_timept1.bedGraph -a 50 -r -j

I was wondering how could I set the '-a' parameter accordingly to the mapping coverage of each one of my bam files (knowing that they have different coverage depth). In order to use Diffbind, I think it would be better to peak call them separately.

What information from my bam file could I use to choose an appropriate value for the Minimum AUC for a peak ?

Thank you very much for your help,

Best Regards

malcook commented 3 years ago

I will be interested in @jsh58 's response to this, and propose that it might be most appropriate to down-sample the bam files to their greatest lowest coverage value, and then use common parameters for both downstream Genrich the Diffbind processing.

jsh58 commented 3 years ago

Thanks for the question. I do not know of any information in the bam file that would indicate an appropriate value for minimum AUC.

It is true that, in general, a higher sequencing depth will lead to lower p-values and thus more peaks to some extent (as described in #11). You would have to explore how much of this effect there is with your dataset, and then judge whether you should alter the -a threshold for each replicate.

Note that you can easily explore the effects of altering peak-calling parameters (including -a) via the -X and -P arguments.

lilian18100 commented 3 years ago

Thank you for your answer.

I will definitely play around with -X and -a.

@malcook the problem with downsampling your bam file is that it might cause alteration of coverage that you don't really control, isn't it?

malcook commented 2 years ago

FWIW - I have found Genrich's performance on ATAC data with three or more replicates to be quite satisfactory using AUC threshold of zero and q=.05.

Do you find that surprising @jsh58 or @lilian18100 ?