rhysnewell / ChIP-R

Assessing the reproducibility of processed ChIP-seq peaks
GNU General Public License v3.0

Output files & filtering #1

Open ATpoint opened 3 years ago

ATpoint commented 3 years ago

Hello,

thanks for putting this tool together.

Could you please elaborate on what exactly the output files are and how one should filter them? Contrary to the README, I get two files, *_all.bed and *_optimal.bed (but not T1/T2). The command was simply chipr -i rep1.narrowPeak rep2.narrowPeak -o out for a normal transcription factor ChIP-seq.

=> What is the "optimal" file here compared to "all"?

=> I am unsure how to obtain the final list of reproducible peaks (and from which file). Do I filter either of these files based on FDR ($9)? Since both files contain entries with FDR > 0.05 (with --alpha left at its default of 0.05), what is the relationship between FDR and alpha (if there is any), and when would it make sense to change alpha? Edit: After playing with it, it appears that alpha has no effect on the output. Can you clarify?

=> Also, based on the preprint, I guess the --fragment option is recommended for TF ChIP-seq. Is that correct?

=> What is the difference between "primary" and "secondary" peaks in the output?

Hope you can clarify, thank you for your time, and sorry for the wall of text.

j-andrews7 commented 3 years ago

I would also like to know the answer to this.

nchambwe commented 3 years ago

Yes - me too! Had a conference with @j-andrews7 about this earlier today.

millerh1 commented 2 years ago

I also had this question. The end of rankprod.py seems to indicate that primary is T1 and secondary is anything that didn't meet the threshold for T1. I think T1 just means that the rank product bound p-value is less than whatever the nbinom alpha was determined to be.

ZunpengLiu commented 1 year ago

Got the same question. Looking forward to the answer here.

chiefcat commented 1 year ago

From the published paper supplementary figure 1 legend:

After all test fragments have been filtered and/or collapsed into output peaks, two output files are produced: “optimal” for peaks p ≤ θ (where θ is the threshold suggested by the binomial test) and “all” containing all peaks regardless of p.

https://ars.els-cdn.com/content/image/1-s2.0-S0888754321001531-mmc1.pdf
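
To make the quoted definition concrete, here is a minimal Python sketch of the split it describes: every peak goes into "all", and only peaks with p ≤ θ go into "optimal". This is not ChIP-R's actual code. The Peak record, the split_peaks helper, the example coordinates, and the value used for θ are all hypothetical and only illustrate the rule in the legend.

```python
from typing import List, NamedTuple, Tuple


class Peak(NamedTuple):
    """Hypothetical peak record; not ChIP-R's internal representation."""
    chrom: str
    start: int
    end: int
    pvalue: float  # p-value assigned to the peak (assumed field name)


def split_peaks(peaks: List[Peak], theta: float) -> Tuple[List[Peak], List[Peak]]:
    """Return (all_peaks, optimal_peaks) following the quoted legend.

    "all" keeps every peak regardless of p; "optimal" keeps peaks with p <= theta.
    """
    all_peaks = list(peaks)
    optimal_peaks = [p for p in peaks if p.pvalue <= theta]
    return all_peaks, optimal_peaks


# Made-up example data; theta here is an arbitrary illustrative threshold,
# whereas the legend says the real one is suggested by a binomial test.
peaks = [
    Peak("chr1", 100, 250, 0.001),
    Peak("chr1", 900, 1100, 0.02),
    Peak("chr2", 500, 640, 0.30),
]
all_peaks, optimal_peaks = split_peaks(peaks, theta=0.02)
print(len(all_peaks), len(optimal_peaks))
```

Under this reading, the "optimal" file is a subset of "all". The rule filters on p against θ, not on the FDR column, so it would not by itself remove entries with FDR > 0.05.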

yeroslaviz commented 7 months ago

Is there any progress on an answer here?

thx