nboley / idr

IDR
GNU General Public License v2.0

Cannot run IDR with more than 2 replicates #35

Open thek71 opened 7 years ago

thek71 commented 7 years ago

Hi,

I am trying to run IDR on a set of 3 replicates and I get the following error:

idr: error: unrecognized arguments: K9AcY4_2/MACS2_K9acY4_2/K9acY4_2_macs2_peaks.broadPeak

This is the third replicate. If I run it for 2 replicates things are working, but not for more than 2. Any advice on the issue?
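The error message has the shape of a Python argparse error, which is what you would see if the option that takes the peak files is declared to accept exactly two values. That declaration is an assumption here and is not confirmed anywhere in this thread. A minimal sketch of the behaviour, using a hypothetical parser named `idr-sketch` (this is not idr's actual code):

```python
import argparse


def split_samples(argv):
    """Parse argv with an option that takes exactly two values.

    Returns the two accepted sample files and whatever was left over.
    """
    parser = argparse.ArgumentParser(prog="idr-sketch")  # hypothetical parser
    # Assumption: the samples option is declared with nargs=2.
    parser.add_argument("--samples", nargs=2, required=True)
    # parse_known_args returns the leftovers instead of exiting with
    # "unrecognized arguments", which makes the behaviour easy to inspect.
    args, extra = parser.parse_known_args(argv)
    return args.samples, extra


samples, extra = split_samples(
    ["--samples", "rep1.broadPeak", "rep2.broadPeak", "rep3.broadPeak"]
)
print(samples)  # the two files that were accepted
print(extra)    # the third file, which a strict parser would reject
```

With a strict `parse_args` call, the leftover third file is what would be reported as an unrecognized argument.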

NemanjaV commented 7 years ago

Here you can find a very detailed explanation of how to analyze three ChIP-seq replicates using SPP or MACS2 in combination with IDR.

yichangyu commented 5 years ago

Hi NemanjaV,

I looked at the link you gave, but couldn't find how to deal with 3 replicates using IDR. Can you give some suggestions or a new link? Thanks!

melnuesch commented 5 years ago

Same problem here. I have multiple ChIP-seq replicates.

mevers commented 4 years ago

Agree with the previous two posts. The link provided by @NemanjaV does not provide any insight into how to handle >2 replicates. The original BigDataScript-based pipeline from the Kundaje lab (now deprecated in favour of the WDL-based ENCODE pipeline) can only handle 2 replicates. Ditto for the ENCODE pipeline, at least based on their IDR Python module, which considers only two peak files, peak1 and peak2:

def idr(basename_prefix, peak1, peak2, peak_pooled, peak_type, thresh, rank, out_dir): ...

A possible option (at the risk of being overly conservative) is to perform pairwise IDR analyses followed by a final IDR analysis on the IDRed peaks. In other words, for n=3:

  1. IDR(peaks(rep1) & peaks(rep2)) giving peaks(rep12)
  2. IDR(peaks(rep2) & peaks(rep3)) or IDR(peaks(rep1) & peaks(rep3)) giving peaks(rep23) or peaks(rep13), respectively
  3. IDR(peaks(rep12) & peaks(rep23)) or IDR(peaks(rep12) & peaks(rep13)), respectively.
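The three steps above can be sketched as a list of command lines. This only builds the commands and does not run anything. The flags `--samples`, `--input-file-type` and `--output-file` are assumed to match your installed idr version, the output file names are made up for illustration, and whether the step 1 and step 2 outputs can be fed straight back into idr without filtering or reformatting is also an assumption.

```python
from pathlib import Path


def idr_command(sample_a, sample_b, output, file_type="broadPeak"):
    """Build (but do not run) one two-sample idr invocation."""
    return [
        "idr",
        "--samples", str(sample_a), str(sample_b),
        "--input-file-type", file_type,
        "--output-file", str(output),
    ]


def chained_idr_plan(rep1, rep2, rep3, out_dir):
    """Return the three commands for the n=3 scheme described above."""
    out = Path(out_dir)
    # Hypothetical intermediate and final file names.
    rep12 = out / "rep12_idr.txt"
    rep23 = out / "rep23_idr.txt"
    final = out / "rep12_vs_rep23_idr.txt"
    return [
        idr_command(rep1, rep2, rep12),   # step 1
        idr_command(rep2, rep3, rep23),   # step 2
        # Step 3. Assumption: the intermediate outputs are usable as input,
        # possibly after thresholding and converting them back to a peak format.
        idr_command(rep12, rep23, final),
    ]


for cmd in chained_idr_plan("rep1.broadPeak", "rep2.broadPeak", "rep3.broadPeak", "idr_out"):
    print(" ".join(cmd))
```

Each command list could be passed to `subprocess.run(cmd, check=True)` once the flags have been checked against `idr --help`.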

I would be very interested in hearing about other approaches.

olechnwin commented 4 years ago

Hoping for insights for this issue too...

yuewangpanda commented 3 years ago

Have the same issue. Agreed with @mevers. Just did a simple test. So I have 4 replicates.

You can only do idr(1, 2), idr(3, 4) and then idr(12, 34). If you do idr(1, 2) and then idr(12, 3), it returns the error "Peak files must contain at least 20 peaks post-merge".

Looking for a smarter way to do it.

Maarten-vd-Sande commented 3 years ago

I haven't used it, but you could always try chip-r: https://github.com/rhysnewell/ChIP-R . It seems to be an IDR-related approach, but it does accept more than 2 replicates.

jaspitzer commented 1 year ago

So for my >2 replicate approaches, I usually just do the combinations like @mevers proposes. For three replicates, this means:

idr(1,2), followed by idr(2,3) and finally idr((1,2) & (2,3))

It does work fine. The issue @yuewangpanda has seems to be that there is very little overlap between replicates, which leads to that error.
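One way to check for low overlap before running IDR is to count how many peaks in one file overlap any peak in the other. This is only a rough sketch: it assumes BED-like, tab-separated input with chrom, start and end in the first three columns, it does not reproduce how idr itself merges peaks, and the function names are made up for illustration.

```python
from collections import defaultdict


def read_intervals(lines):
    """Parse BED-like lines (chrom, start, end, ...) into per-chromosome lists."""
    by_chrom = defaultdict(list)
    for line in lines:
        # Skip blank lines and common header lines.
        if not line.strip() or line.startswith(("#", "track")):
            continue
        chrom, start, end = line.split("\t")[:3]
        by_chrom[chrom].append((int(start), int(end)))
    return by_chrom


def count_overlapping(a_lines, b_lines):
    """Count peaks in A that overlap at least one peak in B (half-open intervals)."""
    a, b = read_intervals(a_lines), read_intervals(b_lines)
    n = 0
    for chrom, peaks in a.items():
        others = b.get(chrom, [])
        for start, end in peaks:
            # Simple quadratic scan; fine for a quick sanity check.
            if any(start < o_end and o_start < end for o_start, o_end in others):
                n += 1
    return n
```

Usage would look like `count_overlapping(open("rep12.bed"), open("rep3.bed"))` (hypothetical file names). A count near or below the 20 mentioned in the error message would be consistent with the explanation above.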

pliu55 commented 8 months ago

According to Section '4d. Select final peak calls - conservative set' in ENCODE3 ChIP-seq pipeline specifications:

If you have more than 2 true replicates select the longest peak list from all pairs that passes the 5% IDR threshold. This is the conservative peak set.
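A minimal sketch of that selection rule, assuming each pairwise idr output stores the global IDR as a -log10 value in some tab-separated column. Which column that is depends on the input peak type, so it is left as the caller-supplied `idr_column` (0-based); the function names and pair labels are hypothetical.

```python
import math


def count_passing(lines, idr_column, threshold=0.05):
    """Count peaks whose -log10(global IDR) meets the threshold."""
    cutoff = -math.log10(threshold)  # about 1.3 for a 5% threshold
    n = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Lines too short to contain the column (e.g. blank lines) are skipped.
        if len(fields) > idr_column and float(fields[idr_column]) >= cutoff:
            n += 1
    return n


def longest_passing_pair(pair_outputs, idr_column, threshold=0.05):
    """pair_outputs maps a pair label to the lines of its idr output.

    Returns the label with the most passing peaks, plus all counts.
    """
    counts = {
        pair: count_passing(lines, idr_column, threshold)
        for pair, lines in pair_outputs.items()
    }
    best = max(counts, key=counts.get)
    return best, counts
```

For three replicates, `pair_outputs` would hold the outputs of the (1,2), (1,3) and (2,3) runs, and the thresholded peaks from the returned pair would be the conservative set.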