kundajelab / atac_dnase_pipelines

ATAC-seq and DNase-seq processing pipeline
BSD 3-Clause "New" or "Revised" License

differences between "new" and "old" pipeline #139

Open computbiolgeek opened 5 years ago

computbiolgeek commented 5 years ago

Hi @akundaje @leepc12, I noticed that the README was updated about an hour ago and now says that the WDL-based pipeline is an exact copy of this pipeline. However, I found that the WDL-based pipeline significantly reduced the number of IDR-thresholded peaks. For example, for the ATAC-seq experiment ENCSR668VCT, the WDL-based pipeline produced 103385 peaks, whereas this pipeline produced 146794 peaks. What should we be concerned about here? Thank you!

akundaje commented 5 years ago

Are you sure you are comparing the same files across the two runs? Can you provide some details, logs, etc. so we can look into it?

They should generate identical or near identical results.

-Anshul.
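
One way to confirm that the same kind of file is being compared across the two runs is to count the peaks in each output directly. The following is a minimal sketch, not part of either pipeline; the function names are hypothetical, and it assumes each peak file has one peak per line and may be gzip-compressed.

```python
import gzip


def count_peaks(path):
    """Count non-empty, non-comment lines in a peak file (plain or .gz).

    Assumes one peak per line, as in narrowPeak-style output.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(
            1 for line in handle
            if line.strip() and not line.startswith("#")
        )


def relative_difference(old_count, new_count):
    """Fractional change in peak count going from the old run to the new run."""
    return (new_count - old_count) / old_count


# Counts reported in this thread for ENCSR668VCT:
# 146794 peaks (this pipeline) vs. 103385 peaks (WDL-based pipeline).
change = relative_difference(146794, 103385)
print(f"Relative change in peak count: {change:.1%}")
```

With the counts quoted above, the WDL-based run has roughly 30% fewer peaks, which is larger than one would expect from "identical or near identical" results and suggests a parameter difference or a mismatch in the files being compared.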

leepc12 commented 5 years ago

@computbiolgeek Please post your input JSON for the new pipeline and command line for the old one.

computbiolgeek commented 5 years ago

Could this be because the two pipelines use different default IDR thresholds? The WDL-based one uses 0.05, whereas this one uses 0.1?

leepc12 commented 5 years ago

No, both use the same default idr threshold (0.1). Please upload your input JSON file. It seems like you have "atac.idr_thresh" : 0.05 in it.

Jin
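
A quick way to check for the override Jin describes is to load the input JSON and look for the `atac.idr_thresh` key. This is a minimal sketch under stated assumptions: the key name and the default of 0.1 come from this thread, while the helper `find_idr_override` and the example JSON string are hypothetical.

```python
import json

# Default IDR threshold as stated in this thread for both pipelines.
DEFAULT_IDR_THRESH = 0.1


def find_idr_override(input_json_text, key="atac.idr_thresh",
                      default=DEFAULT_IDR_THRESH):
    """Return the IDR threshold set in the input JSON if it differs
    from the default, otherwise None.

    `input_json_text` is the content of the pipeline's input JSON file.
    """
    params = json.loads(input_json_text)
    if key not in params:
        return None  # key absent, so the default applies
    value = float(params[key])
    return value if value != default else None


# Hypothetical input JSON content that overrides the threshold.
example = '{"atac.idr_thresh": 0.05}'
override = find_idr_override(example)
if override is not None:
    print(f"atac.idr_thresh is set to {override}, "
          f"not the default {DEFAULT_IDR_THRESH}")
```

To check a real run, read the input JSON file's text and pass it to the function. A stricter threshold such as 0.05 would retain fewer peaks than 0.1, which is consistent with the lower count reported for the WDL-based run.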
