Open stevehxf opened 6 years ago
In most recent datasets the read length from an Illumina sequencer is fairly long, often 150bp, so twice the tag length (300bp) can easily exceed the fragment size and this warning should not be a big issue for your dataset. But if you have many datasets to process, I recommend using a fixed 'd' value for all your samples to keep the analysis consistent, for example by setting '--nomodel --extsize 300'.
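As a rough illustration of the fixed-'d' approach, the sketch below prints one callpeak command per sample, all with the same --nomodel --extsize setting. The sample names, BAM file names, and the -g hs genome size are hypothetical placeholders, not values from this thread.

```shell
#!/usr/bin/env bash
# Sketch: build macs2 callpeak commands that share one fixed extension size.
# Sample names, file names, and genome size (hs) are hypothetical placeholders.
EXTSIZE=300

callpeak_cmd() {
    # $1 = sample name; prints the command instead of running it
    local sample="$1"
    echo "macs2 callpeak -t ${sample}_ChIP.bam -c ${sample}_input.bam -f BAM -g hs -n ${sample} --nomodel --extsize ${EXTSIZE}"
}

for s in sampleA sampleB; do
    callpeak_cmd "$s"
done
```

The commands are only printed, so they can be reviewed first; piping the output to bash would run them.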
Hi Tao,
Thanks for responding to Steve's question as I was having the same issue. I am wondering how much of an effect the --extsize argument will have on the results, since my predicted 'd' value is about 240 in my samples (I have 150bp PE reads)? Should I use that value or should I use something higher like 300?
Additionally, for a particular sample, the predicted 'd' from macs2 predictd is 219 while the predicted 'd' from macs2 callpeak on the same sample with its input is 234. In both cases I was using the default arguments (e.g. -m 5 50). Is this because the predictd function does not consider the control/input sample?
Thanks, Noah
@knowah First, about your additional question: predictd won't filter out redundant reads the way callpeak does. So, to better simulate what callpeak does, you have to run macs2 filterdup --keep-dup 1 on your ChIP sample first. Also, the fragment size prediction has nothing to do with the control/input sample.
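A minimal sketch of that two-step procedure, assuming a ChIP alignment file called ChIP.bam and the default human genome size (both are placeholders). The helper only prints the commands; -m 5 50 is the default mfold range mentioned in the question.

```shell
#!/usr/bin/env bash
# Sketch: remove duplicate reads first, then predict 'd' on the filtered file.
# ChIP.bam and -g hs are hypothetical placeholders.
predictd_cmds() {
    local chip="$1"
    local dedup="${chip%.*}.filterdup.bed"   # filterdup writes BED output
    printf 'macs2 filterdup -i %s --keep-dup 1 -o %s\n' "$chip" "$dedup"
    printf 'macs2 predictd -i %s -g hs -m 5 50\n' "$dedup"
}

predictd_cmds ChIP.bam
```

Piping the printed lines to bash would run both steps in order.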
The extsize is just a matter of 'smoothing' your ChIP-seq data. Ideally, you won't see any big difference in the peaks called between 230 and 240. Nowadays, you can think of this feature purely as a 'data quality control' method for evaluating your ChIP sample. If you see a much smaller number, such as 50bp, then you should worry that something went wrong with the library preparation. If you are satisfied with the data quality, it is always recommended to ignore the tiny differences from predictd and to use a fixed --nomodel --extsize N.
Hi Tao,
I used INPUT as my ctrl and used default parameters to call narrow peaks. I encountered a warning: Since the d(171) calculated from paired-peaks are smaller than 2*tag length, it may be influenced by unknown sequencing problem!
Any idea what happened and how to fix it?
Thanks!
Best, Steve
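The condition behind the warning can be checked by hand: it compares the estimated d with twice the tag (read) length. The sketch below uses d = 171 from the message; the tag lengths of 150 and 50 are assumptions for illustration, since the actual read length is not stated in the question.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the comparison described in the warning text.
# d_warning D TAG_LENGTH prints "warn" when D < 2 * TAG_LENGTH, else "ok".
d_warning() {
    local d="$1" tag_len="$2"
    if [ "$d" -lt $((2 * tag_len)) ]; then
        echo "warn"
    else
        echo "ok"
    fi
}

d_warning 171 150   # assumed 150bp reads: 171 < 300, prints "warn"
d_warning 171 50    # assumed 50bp reads: 171 >= 100, prints "ok"
```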