Closed kalavattam closed 4 months ago
Hi Kris,
--ng_from_fcs, --ng_to_fcs, and --ng_step_fcs should be decreased for the same reason as the Jaccard score. --ng_from_fcs chooses what length to use for the background. If the result is not changed between --ng_from_fcs=2000 and --ng_from_fcs=100k (default), the cPNF value at 2k and 100k distances would have little difference. But in your figure, the highest value of FCS (about 1e+02) looks slightly different.
[…] s in Figure 1(5) in the original manuscript) becomes small. Therefore, such rapid saturation is common in yeast.
Since I have not extensively tested SSP on yeast samples, the results may differ slightly from those described in the paper.
@rnakato: Thank you for your response and guidance.
I was wondering if you could provide any specific advice on appropriate parameter values for --ng_from_fcs, --ng_to_fcs, and --ng_step_fcs. Would you change the values I initially tested?
FCS_ng_from=2000 # S. cerevisiae 2 kb (a 50× reduction from H. sapiens 100000)
FCS_ng_to=50000 # S. cerevisiae 50 kb (a 20× reduction from H. sapiens 1000000)
FCS_ng_step=1000 # S. cerevisiae 1 kb (a 100× reduction from H. sapiens 100000)
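As a quick check on the arithmetic in the comments above, the fold reductions can be computed directly. This is only a sketch: the H. sapiens values are the ones quoted in this thread, and the count of background distances assumes the range is scanned from start to stop inclusive in fixed steps, which may not match how SSP iterates internally.

```python
# H. sapiens values as quoted in this thread; S. cerevisiae values as tested.
human = {"ng_from_fcs": 100_000, "ng_to_fcs": 1_000_000, "ng_step_fcs": 100_000}
yeast = {"ng_from_fcs": 2_000, "ng_to_fcs": 50_000, "ng_step_fcs": 1_000}


def fold_reduction(reference, reduced):
    """Return reference / reduced for every parameter name."""
    return {name: reference[name] / reduced[name] for name in reference}


def n_background_points(start, stop, step):
    """Count distances start, start + step, ... that are <= stop.

    Assumption: inclusive endpoints; SSP's own loop may differ.
    """
    return (stop - start) // step + 1


folds = fold_reduction(human, yeast)
for name, fold in folds.items():
    print(f"--{name}: {human[name]} -> {yeast[name]} ({fold:g}x reduction)")

print("background distances (reduced):",
      n_background_points(yeast["ng_from_fcs"], yeast["ng_to_fcs"], yeast["ng_step_fcs"]))
```

With these inputs the step works out to a 100-fold reduction, not 10-fold.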
I am keen to use your tools effectively in my work, and any information you could share to help avoid a potentially time-consuming parameter scan would be greatly appreciated.
Thank you again for your support.
Best regards,
Kris
Dear Kris,
Thank you for your extensive use of this tool. In the FCS, there are no strict criteria for choosing the background distance. As shown in your figure, the background level is quite similar at distances > 1k, so you can choose any distance. However, in cases where the samples contain broad mode peaks, a distance > 10kbp would be better, as used in the README. The important thing is to keep a consistent distance for all samples to be compared. Under this condition, SSP returns comparable values. For simplicity, it would be good to use the same distance as for --ng_from, --ng_to, and --ng_step.
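One way to follow this advice is to generate every sample's command line from a single set of background values, so the Jaccard and FCS backgrounds cannot drift apart between runs. The sketch below only builds and prints argument lists; it does not run SSP. The six --ng_* flags are the ones discussed in this thread; the -i, -o, and --gt options, the sample file names, and genometable.txt are assumptions for illustration and should be checked against `ssp --help`. The 10 kb to 50 kb range with 500 bp steps is the one the README tutorial is said to use.

```python
from pathlib import Path


def ssp_background_args(bg_from, bg_to, bg_step):
    """Build background options using one range for both Jaccard and FCS."""
    if not (0 < bg_from < bg_to) or bg_step <= 0:
        raise ValueError("expected 0 < bg_from < bg_to and bg_step > 0")
    values = [str(bg_from), str(bg_to), str(bg_step)]
    jaccard_flags = ["--ng_from", "--ng_to", "--ng_step"]
    fcs_flags = ["--ng_from_fcs", "--ng_to_fcs", "--ng_step_fcs"]
    args = []
    for flag, value in zip(jaccard_flags + fcs_flags, values + values):
        args += [flag, value]
    return args


# Hypothetical inputs; replace with real BAM files and genome table.
samples = ["sample_A.bam", "sample_B.bam"]
genome_table = "genometable.txt"

shared = ssp_background_args(10_000, 50_000, 500)
commands = [
    # -i / -o / --gt are assumed option names, not confirmed in this thread.
    ["ssp", "-i", bam, "-o", Path(bam).stem, "--gt", genome_table, *shared]
    for bam in samples
]

for cmd in commands:
    print(" ".join(cmd))
```

Because every command shares the same `shared` list, the resulting values stay comparable across samples in the sense described above.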
Thank you so much, @rnakato. This makes sense. I am going to close the issue; I may have another question or two related to this, but I will post them to a new GitHub Issue later. Thank you again.
Dear @rnakato,
Thank you again for developing and maintaining SSP. I have a question regarding the parameterization of SSP for FCS calculations with S. cerevisiae ChIP-seq data. In the SSP tutorial described in the README.md, it is advised to adjust the parameters for calculating the background Jaccard score $J(d_{bg})$ to use a range of 10 kb to 50 kb, averaging scores at steps of 500 bp, instead of the default 500 kb to 1 Mb range with 5 kb steps:
Should the start (--ng_from_fcs), stop (--ng_to_fcs), and step (--ng_step_fcs) values for FCS background calculations also be reduced for S. cerevisiae ChIP-seq data? (Is $\text{cPNF}(d_{bg}, s) = \frac{f(d_{bg}, s)}{N_{d_{bg}}}$, where $N_{d_{bg}} = |v_c^{fwd} \cap v_c^{rev}(d_{bg})|$?)
I tried running FCS calculations with default and reduced --ng_from_fcs, --ng_to_fcs, and --ng_step_fcs values:
Despite using different --ng_from_fcs, --ng_to_fcs, and --ng_step_fcs values, the related plots look largely the same (see image_FCS-values_default-vs-reduced.png). The top plots use default FCS background calculation values, and the bottom plots use reduced values. Could you provide any advice on why this might be happening and how to proceed?
Also, the plots for "Proportion of nearest neighbor fragments" and "Cumulative proportion" reach respective minimal and maximal plateaus quickly, and the curves do not change with len values. Is this expected? This behavior is different than what is shown in the supplementary material for Nakato, Shirahige, Sensitive and robust assessment of ChIP-seq read distribution using a strand-shift profile, Bioinformatics 2018-0307.
Thank you again for your help and guidance.
Best regards,
Kris