Closed fatyang799 closed 1 year ago
This is just two set of the NB notation. The intuitive interpretation in sequencing read is not very clear (at least to me).
For your question about the formula: For NB model, the M (mean) and σ2 (variance) is defined as (wiki page of NB model is really good) M = ps / (1-p) σ2 = ps / (1-p)^2 So, you can easily get (1-p) = M/σ2 -> p = (σ2-M)/σ2 s = M * (1-p) / p -> s = M2/(σ2-M)
Because greater s can give you less significant p-value, we use it adjust the -log10pvalue by local input signal: s_local = s * ri_input / mean_r_input
Best wishes. Guanjue
Thanks for the reply!
I have known that the p and s are just parameters for NB model. And these are just mathematic formula to calculate parameters:
M = ps / (1-p) σ2 = ps / (1-p)^2
In addition, I have thought about this problem for 2 days. I'd like to share my thoughts with you, and I hope you can provide some comment about my idea. many thx.
In my opinion, parameters in r_i ∼ NB (s, p)
are:
p
as the probability of a read not falling into a bin. And the p is same for all bins.s
as the number of reads not falling into a bin.But here, I still don't understand why the p0=ps [In your paper, this corresponds to p0=(1-p)s]
Btw, I notice that the definition of p in the paper is different from the p in the script (negative_binomial_neglog10p.R
). And I adopt your idea in the script where the definition of p is the probability of fail.
Best,
Liu
Hi Guanjue,
I found another problem. In the paper, the p-value seems to be calculated from the NB background model. I thought you would calculate like this:
f(X=N)=pnbinom(N-1, sig_0_size, sig_0_prob, lower.tail=FALSE)
But in the script file (negative_binomial_neglog10p.R
), the p-value derives from the ratio of 2 probability:
pval_new = pnbinom(N-1, sig_0_size, sig_0_prob, lower.tail=FALSE) / pnbinom(0, sig_0_size, sig_0_prob, lower.tail=FALSE) * (l-num_0)/l
According to my understanding:
pnbinom(N-1, sig_0_size, sig_0_prob, lower.tail=FALSE) / pnbinom(0, sig_0_size, sig_0_prob, lower.tail=FALSE)
: this represents the ratio of the probability that the count equals N to the probability that the count equals 1.(l-num_0)/l
: It denotes the proportion of non-zero bins.Are you calculating the pval_new
to make all P value comparable? Because the P value in each bin was corrected with various ri_input / mean_r_input
.
All of the above are my guesses. And I really want to know why you choose pval_new
instead of f(X=N)
as output value.
Best,
Liu
For "pval_new = pnbinom(N-1, sig_0_size, sig_0_prob, lower.tail=FALSE) / pnbinom(0, sig_0_size, sig_0_prob, lower.tail=FALSE) * (l-num_0)/l ":
Here, since our NB model parameter is estimated only by using the non0-bins, we adjusted model output pvalue by ( observed_non0_bin_pvalue / expected_non0_bin_pvalue ). (It can be considered as a prior probability for all non-0 bins) The "expected_non0_bin_pvalue" = pnbinom(0, sig_0_size, sig_0_prob, lower.tail=FALSE) The "observed_non0_bin_pvalue" = (l-num_0)/l; l is the number of all bins, num_0 is the number of 0 bins.
Really great talking with you and appreciate all your help! This makes me have a deeper understanding of s3norm. If I need anything else, I'll start a new issue.
Hi Guanjue,
I got confused about the output of s3norm. I would greatly appreciate if you can answer these questions!
Liu
Our goal of transforming normalized_rc to -log10pvalue is to adjust the local background signal along the genome in chip-seq data. However, after this transformation, the signal to noise ratio is not the same across different samples, we decided to treat -log10-p as signals and normalized again.
Best wishes. Guanjue
Excellent idea of S3norm!
But when I read the methods and code of s3norm, I do not understand the biological meaning of the parameters (p and s) of the NB model.
The p denotes the probability of success. In the sequencing data, what is success? Why do you assign (σ2-M)/σ2 to p? How to understand (σ2-M)/σ2 in sequencing data?
The slocal denotes a shape parameter. In NB model, the s should be the number of successful events. Again, in the sequencing data, what is a successful event? Why do you assign M2/(σ2-M) to p? How to understand M2/(σ2-M) in sequencing data?