tinyheero closed this issue.
Hi @tinyheero. Since version 1.10, the standard deviation used in this equation is also optimized during the simulated annealing, but it is still a single value shared by all segments. I'm open to suggestions on how to improve this. As segment size increases, the per-segment standard deviations converge to very similar values, so I think a single value is probably fine. There is also some outlier filtering, so small segments whose log-ratio differs from the global one for technical reasons should still not have a dramatic impact.
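The convergence argument can be illustrated with a small simulation. This is a minimal sketch, not PureCN code: it assumes Gaussian probe-level log-ratio noise with one true standard deviation shared by every segment, and that a segment-specific standard deviation would be estimated from the log-ratios of the probes in that segment. The function `sd_estimate_spread` and its parameters are hypothetical, and the sketch does not model PureCN's outlier filtering.

```python
import random
import statistics


def sd_estimate_spread(segment_size, n_segments=200, true_sd=0.3, seed=1):
    """Return the (min, max) of per-segment sample SDs.

    Hypothetical helper: every simulated segment has the SAME true noise
    level (true_sd), so any spread in the estimates is pure sampling noise.
    """
    rng = random.Random(seed)
    sds = []
    for _ in range(n_segments):
        # Simulated probe-level log2 ratios for one segment, centered at 0.
        values = [rng.gauss(0.0, true_sd) for _ in range(segment_size)]
        sds.append(statistics.stdev(values))
    return min(sds), max(sds)


for size in (5, 50, 500):
    lo, hi = sd_estimate_spread(size)
    print(f"segment size {size:>3}: per-segment SD ranges from {lo:.3f} to {hi:.3f}")
```

Small segments produce widely scattered SD estimates, while large segments cluster tightly around the shared value. Under these assumptions, a segment-specific SD mostly adds estimation noise for small segments, and a pooled value loses little for large ones.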
But that part of the code has been unchanged for many years, so I can't remember every decision or benchmark that led to it.
Thanks @lima1 for your reply.
I agree that it probably doesn't make much difference.
Hi @lima1,
This is more of a question than an issue.
I was re-reading the original PureCN paper (https://scfbm.biomedcentral.com/articles/10.1186/s13029-016-0060-z) and reviewing this equation:
$$ r_i \sim N\left(\log_2 \frac{p C_i + (1-p) \cdot 2}{p \left(\sum_j l_j C_j\right) / \sum_j l_j + (1-p) \cdot 2},\ \sigma_{r_i}\right) $$
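The equation can be sketched in code to make each term concrete. This is a minimal illustration of the formula as written, not PureCN's implementation. The function names are hypothetical; `purity` is $p$, `C_i` is the copy number of segment $i$, and `C`/`lengths` are the copy numbers $C_j$ and lengths $l_j$ of all segments. A single `sigma` is passed for every segment, mirroring the shared standard deviation discussed in this thread.

```python
import math


def expected_log_ratio(C_i, purity, C, lengths):
    """Mean of the normal distribution for segment i (log2 scale).

    The denominator uses the length-weighted average tumor ploidy,
    sum_j l_j C_j / sum_j l_j. Requires purity * C_i + (1 - purity) * 2 > 0,
    i.e. not (C_i == 0 and purity == 1).
    """
    avg_ploidy = sum(l * c for l, c in zip(lengths, C)) / sum(lengths)
    numerator = purity * C_i + (1 - purity) * 2
    denominator = purity * avg_ploidy + (1 - purity) * 2
    return math.log2(numerator / denominator)


def normal_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def segment_log_likelihood(r_i, C_i, purity, C, lengths, sigma):
    """Log-likelihood of an observed segment log-ratio r_i (hypothetical helper)."""
    return normal_logpdf(r_i, expected_log_ratio(C_i, purity, C, lengths), sigma)


# Example: 50% purity, two segments with copy numbers 2 and 4 and lengths 3 and 1.
# Average ploidy is (3*2 + 1*4) / 4 = 2.5, so the 4-copy segment is expected at
# log2((0.5*4 + 0.5*2) / (0.5*2.5 + 0.5*2)) = log2(3 / 2.25).
print(expected_log_ratio(4, 0.5, [2, 4], [3, 1]))
```

For an entirely diploid genome the expected log-ratio of every segment is 0, regardless of purity, because the numerator and denominator are equal.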
The standard deviation ($\sigma_{ri}$) of the normal distribution is set to be:
I am curious about the rationale for using a single standard deviation (learned across all segments) rather than a segment-specific one.