robinweide / GENOVA

GENome Organisation Visual Analytics
GNU General Public License v3.0

PE-SCAn shifted background standard deviation #357

Closed Lucas446 closed 3 months ago

Lucas446 commented 4 months ago

Hi,

I would like to know how many circular permutations are performed when estimating the background around each contact of interest in PE-SCAn.

Also, would it be possible to retrieve the data from those permutations (not just their average)? I would like to get an idea of the standard deviation of the background.

Thanks a lot! :)

Best, Tanguy

robinweide commented 4 months ago

Dear Tanguy,

The documentation states:

An integer of length 1 indicating how many basepairs the anchors should be shifted. Essentially performs circular permutation of size shift for a reasonable estimate of background. The argument is ignored when shift <= 0.

In other words, the data is shifted once by a fixed number of basepairs to get the background distribution (default: shift = 1e6). The shifted slot is thus the average of all 1 Mb-shifted matrices, and the shifted_raw slot contains the per-region matrices. This is akin to a circular permutation, but computationally much lighter.

We have played with the idea of doing multiple shifts, but since PE-SCAn averages over many regions, the results regressed to the mean and we saw no large differences. If your region set is small and/or you have very high-resolution data, it might be different. In that case, I would experiment with running PE-SCAn multiple times with different shifts and averaging the matrices in the shifted slot of the output objects.
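The averaging step suggested above can be sketched in a language-agnostic way. The example below is only an illustration, not GENOVA code: it assumes the matrices from the shifted slot of several PE-SCAn runs have already been exported as equally sized nested lists, and the helper name `summarise_shifted`, the shift sizes, and the 2x2 values are all hypothetical. It computes the element-wise mean and the sample standard deviation across runs, which is one way to get a feel for the spread of the background.

```python
from statistics import mean, stdev


def summarise_shifted(matrices):
    """Element-wise mean and sample standard deviation across matrices.

    `matrices` is a list of equally sized 2D nested lists, one per shift.
    (Hypothetical helper; not part of GENOVA.)
    """
    if len(matrices) < 2:
        raise ValueError("need at least two matrices to estimate a standard deviation")
    n_rows, n_cols = len(matrices[0]), len(matrices[0][0])
    for m in matrices:
        if len(m) != n_rows or any(len(row) != n_cols for row in m):
            raise ValueError("all matrices must have the same dimensions")

    avg = [[0.0] * n_cols for _ in range(n_rows)]
    sd = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            # Collect the same cell from every shifted matrix.
            values = [m[i][j] for m in matrices]
            avg[i][j] = mean(values)
            sd[i][j] = stdev(values)
    return avg, sd


# Made-up 2x2 background matrices, keyed by an assumed shift size in basepairs.
shifted_by_shift = {
    500_000: [[1.0, 2.0], [3.0, 4.0]],
    1_000_000: [[2.0, 2.0], [3.0, 6.0]],
    2_000_000: [[3.0, 2.0], [3.0, 8.0]],
}

background_mean, background_sd = summarise_shifted(list(shifted_by_shift.values()))
print(background_mean)
print(background_sd)
```

The same summary could in principle be applied to the per-region matrices in the shifted_raw slot of a single run, which would give the spread across regions rather than across shifts.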

Best,

Robin

Lucas446 commented 3 months ago

Thank you so much!