tborrman / liquid-chromatin-Hi-C

Quantifying 3D chromatin conformation stability
GNU General Public License v3.0

cis range signal summation is not symmetric by 1 pixel #3

Closed sergpolly closed 4 years ago

sergpolly commented 4 years ago

hey @tborrman - this ain't no "symmetric" sum of cis signal within the range ... https://github.com/tborrman/liquid-chromatin-Hi-C/blob/9199d9d43d98059037435b42cc9c2ebcfb676291/src/liquid_chromatin_HiC/matrix_functions.py#L291

in numpy (and python lists as well), when slicing an array, the last element is not included, i.e. a[2:4] would give you a[2] and a[3]. So when you slice your hi-c heatmap like so: obs[i-dist:i+dist], the last included element is going to be i+dist-1, which makes the downstream summation 1 pixel "smaller" than the upstream one: upstream + diag + downstream = obs[i-dist:i] + obs[i] + obs[i+1:i+dist]; i.e. upstream has dist pixels and downstream has dist-1, so size of upstream = size of downstream + 1 ...

another illustration of the same thing, assuming i=5 and dist=5: [image: MVIMG_20200903_220709]
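For readers without the image, the same i=5, dist=5 case can be sketched in a few lines of numpy. This is only an illustration of the slicing behaviour described above, not the code in matrix_functions.py; `obs` here is a made-up 1D row whose values are just the bin indices, so the selected bins can be read off directly.

```python
import numpy as np

# Hypothetical row of a Hi-C matrix: the value of each pixel is its bin index.
obs = np.arange(12)
i, dist = 5, 5

# Slice as described in this issue: the stop index i+dist is excluded.
window = obs[i - dist:i + dist]
upstream = obs[i - dist:i]         # bins 0..4
downstream = obs[i + 1:i + dist]   # bins 6..9

print(window)                          # [0 1 2 3 4 5 6 7 8 9]
print(len(upstream), len(downstream))  # 5 4

# A symmetric alternative: extend the stop index by one pixel.
symmetric = obs[i - dist:i + dist + 1]
print(len(obs[i + 1:i + dist + 1]))    # 5
```

The `+ 1` on the stop index is the only change needed to make the two sides equal in size.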

I don't think it is a big deal for the paper, but it's probably worth changing it in the scripts ... - let me know what you think

tborrman commented 4 years ago

my dearest Sergey, did you truly believe I didn't know what list slices return? You must think so little of me, tsk tsk. The idea behind not having symmetric bins was to use a 6Mb window in our 40kb data. There are 150 40kb bins in a 6Mb window. You are correct, we could have calculated cis percent for bin i using the 3Mb (75 bins) upstream of i and the 3Mb (75 bins) downstream of i, which would have spanned a window of 6.04Mb (151 bins, including bin i). However, I thought it was cleaner aesthetically to use a window that spanned precisely 6Mb of interactions. In our 40kb data this led to the asymmetry you refer to above, with one fewer bin downstream of bin i than upstream (75 upstream + bin i + 74 downstream = 150 bins). I agree that for future applications where the range need not be 6Mb it makes more sense to have the symmetry you refer to. So I would definitely go with changing this in updated cooler-compatible scripts ;)
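The bin arithmetic in this comment can be checked with a small sketch. The helper `window_bounds` and the example bin `i = 100` are hypothetical names chosen for illustration; they are not taken from the repository.

```python
BIN_SIZE = 40_000      # 40kb bins
WINDOW = 6_000_000     # 6Mb target window

bins_in_window = WINDOW // BIN_SIZE   # 150
half = bins_in_window // 2            # 75


def window_bounds(i, half, symmetric):
    """Return (start, stop) slice bounds around bin i; stop is exclusive."""
    stop = i + half + 1 if symmetric else i + half
    return i - half, stop


i = 100  # arbitrary bin far enough from the chromosome ends
for symmetric in (False, True):
    start, stop = window_bounds(i, half, symmetric)
    n_bins = stop - start
    print(symmetric, n_bins, n_bins * BIN_SIZE)
# False 150 6000000
# True 151 6040000
```

The asymmetric slice covers exactly 6Mb, while the symmetric one covers 6.04Mb because bin i itself is counted on top of the two 3Mb flanks. The sketch ignores bins near chromosome ends, where `i - half` would go negative and would need clipping.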

sergpolly commented 4 years ago

@tborrman this is a public thing ... it's more for others to read, and for us to remember the thoughts that went into the metrics that were used. I didn't even remember we'd already discussed it a year ago - so that's why we are documenting it now. Understood about the 6Mb and the bins that went into it - the symmetric case is probably more useful for the future, as the pixels are not equal because of distance decay.