Closed (sergpolly closed 4 years ago)
My dearest Sergey, did you truly believe I didn't know what list slices return? You must think so little of me, tsk tsk. The idea behind not having symmetric bins was to use a 6 Mb window in our 40 kb data. There are 150 40 kb bins in a 6 Mb window. You are correct, we could have calculated cis percent for bin i using the 3 Mb (75 bins) upstream of i and the 3 Mb (75 bins) downstream of i, which would have spanned a window of 6.04 Mb (including bin i). However, I thought it was aesthetically cleaner to use a window that spanned precisely 6 Mb of interactions. In our 40 kb data this led to the asymmetry you refer to above, with one fewer bin downstream of bin i than upstream. I agree that for future applications where the range need not be 6 Mb it makes more sense to have the symmetry you refer to. So I would definitely go with changing this in updated cooler-compatible scripts ;)
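The bin arithmetic in this comment can be checked with a small sketch. The helper `window_span` and the constant names are hypothetical, made up for illustration; they are not taken from the repository's scripts.

```python
BIN_SIZE = 40_000         # 40 kb bins, in bp
HALF_WINDOW = 3_000_000   # 3 Mb on each side of bin i, in bp

dist = HALF_WINDOW // BIN_SIZE  # 75 bins


def window_span(dist, symmetric):
    """Return (n_upstream, n_downstream, total_bp) for a window around bin i.

    symmetric=False mimics obs[i-dist:i+dist], which drops the last downstream bin.
    symmetric=True mimics obs[i-dist:i+dist+1].
    """
    n_up = dist
    n_down = dist if symmetric else dist - 1
    total_bins = n_up + 1 + n_down  # the +1 is bin i itself
    return n_up, n_down, total_bins * BIN_SIZE


print(window_span(dist, symmetric=False))  # (75, 74, 6000000) -> exactly 6 Mb
print(window_span(dist, symmetric=True))   # (75, 75, 6040000) -> 6.04 Mb
```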
@tborrman this is a public thing ... it's more for others to read, and for us to remember the thoughts that went into the metrics that were used. I didn't even remember that we had already discussed it a year ago, so that's why we are documenting it now. Understood about the 6 Mb and the bins that went into it. The symmetric case is probably more useful for the future, as the pixels are not equal because of distance decay.
hey @tborrman - this ain't no "symmetric" sum of cis signal within the range ... https://github.com/tborrman/liquid-chromatin-Hi-C/blob/9199d9d43d98059037435b42cc9c2ebcfb676291/src/liquid_chromatin_HiC/matrix_functions.py#L291

in numpy (and python lists as well), when slicing an array the last element is not included, i.e. `a[2:4]` would give you `a[2]` and `a[3]`. Thus, when you slice your Hi-C heatmap like so: `obs[i-dist:i+dist]`, the last included element is going to be `i+dist-1`, which makes the downstream summation 1 pixel "smaller" than upstream: `upstream + diag + downstream = obs[i-dist:i] + obs[i] + obs[i+1:i+dist]`; i.e. size of upstream = size of downstream + 1 ...

another illustration of the same thing, assuming `i=5` and `dist=5`:

I don't think it is a big deal for the paper, but it's probably worth changing it in the scripts ... - let me know what you think
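A minimal sketch of the slicing behaviour described in this comment, using `i=5` and `dist=5`. The `obs` list here is a hypothetical stand-in for one row of the Hi-C matrix, with each value equal to its own bin index so the selected bins are easy to read off. It is not the code from `matrix_functions.py`.

```python
# Hypothetical row of a Hi-C matrix: each value is its own bin index
obs = list(range(12))
i, dist = 5, 5

window = obs[i - dist:i + dist]    # stop index is excluded: bins 0..9
upstream = obs[i - dist:i]         # bins 0..4
diag = obs[i]                      # bin 5
downstream = obs[i + 1:i + dist]   # bins 6..9

print(len(upstream), len(downstream))  # 5 4

# One possible symmetric variant: extend the stop index by one bin
symmetric = obs[i - dist:i + dist + 1]  # bins 0..10
print(len(symmetric))                   # 11
```

The same stop-exclusive rule applies to numpy arrays, so replacing the list with `numpy.arange(12)` gives identical slices.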