open2c / cooltools

The tools for your .cool's
MIT License

option to include track values outside of range/qrange for saddles #45

Closed: sergpolly closed this issue 5 years ago

sergpolly commented 6 years ago

It should be quick and easy. The only question is: does it sound useful and generic enough for other people, or is it too specific (a "one-off" type of thing) to add to the mainline? @nvictus @golobor

sergpolly commented 6 years ago

After some digging it turned out, of course, that https://github.com/mirnylab/cooltools/blob/74bcfe4f34f7948649e7678f33d31812bca920d7/cooltools/saddle.py#L88 is not the place to change in order to undo the trim_outliers behavior. There is still a small issue in that line, though:

 x = x[(x > 0) & (x < len(binedges) + 1)] 

The (x < len(binedges) + 1) condition seems redundant: after digitization the largest possible value is len(binedges), not len(binedges) + 1. For example, 2 bin edges | | split the axis into 3 regions, digitized as 0 | 1 | 2.
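A quick check with plain NumPy illustrates the point. This assumes the digitization is done with np.digitize and its default right=False; the arrays below are made-up toy values, not saddle data.

```python
import numpy as np

# Two bin edges -> three regions: below, between, above.
binedges = np.array([0.0, 1.0])
values = np.array([-0.5, 0.5, 1.5])

x = np.digitize(values, binedges)
print(x)  # [0 1 2]

# Every digitized index lies in [0, len(binedges)], so the upper-bound
# check x < len(binedges) + 1 can never remove anything.
assert x.max() <= len(binedges)
assert np.all(x < len(binedges) + 1)
```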

So, if the goal really is to "trim outliers", the line should be:

 x = x[(x > 0) & (x < len(binedges))] 
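A minimal sketch of what the corrected filter keeps, again assuming np.digitize with right=False and toy values. Note that a value exactly equal to the last edge is digitized as len(binedges) and gets trimmed, because the bins are half-open [a, b):

```python
import numpy as np

binedges = np.array([0.0, 1.0, 2.0])
values = np.array([-1.0, 0.0, 0.5, 1.5, 2.0, 3.0])

x = np.digitize(values, binedges)
print(x)  # [0 1 1 2 3 3]

# Keep only indices of the inner bins; 0 is "below" and len(binedges) is
# "at or above the last edge", so 2.0 is dropped along with 3.0.
trimmed = x[(x > 0) & (x < len(binedges))]
print(trimmed)  # [1 1 2]
```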

sergpolly commented 6 years ago

The number of elements in the hist/count returned by bincount (https://github.com/mirnylab/cooltools/blob/74bcfe4f34f7948649e7678f33d31812bca920d7/cooltools/saddle.py#L89) should indeed be len(binedges) + 1. That is really needed, if only because later on we trim the saddledata matrix and hist/count assuming they are (len(binedges) + 1) by (len(binedges) + 1) and len(binedges) + 1 long, respectively: https://github.com/mirnylab/cooltools/blob/74bcfe4f34f7948649e7678f33d31812bca920d7/cooltools/saddle.py#L342

However, that trimming is done without checking that hist/count actually has len(binedges) + 1 elements.
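One way to make the size assumption explicit is sketched below. The helper count_digitized is hypothetical (not part of cooltools); it uses np.bincount with minlength=len(binedges) + 1 and raises if the result has a different length, which would happen if some index exceeded len(binedges):

```python
import numpy as np

def count_digitized(x, binedges):
    """Hypothetical helper: count digitized indices, one slot per region.

    Slot 0 holds values below binedges[0], slot len(binedges) holds values
    at or above binedges[-1]; the inner slots are the real bins.
    """
    n = len(binedges) + 1
    counts = np.bincount(x, minlength=n)
    # Explicit check for the size that the downstream trimming relies on.
    if counts.shape != (n,):
        raise ValueError(f"expected {n} counts, got {counts.shape[0]}")
    return counts

binedges = np.array([0.0, 1.0, 2.0])
values = np.array([-1.0, 0.5, 1.5, 1.7, 5.0])
x = np.digitize(values, binedges)

counts = count_digitized(x, binedges)
print(counts)        # [1 1 2 1]
print(counts[1:-1])  # inner bins only: [1 2]
```

With the check in place, trimming the first and last slots (counts[1:-1], or saddledata[1:-1, 1:-1] for the 2D case) is guaranteed to drop exactly the two outlier regions.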

sergpolly commented 5 years ago

Looks like a dead end, as it is hard to deal with "half-open" bins. @Hbelaghzal found other solutions: