pyranges / ncls

The Nested Containment List for Python. Basically a static interval-tree that is silly fast for both construction and lookups.
BSD 3-Clause "New" or "Revised" License
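For readers new to the data structure named in the tagline, here is a toy pure-Python sketch of the nested-containment-list idea. It is not the ncls package's implementation or API; the class `ToyNCList` and its `overlaps` method are hypothetical names for illustration. It assumes half-open integer intervals `[start, end)`, each tagged with an id.

```python
import math
from bisect import bisect_right
from typing import List, Tuple

Interval = Tuple[int, int, int]  # (start, end, id), half-open [start, end)


class _Node:
    def __init__(self, start, end, ident):
        self.start, self.end, self.ident = start, end, ident
        self.children: List["_Node"] = []  # intervals nested directly inside this one
        self.child_ends: List[int] = []    # their ends, ascending, for bisect


class ToyNCList:
    """Toy nested containment list (illustration only, not the ncls package)."""

    def __init__(self, intervals: List[Interval]):
        self._root = _Node(-math.inf, math.inf, -1)  # sentinel that contains everything
        stack = [self._root]
        # Start ascending, end descending: a container always precedes its contents.
        for start, end, ident in sorted(intervals, key=lambda iv: (iv[0], -iv[1])):
            while stack[-1].end < end:  # top of stack cannot contain this interval
                stack.pop()
            node = _Node(start, end, ident)
            stack[-1].children.append(node)
            stack[-1].child_ends.append(end)
            stack.append(node)

    def overlaps(self, qstart: int, qend: int) -> List[Interval]:
        """All stored intervals overlapping [qstart, qend), sorted."""
        hits, todo = [], [self._root]
        while todo:
            parent = todo.pop()
            # Siblings never nest, so ends are sorted: skip those ending at or before qstart.
            i = bisect_right(parent.child_ends, qstart)
            while i < len(parent.children) and parent.children[i].start < qend:
                child = parent.children[i]
                hits.append((child.start, child.end, child.ident))
                todo.append(child)  # only overlapping intervals can hold overlapping children
                i += 1
        return sorted(hits)


nc = ToyNCList([(1, 10, 0), (5, 15, 1), (2, 4, 2), (20, 30, 3)])
print(nc.overlaps(3, 6))  # [(1, 10, 0), (2, 4, 2), (5, 15, 1)]
```

Sorting by start ascending and end descending puts every container ahead of what it contains, so one stack pass builds the nesting. Within any single list no interval contains another, which keeps both starts and ends sorted and lets `bisect_right` jump past the intervals that end before the query begins.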

Can I use ncls to calculate the intersection/union number between ranges? #2

Open Runsheng opened 6 years ago

Runsheng commented 6 years ago

Is there any method in ncls that returns the intersection and union of two ranges? For instance, range(1, 10) and range(5, 15) would give (5, 10) and (1, 15).
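Once two ranges are known to overlap, their intersection and union come down to comparing endpoints. A minimal plain-Python sketch, assuming half-open `(start, end)` tuples like Python's `range`; `intersect` and `union` are hypothetical helper names, not ncls functions:

```python
from typing import List, Optional, Tuple

Range = Tuple[int, int]  # half-open [start, end), as with Python's range()


def intersect(a: Range, b: Range) -> Optional[Range]:
    """Overlapping part of two ranges, or None if they do not overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


def union(a: Range, b: Range) -> List[Range]:
    """Union of two ranges: one range if they overlap or touch, else both."""
    first, second = sorted([a, b])
    if second[0] <= first[1]:
        return [(first[0], max(first[1], second[1]))]
    return [first, second]


print(intersect((1, 10), (5, 15)))  # (5, 10)
print(union((1, 10), (5, 15)))      # [(1, 15)]
```

The union of two disjoint ranges is not a single range, which is why `union` returns a list.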

Or is there one that simply returns the lengths of the intersection and union, like bedtools jaccard does? [https://bedtools.readthedocs.io/en/latest/content/tools/jaccard.html]
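The length-based version is the Jaccard index on base pairs: bases covered by both sets divided by bases covered by either. A sketch assuming both inputs are lists of half-open `(start, end)` ranges on one chromosome; the function names are hypothetical, and bedtools jaccard's exact output columns may be defined somewhat differently from this textbook formula:

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open [start, end) on a single chromosome


def merge(ranges: List[Range]) -> List[Range]:
    """Sort and merge overlapping or touching ranges into disjoint ones."""
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def jaccard(a: List[Range], b: List[Range]) -> float:
    """Base-pair Jaccard index |A and B| / |A or B| of two range sets."""
    ma, mb = merge(a), merge(b)
    inter, i, j = 0, 0, 0
    # Two-pointer sweep over the disjoint, sorted ranges of each set.
    while i < len(ma) and j < len(mb):
        lo, hi = max(ma[i][0], mb[j][0]), min(ma[i][1], mb[j][1])
        inter += max(0, hi - lo)
        # Advance whichever range ends first.
        if ma[i][1] < mb[j][1]:
            i += 1
        else:
            j += 1
    covered = sum(e - s for s, e in ma) + sum(e - s for s, e in mb)
    union = covered - inter
    return inter / union if union else 0.0


print(jaccard([(1, 10)], [(5, 15)]))  # 5 / 14, about 0.357
```

Merging first matters: without it, overlapping ranges inside one set would be counted twice in the union.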

endrebak commented 6 years ago

I’ll reply in more depth on Monday, when I’m back at work :) pyranges should be able to do this. The repo is on my GitHub :)

Runsheng commented 6 years ago

Thank you very much! I am currently using pybedtools to calculate the intersections between ranges. However, computing the intersection matrix for 300 mRNA tracks (each containing around 15 ranges) takes 400 seconds on a 32-core server. I will try pyranges first and give you some feedback.
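For an all-vs-all matrix like this one, 300 tracks give 300 × 299 / 2 = 44,850 unordered pairs, each involving only around 30 intervals, so the per-pair work is tiny. A pure-Python sketch, assuming each track is a list of `(chrom, start, end)` tuples keyed by a track name; it is not pyranges or pybedtools code, and all names below are hypothetical. It merges every track once up front and computes each unordered pair only once:

```python
from itertools import combinations
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open


def merge_by_chrom(intervals: List[Interval]) -> Dict[str, List[Tuple[int, int]]]:
    """Group intervals by chromosome and merge overlapping/touching ones."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in sorted(intervals):
        runs = by_chrom.setdefault(chrom, [])
        if runs and start <= runs[-1][1]:
            runs[-1] = (runs[-1][0], max(runs[-1][1], end))
        else:
            runs.append((start, end))
    return by_chrom


def overlap_bp(a: List[Tuple[int, int]], b: List[Tuple[int, int]]) -> int:
    """Bases shared by two sorted, disjoint range lists (two-pointer sweep)."""
    total, i, j = 0, 0, 0
    while i < len(a) and j < len(b):
        total += max(0, min(a[i][1], b[j][1]) - max(a[i][0], b[j][0]))
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def jaccard_matrix(tracks: Dict[str, List[Interval]]) -> Dict[Tuple[str, str], float]:
    """All-vs-all base-pair Jaccard index; each unordered pair computed once."""
    merged = {name: merge_by_chrom(ivs) for name, ivs in tracks.items()}
    size = {name: sum(e - s for runs in m.values() for s, e in runs)
            for name, m in merged.items()}
    result: Dict[Tuple[str, str], float] = {}
    for x, y in combinations(merged, 2):
        inter = sum(overlap_bp(runs, merged[y].get(chrom, []))
                    for chrom, runs in merged[x].items())
        union = size[x] + size[y] - inter
        result[(x, y)] = result[(y, x)] = inter / union if union else 0.0
    for name in merged:
        result[(name, name)] = 1.0 if size[name] else 0.0
    return result


tracks = {
    "tx1": [("chr1", 1, 10), ("chr2", 0, 5)],
    "tx2": [("chr1", 5, 15)],
    "tx3": [("chr2", 3, 8)],
}
m = jaccard_matrix(tracks)
print(m[("tx1", "tx2")])  # 5 / 19
```

Everything stays in memory and no external process is started, whereas pybedtools drives the bedtools command-line programs. Spreading the `combinations(...)` loop over worker processes would be a natural next step on a many-core machine, but is not shown here.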

endrebak commented 6 years ago

pyranges is still largely unused. It has passing unit tests, but it might still have bugs or not work as expected.

I would also look into this potential error in bedtools jaccard (https://github.com/arq5x/bedtools2/issues/645). Whether it is a bug, and whether it matters, I don't know :)

endrebak commented 6 years ago

Also, if you use pybedtools, it is advisable to presort the data first; it is much faster that way.
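Presorting helps because, with both inputs ordered by chromosome and start, overlaps can be found in one forward sweep instead of a separate search per interval; bedtools intersect offers a `-sorted` mode that expects input sorted this way (`sort -k1,1 -k2,2n`). The sketch below illustrates the general sweep idea in plain Python. It is not bedtools' implementation, and `sorted_overlaps` is a hypothetical name. It assumes half-open `(chrom, start, end)` tuples that are already sorted; plain `sorted()` on such tuples is enough.

```python
import heapq
from typing import Iterator, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open


def sorted_overlaps(a: List[Interval], b: List[Interval]) -> Iterator[Tuple[Interval, Interval]]:
    """Yield every overlapping (a_interval, b_interval) pair.

    Both inputs must already be sorted by (chrom, start); the sweep never
    looks back, so unsorted input may give wrong answers.
    """
    # Interleave the two sorted inputs into one stream ordered by (chrom, start).
    events = heapq.merge(((iv, 0) for iv in a), ((iv, 1) for iv in b),
                         key=lambda ev: ev[0][:2])
    active: Tuple[List[Interval], List[Interval]] = ([], [])
    for iv, side in events:
        chrom, start, _ = iv
        other = active[1 - side]
        # Forget intervals from an earlier chromosome or that end at/before this start.
        other[:] = [o for o in other if o[0] == chrom and o[2] > start]
        for o in other:
            yield (iv, o) if side == 0 else (o, iv)
        active[side].append(iv)


a = sorted([("chr1", 5, 15), ("chr1", 1, 10)])
b = sorted([("chr1", 8, 12), ("chr2", 0, 5)])
for pair in sorted_overlaps(a, b):
    print(pair)
```

Each interval enters the active lists once and is dropped once the sweep moves past its end, so apart from the reported pairs the work grows roughly linearly with input size. With pybedtools, the corresponding step would be calling `.sort()` on each `BedTool` before intersecting.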