pyranges / ncls

The Nested Containment List for Python. Basically a static interval-tree that is silly fast for both construction and lookups.
BSD 3-Clause "New" or "Revised" License
213 stars 20 forks

Memory error for large datasets #26

Open jf2016GH opened 5 years ago

jf2016GH commented 5 years ago

```
$ time python nclist.py
Subject file: chainOrnAna1.bed (1.956864), Query file: chainRn4.bed (2.350965)
Building NCLS took 0.36449146270751953
Traceback (most recent call last):
  File "nclist.py", line 36, in <module>
    time_ncls(s, q)
  File "nclist.py", line 22, in time_ncls
    n.all_overlaps_both(q[1].values, q[2].values, q.index.values)
  File "ncls/src/ncls.pyx", line 278, in ncls.src.ncls.NCLS.all_overlaps_both
  File "ncls/src/ncls.pyx", line 321, in ncls.src.ncls.NCLS.all_overlaps_both
  File "/home/john/anaconda3/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 1320, in resize
    a = concatenate((a,) * n_copies)
MemoryError
```
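The failure happens while NumPy's `resize` doubles the output buffers: `all_overlaps_both` materialises every (query, subject) overlap pair at once, and the number of pairs can grow far faster than either input. A minimal sketch of why (pure-NumPy brute force with tiny made-up intervals, not the ncls code path):

```python
import numpy as np

def count_overlap_pairs(q_starts, q_ends, s_starts, s_ends):
    """Brute-force count of (query, subject) pairs whose half-open
    intervals [start, end) overlap. Stands in for the size of the
    result that all_overlaps_both must materialise in one go."""
    # Broadcasting builds a len(q) x len(s) boolean matrix of overlaps.
    overlaps = (q_starts[:, None] < s_ends[None, :]) & \
               (s_starts[None, :] < q_ends[None, :])
    return int(overlaps.sum())

# 1,000 queries and 1,000 subjects piled on the same region:
starts = np.zeros(1000, dtype=np.int64)
ends = np.full(1000, 100, dtype=np.int64)
pairs = count_overlap_pairs(starts, ends, starts, ends)
print(pairs)  # 1000000 pairs from only 2,000 input intervals
```

With millions of intervals per file, as in the benchmark above, a dense region can push the pair count (and thus the result arrays) past available memory even though the NCLS itself is small.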

endrebak commented 5 years ago

Thanks. This happens when the number of enumerated intervals is very large. I’ll have a look :)

endrebak commented 5 years ago

While I cannot do anything to prevent this from happening, I can write a user-friendly error message instead. I will do that :)

If this means that the NCLS fails on large data where the AIList succeeds, this is a very interesting result for me :)

jf2016GH commented 5 years ago

AIList for Python is updated. It is close to the C code in speed.
