tlambert03 / nd2

Full-featured nd2 (Nikon NIS Elements) file reader for python. Outputs to numpy, dask, and xarray. Exhaustive metadata extraction
https://tlambert03.github.io/nd2
BSD 3-Clause "New" or "Revised" License

Hidden issues with DaskArrayProxy no longer hidden: fails to work with NEP18 dispatch mechanism, np forces compute #25

VolkerH closed this issue 2 years ago

VolkerH commented 2 years ago

Description

In this comment (https://github.com/tlambert03/nd2/issues/19#issuecomment-965182850), @tlambert03 wrote:

that is, it's a dask array that, whenever you try to call compute or np.asarray, will re-open the underlying file (with self.wrapped.ctx is essentially just with ND2File()....)

It looks a tad bit risky at first, but I haven't run into any issues with it yet. In any case, I suspect the issue of trying to use a dask array after closing the file is far more common than whatever hidden issues there are with this proxy. I'm inclined to try it

The hidden issues are coming out of hiding. Normally, passing a dask array to a numpy function makes the NEP-18 mechanism dispatch to the corresponding dask implementation, so the result stays lazy. With the DaskArrayProxy this dispatch no longer happens. Instead, numpy triggers a compute() on the array underlying the proxy, where no compute() would have happened on a non-proxied array. In my case (large array), that gets the process killed on Linux, presumably because it runs out of memory.
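To make the mechanism concrete, here is a minimal numpy-only sketch. LazyArray and NaiveProxy are hypothetical stand-ins invented for illustration, not dask's or nd2's actual classes. The assumption is that the proxy forwards attribute access per instance but does not define __array_function__ on its own class, so numpy treats it as a generic array-like and coerces it through __array__.

```python
import numpy as np


class LazyArray:
    """Hypothetical stand-in for a lazy array (e.g. dask) that opts in to NEP-18."""

    def __init__(self, shape):
        self.shape = shape
        self.computed = False

    def __array_function__(self, func, types, args, kwargs):
        # numpy hands the call over instead of converting; stay lazy.
        return self

    def __array__(self, dtype=None, copy=None):
        # Only reached when numpy coerces the object to an ndarray.
        self.computed = True
        return np.zeros(self.shape, dtype=dtype)


class NaiveProxy:
    """Hypothetical proxy: forwards attributes, but has no __array_function__."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # Instance-level forwarding; numpy looks up __array_function__
        # on the type, so this forwarding does not make the proxy dispatch.
        return getattr(self._wrapped, name)

    def __array__(self, dtype=None, copy=None):
        return self._wrapped.__array__(dtype)


lazy = LazyArray((2, 3))

direct = np.einsum("ab->ab", lazy)
print(type(direct).__name__, lazy.computed)

proxied = np.einsum("ab->ab", NaiveProxy(lazy))
print(type(proxied).__name__, lazy.computed)
```

In this sketch the direct call returns the lazy object untouched, while the proxied call returns a numpy.ndarray and marks the wrapped array as computed, which mirrors the behaviour seen in the tests.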

To reproduce (here I use a 4d nd2-file):

test_nd2.py

import dask.array as da
import numpy as np
from nd2 import ND2File

# Path to a local 4D nd2 file; adjust for your machine.
dataset_nd2 = "/home/hilsenst/Documents/Luisa_Reference_HT/PreMaldi/Seq0000.nd2"

def test_nd2_dask_einsum():
    f = ND2File(dataset_nd2)
    arr = f.to_dask()
    print(f"Array shape {arr.shape}")
    reordered_dask = da.einsum('abcd->abcd', arr)
    print(reordered_dask[:1,:1,:1,:1].compute())

def test_synthetic_dask_einsum_via_nep18():
    arr = da.zeros([1000,1000,100,100])
    print(f"Array shape {arr.shape}")
    reordered_nep18 = np.einsum('abcd->abcd', arr)
    print(type(reordered_nep18))
    print(reordered_nep18[:1,:1,:1,:1].compute())

def test_nd2_dask_einsum_via_nep18_small():
    f = ND2File(dataset_nd2)
    arr = f.to_dask()
    arr = arr[:10,:10,:10,:10]
    print(f"Array shape {arr.shape}")
    print(f"arr has type {type(arr)}")
    reordered_nep18 = np.einsum('abcd->abcd', arr)
    print(type(reordered_nep18))
    print(reordered_nep18[:1,:1,:1,:1].compute())

def test_nd2_dask_einsum_via_nep18():
    f = ND2File(dataset_nd2)
    arr = f.to_dask()
    print(f"Array shape {arr.shape}")
    reordered_nep18 = np.einsum('abcd->abcd', arr)
    print(type(reordered_nep18))
    print(reordered_nep18[:1,:1,:1,:1].compute())

Running these tests shows the problem:

(napari_latest) hilsenst@itservices-XPS-15-9500:~/GitlabEMBL/spacem-ht/src/spacem-mosaic$ pytest tests/test_nd2.py  --capture=no
=========================================================================================== test session starts ===========================================================================================
platform linux -- Python 3.9.5, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
PyQt5 5.15.4 -- Qt runtime 5.15.2 -- Qt compiled 5.15.2
rootdir: /home/hilsenst/GitlabEMBL/spacem-ht/src/spacem-mosaic
plugins: order-1.0.0, napari-0.4.11, timeout-1.4.2, anyio-3.3.0, napari-plugin-engine-0.1.9, qt-4.0.2, hypothesis-6.14.4
collected 4 items                                                                                                                                                                                         

tests/test_nd2.py Array shape (734, 2, 2060, 2044)
[[[[96]]]]
.Array shape (1000, 1000, 100, 100)
<class 'dask.array.core.Array'>
[[[[0.]]]]
.Array shape (10, 2, 10, 10)
arr has type <class 'nd2._dask_proxy.DaskArrayProxy'>
<class 'numpy.ndarray'>
FArray shape (734, 2, 2060, 2044)
Killed

For me, the convenience of NEP-18 dispatch almost outweighs the problem of a few file handles left open without the array proxy. I guess the chances of getting numpy to support ObjectProxies with NEP-18 as well are fairly slim.
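For what it's worth, a proxy can take part in NEP-18 itself without any change to numpy, as long as __array_function__ is defined on the proxy class. The following is only a sketch under that assumption; LazyArray and ForwardingProxy are hypothetical names made up here, and this is not necessarily how nd2 handles it.

```python
import numpy as np


class LazyArray:
    """Hypothetical stand-in for a lazy array (e.g. dask) that opts in to NEP-18."""

    def __init__(self, shape):
        self.shape = shape
        self.computed = False

    def __array_function__(self, func, types, args, kwargs):
        return self  # stay lazy

    def __array__(self, dtype=None, copy=None):
        self.computed = True
        return np.zeros(self.shape, dtype=dtype)


class ForwardingProxy:
    """Hypothetical proxy that unwraps itself and re-dispatches numpy calls."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        return getattr(self._wrapped, name)

    @staticmethod
    def _unwrap(obj):
        return obj._wrapped if isinstance(obj, ForwardingProxy) else obj

    def __array_function__(self, func, types, args, kwargs):
        # Swap proxies among the top-level arguments for the wrapped objects,
        # then call the numpy function again so the wrapped type can dispatch.
        # Proxies nested inside lists or tuples are not handled in this sketch.
        args = tuple(self._unwrap(a) for a in args)
        kwargs = {k: self._unwrap(v) for k, v in kwargs.items()}
        return func(*args, **kwargs)


lazy = LazyArray((2, 3))
result = np.einsum("ab->ab", ForwardingProxy(lazy))
print(result is lazy, lazy.computed)
```

Here the call reaches the wrapped object's own __array_function__, so nothing is computed. A real proxy would also have to decide whether to re-wrap the result so that the file-reopening behaviour is preserved.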

tlambert03 commented 2 years ago

Challenge accepted! :) Thanks, this is exactly what I was looking for. Your tests are very helpful. Worst-case scenario, I can make returning the proxy an optional parameter.

VolkerH commented 2 years ago

Worst-case scenario, I can make returning the proxy an optional parameter.

I think that may be useful in any scenario.

tlambert03 commented 2 years ago

I think I've got a good solution; will push soon.

tlambert03 commented 2 years ago

I think #26 fixes this, but of course, feel free to reopen if you find more numpy incompatibilities!