pynapple-org / pynapple

PYthon Neural Analysis Package :pineapple:
https://pynapple-org.github.io/pynapple/
MIT License
243 stars 59 forks source link

Make tsd, tsd-tensor, and tsd-frame lazy #264

Closed alejoe91 closed 1 month ago

alejoe91 commented 2 months ago

Fixes #245

@gviejo @BalzaniEdoardo I just removed the [:] when loading arrays from the NWB file and added a lazy flag to the BaseTsd to allow to skip the convert_to_numpy. Note that this works only for array-like, but both h5py.Dataset and zarr.Array support this. It seems to work well on my side! Let me know what you think!

alejoe91 commented 2 months ago

This seems to break the jitcontinuous_perievent...would yiou be fine in having a parallelized version that doesn't rely on numba, but on the concurrent.futures.ProcessPoolExecutor? This will enable to compute continuous perievents without loading the entire LFP traces in memory

coveralls commented 2 months ago

Pull Request Test Coverage Report for Build 9179520861

Details


Changes Missing Coverage Covered Lines Changed/Added Lines %
pynapple/core/_core_functions.py 1 2 50.0%
pynapple/core/time_series.py 12 13 92.31%
pynapple/io/interface_nwb.py 5 8 62.5%
<!-- Total: 18 23 78.26% -->
Totals Coverage Status
Change from base Build 9117465489: -0.02%
Covered Lines: 2436
Relevant Lines: 2851

💛 - Coveralls
alejoe91 commented 2 months ago

Seems like a coverall issue? Tests are passing locally on my side

gviejo commented 2 months ago

Ok really weird error with coveralls. I have another PR where coveralls runs fine.

alejoe91 commented 2 months ago

I noticed that all the PRs so far has been from you and @BalzaniEdoardo , so maybe some of the actions are relying on some secrets that my fork doesn't have access to

BalzaniEdoardo commented 2 months ago

Hi Alessio,

I checked out your PR. It's very cool that the lazy-loading is working. On my side test passes, I'll look more into what is going on with the coveralls. It would be nice to add tests to test_nwb.py that checks the data in the Tsd is lazy-loaded and that you can run pynapple calls once you slice into the data, something like this,


def test_is_lazy_loaded(path):
    tsd_lazy = nap.load_file(path)
    assert isinstance(tsd_lazy.d, h5py.Dataset)
    assert isinstance(tsd_lazy.t, np.ndarray)

def test_restrict_lazy_loading(tsd_lazy, tsd_regular, epoch):
      assert all(tsd_lazy[:1000].restrict(epoch).t == tsd_regular[:1000].restrict(epoch).t)
      assert np.all(tsd_lazy[:1000].restrict(epoch).d == tsd_regular[:1000].restrict(epoch).d)
      assert np.all(tsd_lazy[:1000].restrict(epoch).time_support.values == tsd_regular[:1000].restrict(epoch).time_support.values)

It would be very nice to have a version of this in which restrict for selecting based on time would work without the slicing for reading the data first. This would allow you to slice based on time instead of having to calculate the indices.

tsd_regular = tsd_lazy.restrict(epochs)

That requires way more work, but it is not technically hard. All you need is splitting the jitrestrict function in two, one that receives time_array, starts and ends and returns the indices for slicing, and a regular function that applies the slicing only.

# in _jitted_functions.py
@jit(no_python=True)
def jit_compute_indices(time_array, starts, ends)
    .... # calculate the ix indices for slicing as in the current jitrestrict
    return ix

def jitresrict(time_array, data_array, starts, ends):
    ix = jit_compute_indices(time_array, starts, ends):
    return time_array[ix], data_array[ix]

I wonder what @gviejo thinks about this idea. We should compare the performance but my guess is that this new jitrestrict will be still efficient because the last slicing just creates a view when the data_array is loaded.

gviejo commented 2 months ago

I can see a path for this but it will take some time since it will requires to update all the jitted functions. Right now, we are focusing on the integration with pynajax so there will be some changes in the core. We can ping you on this PR or in the issue #245 when we can tackle this.

BalzaniEdoardo commented 1 month ago

@alejoe91 thanks for the PR Ale! I added a bunch of tests and loaded the "time" array (only the data is lazy loaded).

This little change allows you to treat your lazy loaded array as if it was fully loaded; If you want or load a chunk of your data only, you should call restrict or get before any computation. The call to restrict will create a vector of indices using the loaded time array, and use it to slice the hdf5.Dataset which will load only what you need. We will be merging into dev so far, test it out and tell us how it goes!

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 73.68421% with 10 lines in your changes are missing coverage. Please review.

Files Coverage Δ
pynapple/core/_core_functions.py 85.91% <50.00%> (ø)
pynapple/core/time_series.py 91.90% <84.61%> (ø)
pynapple/io/interface_nwb.py 75.40% <69.56%> (ø)
gviejo commented 1 month ago

Thank you @alejoe91 . This should appear soon in 0.6.6.