xtensor-stack / xtensor

C++ tensors with broadcasting and lazy computing
BSD 3-Clause "New" or "Revised" License

[feature] Support mmap'ed npy files #1359

Open jshin47 opened 5 years ago

jshin47 commented 5 years ago

It would be very useful for me if I could use a library like mio with xtensor so that I don't have to read the entire npy file into memory. This would be useful in various parallelism scenarios.

wolfv commented 5 years ago

Hi @jshin47,

We had a PR in xtensor-io that implemented mmap'ing HDF5 files. Uncompressed numpy files would be just as straightforward. You can have a look here: https://github.com/QuantStack/xtensor-io/pull/18/files#diff-4f602a45ff1e0fd1ebc810d7566a0b98R175

I think this shows quite clearly how to do it.

Not sure if we want to add this in xtensor core, though ... but it could also live in xtensor-io.
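For the uncompressed npy case, the main thing a mapping-based loader needs is the byte offset where the raw array data begins, since everything before it is a small preamble plus an ASCII header. Below is a minimal sketch of that step, assuming the published NPY layout (6-byte magic, two version bytes, then a little-endian header length that is 2 bytes wide in version 1.0 and 4 bytes wide in versions 2.0 and 3.0). The name `npy_data_offset` is hypothetical and is not part of xtensor or xtensor-io; the sketch does not parse dtype, shape, or ordering from the header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not an xtensor API): returns the byte offset at which
// the raw array data starts in an uncompressed .npy buffer, or 0 if the
// preamble is malformed. 0 is never a valid data offset.
inline std::size_t npy_data_offset(const unsigned char* buf, std::size_t size)
{
    assert(buf != nullptr);

    // Every .npy file starts with the 6-byte magic string "\x93NUMPY".
    static const unsigned char magic[6] = {0x93, 'N', 'U', 'M', 'P', 'Y'};
    if (size < 10 || std::memcmp(buf, magic, 6) != 0)
    {
        return 0;
    }

    const unsigned major = buf[6];  // buf[7] holds the minor version
    std::size_t header_len = 0;
    std::size_t preamble = 0;

    if (major == 1)
    {
        // Version 1.0: header length is a 2-byte little-endian integer.
        header_len = static_cast<std::size_t>(buf[8])
                   | (static_cast<std::size_t>(buf[9]) << 8);
        preamble = 10;
    }
    else if (major == 2 || major == 3)
    {
        // Versions 2.0 and 3.0: header length is a 4-byte little-endian integer.
        if (size < 12)
        {
            return 0;
        }
        header_len = static_cast<std::size_t>(buf[8])
                   | (static_cast<std::size_t>(buf[9]) << 8)
                   | (static_cast<std::size_t>(buf[10]) << 16)
                   | (static_cast<std::size_t>(buf[11]) << 24);
        preamble = 12;
    }
    else
    {
        return 0;  // unknown format version
    }

    // Reject buffers that are too short to contain the announced header.
    if (header_len > size - preamble)
    {
        return 0;
    }
    return preamble + header_len;
}
```

Once the offset is known, the data region of a mapped file can be handed to a non-owning adaptor instead of being copied into a freshly allocated container.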

jshin47 commented 5 years ago

Great, this is very helpful! I can use HDF5 instead of NPY for now. I do think this kind of support would be really useful in scenarios involving, say, order book data, where there is quite a bit of data and you need to parallelize.

JohanMabille commented 5 years ago

Since we support the npy format in xtensor core, it would make sense to support the mmap'ed version here too.

wolfv commented 5 years ago

We should probably have an easy way to do mmap_adapt, and then we could reuse that function in the npy loader.
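To make the idea concrete, here is a rough sketch of the building block such an `mmap_adapt` might sit on: a small RAII wrapper that maps a file read-only and exposes a typed pointer into it. It assumes a POSIX system (`open`, `fstat`, `mmap`); `mapped_file` and `payload` are hypothetical names and not part of xtensor. The returned pointer could presumably then be wrapped without copying by xtensor's non-owning adaptor (`xt::adapt` with `xt::no_ownership()`), but that step is left out here so the sketch stays free of dependencies.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical RAII wrapper (not an xtensor API): maps a whole file read-only
// and unmaps it on destruction.
class mapped_file
{
public:
    explicit mapped_file(const std::string& path)
    {
        const int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0)
        {
            throw std::runtime_error("cannot open " + path);
        }

        struct stat st;
        if (::fstat(fd, &st) != 0 || st.st_size == 0)
        {
            ::close(fd);
            throw std::runtime_error("cannot stat or empty file: " + path);
        }
        m_size = static_cast<std::size_t>(st.st_size);

        void* p = ::mmap(nullptr, m_size, PROT_READ, MAP_PRIVATE, fd, 0);
        ::close(fd);  // the mapping remains valid after the descriptor is closed
        if (p == MAP_FAILED)
        {
            throw std::runtime_error("mmap failed for " + path);
        }
        m_data = static_cast<const unsigned char*>(p);
    }

    ~mapped_file()
    {
        if (m_data != nullptr)
        {
            ::munmap(const_cast<unsigned char*>(m_data), m_size);
        }
    }

    // Non-copyable: exactly one object owns the mapping.
    mapped_file(const mapped_file&) = delete;
    mapped_file& operator=(const mapped_file&) = delete;

    const unsigned char* data() const { return m_data; }
    std::size_t size() const { return m_size; }

private:
    const unsigned char* m_data = nullptr;
    std::size_t m_size = 0;
};

// Returns a typed, read-only pointer to the bytes starting at `offset`
// (for example, just past a file header). The caller is responsible for
// making sure that `offset` is suitably aligned for T.
template <class T>
const T* payload(const mapped_file& f, std::size_t offset)
{
    assert(offset <= f.size());
    return reinterpret_cast<const T*>(f.data() + offset);
}
```

The mapping must outlive any view built on top of it, so a real `mmap_adapt` would likely need to keep the `mapped_file` (or an equivalent handle) alive alongside the adaptor.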