Open image357 opened 4 years ago
@image357 Hi, the problem here is indeed with FUSE. For some reason, mmap on FUSE requires the MAP_PRIVATE flag to be passed to the mmap call. Unfortunately, it is not possible to pass this flag through the high-level numpy API, but it is possible to create the mmap manually and pass it to ndarray like this:
import numpy as np
import os
import mmap
save_array = np.arange(9).reshape(3,3)
np.save("array.npy", save_array, fix_imports=False)
# works
load_array1 = np.load("array.npy", mmap_mode="c", fix_imports=False)
# works
size = os.path.getsize("array.npy")
with open("array.npy", "r") as f2:
    mm = mmap.mmap(f2.fileno(), size, offset=0, flags=mmap.MAP_PRIVATE)
    array2 = np.ndarray((3,3), buffer=mm)
    print(array2)
Please let us know whether this approach is acceptable.
Thanks for your answer and sorry for the late reply. This doesn't work, though. I guess the reason is that .npy files have a specific header that saves dtype, shape and other information. Hence, putting the plain file as the ndarray buffer can't work.
The output of your code is
[[1.87585069e-309 1.17119999e+171 5.93271341e-037]
[8.44740097e+252 2.65141232e+180 9.92152605e+247]
[2.16209968e+233 1.05161974e-153 6.01399921e-154]]
which is not the original array. I also played around using different dtype and order arguments.
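The header problem described above could in principle be handled by parsing the .npy header first and pointing the ndarray at the bytes that follow it. The following is only a minimal sketch under stated assumptions: `load_npy_private` is a hypothetical helper name, it relies on the `numpy.lib.format` header readers, it handles only format versions 1.0 and 2.0, it is Unix-only because of `mmap.MAP_PRIVATE`, and it has not been verified on a oneclient or other FUSE mount.

```python
import mmap

import numpy as np
from numpy.lib import format as npy_format


def load_npy_private(path):
    """Map a .npy file with MAP_PRIVATE and view the data after its header.

    Hypothetical helper for illustration; not part of numpy or oneclient.
    """
    with open(path, "rb") as f:
        # Parse the header to learn dtype, shape, and where the raw data starts.
        version = npy_format.read_magic(f)
        if version == (1, 0):
            shape, fortran_order, dtype = npy_format.read_array_header_1_0(f)
        elif version == (2, 0):
            shape, fortran_order, dtype = npy_format.read_array_header_2_0(f)
        else:
            raise ValueError(f"unsupported .npy format version: {version}")
        if dtype.hasobject:
            raise ValueError("object arrays cannot be memory-mapped")
        data_offset = f.tell()
        # Map the whole file (length 0) privately. The mmap offset argument must
        # be a multiple of the allocation granularity, so the header is skipped
        # via the ndarray offset below instead.
        mm = mmap.mmap(f.fileno(), 0, flags=mmap.MAP_PRIVATE)
    return np.ndarray(
        shape,
        dtype=dtype,
        buffer=mm,
        offset=data_offset,
        order="F" if fortran_order else "C",
    )
```

On a regular local filesystem, `load_npy_private("array.npy")` should give back the saved values instead of reinterpreted header bytes. Whether the underlying mmap call succeeds on a FUSE mount is a separate question.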
Also, mmap.MAP_PRIVATE effectively creates a copy-on-write array, which is equivalent to the 'c' option for np.load(..., mmap_mode='c', ...).
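To illustrate the copy-on-write behaviour of mmap_mode='c', here is a small sketch meant to be run on a regular local filesystem (not a FUSE mount). The helper name `copy_on_write_demo` and the temporary file path are made up for the example.

```python
import os
import tempfile

import numpy as np


def copy_on_write_demo(path):
    """Return (in-memory value, on-disk value) after writing via mmap_mode='c'."""
    np.save(path, np.arange(9).reshape(3, 3))
    mapped = np.load(path, mmap_mode="c")
    mapped[0, 0] = 42  # modifies the private copy only
    in_memory = int(mapped[0, 0])
    del mapped  # drop the mapping before re-reading the file
    on_disk = int(np.load(path)[0, 0])
    return in_memory, on_disk


with tempfile.TemporaryDirectory() as tmp:
    in_memory, on_disk = copy_on_write_demo(os.path.join(tmp, "array.npy"))
    print(in_memory, on_disk)  # prints: 42 0
```

The write is visible through the mapped array but never reaches the file, which is the same effect a MAP_PRIVATE mapping gives.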
I suppose this is something that has to be fixed on the fuse side or might not be fixable at all.
@image357 OK, thanks for the information. Unfortunately, it looks like this will not be possible through oneclient.
However, we also provide a Python library - OnedataFS - which gives direct access to our filesystem without Fuse. I will try to check if it will work with mmap().
OnedataFS is available by default in the oneclient Docker image or can be installed from packages. It implements the PyFilesystem API (https://docs.pyfilesystem.org/en/latest/index.html). Basic usage is as follows:
from fs.onedatafs import OnedataFS
oneprovider_host = "example.com"
oneprovider_token = "ABCD...."
odfs = OnedataFS(oneprovider_host, oneprovider_token)
spaces = odfs.listdir('')
...
Even if mmap doesn't work, please note that each file opened through OnedataFS has an internal memory buffer that prefetches from storage only the blocks requested by IO operations on the handle. It won't read the entire file into memory unless necessary, so mmap may not be needed in your case.
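As a rough sketch of what partial reads without mmap could look like, the function below reads only selected rows of a .npy array from any seekable binary handle using seek and read. `read_rows` is a hypothetical helper, not part of OnedataFS or numpy. It assumes a C-ordered array with at least one dimension and format version 1.0 or 2.0. With OnedataFS, the handle would presumably come from the PyFilesystem method `openbin`, which has not been tested here.

```python
import numpy as np
from numpy.lib import format as npy_format


def read_rows(handle, start, stop):
    """Read rows [start, stop) of a C-ordered .npy array from a binary handle.

    Hypothetical helper for illustration; only the requested bytes are read.
    """
    version = npy_format.read_magic(handle)
    if version == (1, 0):
        shape, fortran_order, dtype = npy_format.read_array_header_1_0(handle)
    elif version == (2, 0):
        shape, fortran_order, dtype = npy_format.read_array_header_2_0(handle)
    else:
        raise ValueError(f"unsupported .npy format version: {version}")
    if fortran_order or dtype.hasobject or len(shape) == 0:
        raise ValueError("only C-ordered, non-object arrays with ndim >= 1")
    if not 0 <= start <= stop <= shape[0]:
        raise IndexError("row range out of bounds")

    data_offset = handle.tell()  # raw data begins right after the header
    row_shape = tuple(shape[1:])
    row_bytes = int(np.prod(row_shape, dtype=np.int64)) * dtype.itemsize
    n_rows = stop - start

    # Jump straight to the first requested row and read just those bytes.
    handle.seek(data_offset + start * row_bytes)
    raw = handle.read(n_rows * row_bytes)
    return np.frombuffer(raw, dtype=dtype).reshape((n_rows,) + row_shape)


# Assumed usage with OnedataFS (the path is a placeholder):
# with odfs.openbin("/some_space/array.npy") as handle:
#     rows = read_rows(handle, 1, 3)
```

Because the function touches only the header and the requested rows, it fits the block-wise prefetching described above. The returned array is read-only, since it is backed by an immutable bytes object.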
Hey,
I ran into a problem when using oneclient with memory mapping and numpy arrays. Inside a oneclient mount, run the following Python script:
The error in the last step is:
As far as I could find out, the problem might have something to do with fuse: https://stackoverflow.com/questions/46839807/mmap-no-such-device
Any way to fix this on the oneclient side? E.g. mount options?