tlambert03 / nd2

Full-featured nd2 (Nikon NIS Elements) file reader for Python. Outputs to numpy, dask, and xarray. Exhaustive metadata extraction.
https://tlambert03.github.io/nd2
BSD 3-Clause "New" or "Revised" License

ValueError for a large 2D image in the get_frame method #83

Closed · GinYoshida closed 2 years ago

GinYoshida commented 2 years ago

Description

I would like to slice a large image lazily.

Code

import nd2

dask_array = nd2.imread(file_path, dask=True)
dask_array = dask_array[0, 0:100, 0:100, :]
result_ndarray = dask_array.compute()

Error

  File "C:\{my environment}\.venv\lib\site-packages\nd2\nd2file.py", line 510, in _get_frame
    frame.shape = self._raw_frame_shape
ValueError: cannot reshape array of size 50059620352 into shape (26420,37152,1,3)

What I Did

I tried to compute a slice of the following nd2 file.

The size information

Attributes(bitsPerComponentInMemory=8, bitsPerComponentSignificant=8, componentCount=3, heightPx=26420, pixelDataType='unsigned', sequenceCount=17, widthBytes=111456, widthPx=37152, compressionLevel=None, compressionType=None, tileHeightPx=None, tileWidthPx=None, channelCount=1)

Note

Another trial

I tried another file, and it worked well.

Attributes(bitsPerComponentInMemory=8, bitsPerComponentSignificant=8, componentCount=3, heightPx=5530, pixelDataType='unsigned', sequenceCount=16, widthBytes=15984, widthPx=5328, compressionLevel=None, compressionType=None, tileHeightPx=None, tileWidthPx=None, channelCount=1)

Frame shape: (5530, 5328, 1, 3)

Question

Does _get_frame in nd2file.py require a large amount of memory when the image is very wide and tall, because it converts the whole frame to an ndarray?
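
For scale, here is a rough back-of-the-envelope estimate (my own arithmetic from the attributes above, not from the library) of what a single frame of this file occupies once materialized as an ndarray:

# Per-frame memory for the failing file: 8 bits per component, 3 components.
width_px, height_px, components = 37152, 26420, 3
frame_bytes = width_px * height_px * components
print(frame_bytes)            # 2944667520
print(frame_bytes / 1024**3)  # ~2.74 GiB for a single frame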

My status

Sorry to say, I'm a beginner at Python. Using the debugger and running a straightforward script is the most I can do. Please let me know if there is anything more you would like me to investigate.

tlambert03 commented 2 years ago

Thanks for the detailed issue @GinYoshida :) very helpful.

Without having access to the file itself, I'm not immediately sure 🤔 That number 50059620352 is 17.000092544233993 times the number of elements in the shape (26420, 37152, 1, 3), which is super close to sequenceCount, so my main question is whether this is a somehow-corrupt file/frame that we need to handle more gracefully, or something else.
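
For reference, the arithmetic behind that observation (a quick sanity check, nothing library-specific):

# The reported array size vs. the expected element count of one frame:
reported_size = 50059620352
frame_elements = 26420 * 37152 * 1 * 3  # 2944667520
print(reported_size / frame_elements)   # 17.000092544233993, ~= sequenceCount=17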

Can you try something for me? Use the read_using_sdk flag and let me know if it works for that file:

dask_array = nd2.imread(file_path, dask=True, read_using_sdk=True)
dask_array = dask_array[0, 0:100, 0:100, :]
result_ndarray = dask_array.compute()

Does _get_frame in nd2file.py require a large amount of memory when the image is very wide and tall, because it converts the whole frame to an ndarray?

Yeah, unfortunately, I haven't yet implemented sub-frame chunking. The SDK doesn't provide it directly (i.e. you must read a full 2D + channels chunk of data before cropping), but it's on the list of things to do. It shouldn't be too hard to do this at the level of the mmap around here. I'll add a new issue to track progress on that.
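
To make the current behavior concrete, here is a minimal sketch (assuming a file opened with dask=True; the exact chunk layout shown is illustrative):

import nd2

# Each dask chunk currently spans a full 2D frame plus channels, so a small
# crop is lazy to build but still reads entire frames when computed.
dask_array = nd2.imread(file_path, dask=True)
print(dask_array.chunks)  # e.g. one chunk per frame along the first axis

crop = dask_array[0, :100, :100, :]  # lazy; nothing is read yet
result = crop.compute()              # reads the whole first frame, then crops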

GinYoshida commented 2 years ago

@tlambert03 Thank you for your very quick reply.

Conclusion

Using read_using_sdk works very well. I really appreciate it!

nd2.imread(file_path, dask=True, read_using_sdk=True)

About the file

I understand your concern. We cannot share the file, and it must be very hard to work on issues like this without being able to reproduce the phenomenon on your side. If we find a way to produce this kind of unusual data without confidential content, we will share it.

The new issue about memory

I also appreciate you taking action on that. Hopefully the demand from other users is not too low.

tlambert03 commented 2 years ago

Hi @GinYoshida, you might give this another try after version 0.4.4. Without seeing the file itself, I'm not certain whether it will fix your issue when not using read_using_sdk=True, but it might?
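
In case it helps, retrying without the SDK flag after upgrading might look like this (a sketch; file_path is your file as before):

# First upgrade: pip install -U nd2   (to get >= 0.4.4)
import nd2

dask_array = nd2.imread(file_path, dask=True)  # default reader, no read_using_sdk
result = dask_array[0, 0:100, 0:100, :].compute()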

Since this issue is hard to tackle without the actual file, and since you have a workaround using the SDK reader, I'm going to close this issue; see #85 for the sub-frame chunking work. Feel free to re-open or comment with additional questions.