tlambert03 / nd2

Full-featured nd2 (Nikon NIS Elements) file reader for python. Outputs to numpy, dask, and xarray. Exhaustive metadata extraction
https://tlambert03.github.io/nd2
BSD 3-Clause "New" or "Revised" License

Missing axis #190

Closed: aaristov closed this 6 months ago

aaristov commented 10 months ago

Description

Hi Talley, your package is a real life-saver for our huge datasets. However, I've got one dataset whose axes are not showing correctly.

What I Did

nd2.ND2File('2023-09-11_MLY003_Cpf1.nd2').sizes
Out[3]: {'T': 49, 'C': 3, 'Y': 2044, 'X': 2048}

In reality there is a Z axis with 3 planes, and the Fiji Bio-Formats importer shows them correctly. The file might be corrupt, since the native NIS also bugs out on Z, but I would nevertheless like to extract some data if possible. The dataset is quite big (264.8 GB), so I could only share it via my cloud (https://storage.googleapis.com/yeast-problem-nd2/2023-09-11_MLY003_Cpf1.nd2). Please let me know once you've downloaded it --- I'll close the link.

Many thanks,

Andrey

tlambert03 commented 10 months ago

Thanks for the report @aaristov. I'm downloading it now.

Hard to say without seeing the metadata, but it does sound like it might be corrupt. However, there are definitely things we can still do. One "last resort" method is nd2.rescue_nd2, which yields every image plane it finds; you'll then need to reshape the result yourself, since it makes no attempt to align things to the expected axis sizes.

Of course, that will be slow and requires digging through the full file (it's more of an "export" type thing). I'll know more when I have the full file.

aaristov commented 10 months ago

Thanks for the tip! I'm running the rescue now and saving the chunks to individual TIFF files. The data looks good to me.

tlambert03 commented 10 months ago

ok, yeah, I can definitely see a bunch more accessible stuff in there. The problem appears to be in the experiment metadata, but I'm still digging into it. Can you tell me what you know about the file and how it was acquired? For example, was it acquired using the NDAcquisition window? Any custom stuff going on during acquisition?

I do see all the z planes, but they're in a slightly unusual place. So it's a very useful file for me, but any additional details you have about the acquisition and the expected dimensions/axes would be illuminating (I can also ask the folks at LIM about it)

if you're curious, you can see the raw experiment output with:


import nd2

with nd2.ND2File("/Users/talley/Downloads/2023-09-11_MLY003_Cpf1.nd2") as f:
    print(f.experiment)
    print(f._rdr._raw_experiment)
tlambert03 commented 10 months ago

so, to be clear, I don't actually think the file is corrupt. I believe it's all there and intact... it just uses an experiment/axis structure that is apparently unusual enough that even the Elements software isn't looking for it.

aaristov commented 10 months ago

Sure, here is the metadata I have.

The dataset was acquired in the NDAcquisition dialog using all possible options, so the file has 77 positions, 49 time points, 3 z planes, 3 channels, and YX dimensions of 2044 x 2048 px. Empirically I found that reading planes (YXC) with rescue returns good data: the 3 z planes cycle first, then the positions, and at last the time points. The bizarre fact is that we have plenty of datasets like this, but only this one is not recognized correctly by nd2. A literally identical kind of dataset from the next day has no problems, despite being even bigger (480 GB) because 130 positions were used.
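In other words, the ordering boils down to simple index arithmetic. This is just a sketch using the sizes above, not anything from the nd2 API:

```python
# Sizes from the dataset described above (assumed, for illustration only)
npos, nframes, nz = 77, 49, 3

def flat_index(t, p, z):
    """Flat index of the (t, p, z) frame when Z cycles fastest, then P, then T."""
    return (t * npos + p) * nz + z

def unravel(i):
    """Inverse: recover (t, p, z) from a flat frame index."""
    t, rem = divmod(i, npos * nz)
    p, z = divmod(rem, nz)
    return t, p, z

print(flat_index(1, 0, 0))  # 231 (= npos * nz)
print(unravel(231))         # (1, 0, 0)
```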

Below is the script I used to recover the data yesterday. It executed correctly and extracted all of the planes.

import nd2
import numpy as np
from tifffile import imwrite
from tqdm import tqdm

npos = 77
nframes = 49
nz = 3

def rescue():
    with open('2023-09-11_MLY003_Cpf1.nd2', 'rb') as f:
        # frames arrive Z-fastest, then position, then time
        frames = nd2.rescue_nd2(f, frame_shape=(2044, 2048, 3))
        tpz = []  # completed time points
        pz = []   # completed positions within the current time point
        z = []    # planes within the current z-stack

        for frame in tqdm(frames, total=(nframes * npos * nz)):
            if len(pz) == npos:
                tpz.append(pz)
                pz = []
            if len(z) == nz:
                pz.append(z)
                # (Z, Y, X, C) -> (Z, C, Y, X) for ImageJ hyperstacks
                imwrite(f'rescue/T{len(tpz):02d}P{len(pz):02d}.tif',
                        np.array(z).transpose((0, 3, 1, 2)), imagej=True)
                z = []

            z.append(frame)

        # flush the final z-stack after the loop ends
        pz.append(z)
        tpz.append(pz)
        imwrite(f'rescue/T{len(tpz):02d}P{len(pz):02d}.tif',
                np.array(z).transpose((0, 3, 1, 2)), imagej=True)

rescue()
tlambert03 commented 10 months ago

the key, in the metadata, is that all the positions are nested inside the time loop as sub-positions. Obviously, it's not something that you or the user did intentionally, but it's a structure that even NIS doesn't appear to anticipate happening very often. We'll figure it out and get a fix.

In the meantime, because this isn't a corrupt file (i.e. a file where the actual frames are at unexpected byte offsets in the file) you should be able to grab the data more easily, without using rescue, after #192 merges:

import nd2

with nd2.ND2File('file.nd2') as f:
    for frame in range(f.attributes.sequenceCount):
        ary = f.read_frame(frame)
        ...
tlambert03 commented 10 months ago

v0.8.1 is pushing to PyPI now, after which you can just use read_frame().
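For what it's worth, once every raw frame is read, the flat stack can be reshaped into the expected (T, P, Z, C, Y, X) axes, given the Z-fastest ordering described above. This is just a sketch with toy sizes standing in for the real 49 x 77 x 3 dataset, using a dummy array rather than real data:

```python
import numpy as np

# toy sizes for illustration (real file: 49 T, 77 P, 3 Z, 2044 x 2048 YX, 3 C)
nframes, npos, nz, ny, nx, nc = 3, 2, 3, 4, 5, 3

# stand-in for the flat stack of raw (Y, X, C) frames, Z-fastest ordering
flat = np.zeros((nframes * npos * nz, ny, nx, nc))

# frames cycle fastest over Z, then position, then time
stack = flat.reshape(nframes, npos, nz, ny, nx, nc)
# move channels ahead of YX to get (T, P, Z, C, Y, X)
stack = stack.transpose(0, 1, 2, 5, 3, 4)
print(stack.shape)  # (3, 2, 3, 3, 4, 5)
```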

tlambert03 commented 10 months ago

The dataset was acquired in NDAcquisition dialog using all possible options, so the file has 77 positions, 49 time points, 3 z planes, 3 channels and YX dimensions of 2044 x 2048 px... The bizarre fact is that we have plenty of datasets like this but only this one is not recognized correctly by nd2. Literally same kind of dataset from the next day has no problems, despite having even bigger size (480 GB) because of 130 positions used.

Thanks for the details. Yeah, it is strange. I just tried all the various ways I could think of to set up an NDAcquisition, and I'm still unable to reproduce a file that has this metadata structure. We'll see what the friendly folks at LIM (authors of Elements and nd2) have to say.

tlambert03 commented 9 months ago

hey @aaristov, it's becoming a little clearer how a file like this may be generated. It appears that if you pause an ongoing ND Acquisition, modify the location of one of the XY stage positions, then resume the experiment, it may result in a file like this. (I say "may" because while I have been able to create a file with subloops like this, I still can't create one like the file you shared which only has the data there). Can you confirm whether it's possible that this experiment was paused, modified, and resumed?

aaristov commented 9 months ago

hey @aaristov, it's becoming a little clearer how a file like this may be generated. It appears that if you pause an ongoing ND Acquisition, modify the location of one of the XY stage positions, then resume the experiment, it may result in a file like this. (I say "may" because while I have been able to create a file with subloops like this, I still can't create one like the file you shared which only has the data there). Can you confirm whether it's possible that this experiment was paused, modified, and resumed?

Ah yes, this is exactly the case with this dataset, as there was a lot of drift and the coordinates were adjusted at some point. I completely forgot to mention it in the metadata message. How did you manage to figure this out?