zarr-developers / zarr-python

An implementation of chunked, compressed, N-dimensional arrays for Python.
http://zarr.readthedocs.io/
MIT License

small 3D-tif-file memmap convert to zarr failed #1487

Open SaibotMagd opened 11 months ago

SaibotMagd commented 11 months ago

Zarr version

'2.16.0'

Numcodecs version

'0.11.0'

Python Version

'3.11.3'

Operating System

Ubuntu 22/ Debian 10 (tried it on 2 systems with 2 Fiji versions)

Installation

"pip install into conda environment"

Description

I tried the smallest possible example:

Error: (a screenshot of the error message was attached as an image; not preserved here)

Since the folder size is too small (60.7 MB without compression, while the original TIFF file is 75 MB), I think it doesn't save the image correctly.

Steps to reproduce

import dask.array as da
import tifffile

file = 'resampled_autofluorescence.tif'
out = 'auto.zarr'

# memory-map the TIFF, wrap it as a dask array and write a plain Zarr store
mm = tifffile.memmap(file)
arr = da.from_array(mm)
arr.to_zarr(out)

The image file and all source code can be found here: https://ncloud.lin-magdeburg.de/s/tCs5GEw8jBPE64j

joshmoore commented 11 months ago

Hi Tobias. You've created a plain Zarr fileset rather than an OME-Zarr one and therefore it's missing the metadata and layout needed by MoBIE and other NGFF tools.

SaibotMagd commented 11 months ago

Thanks, I successfully saved and opened an OME-Zarr file based on a very small TIFF image (loaded via tifffile.memmap(file)). Now I'm trying to save a more "realistic" example: a 40 GB NumPy array. It slows down my system so much that I can't do anything else, and it finally crashed GNOME after 45 minutes. So it doesn't work like that:

import zarr
import os
from os.path import join
import numpy as np
from ome_zarr.io import parse_url
from ome_zarr.writer import write_image

path = '/home/cni-adult/NFDI-data/tmp'
input = 'binary.npy'
output = 'binary-ome.zarr'
try:
    os.mkdir(join(path, output))
except FileExistsError:
    print(f"{join(path, output)} already exists!")

shape = (5721, 7550, 1025)
mm = np.memmap(join(path, input), shape=shape, mode='r')

store = parse_url(join(path, output), mode="w").store
root = zarr.group(store=store)
# NB: this requests a single (5721, 5721, 5721) chunk, larger than the
# array itself along two axes -- a likely cause of the memory blow-up
write_image(image=mm, group=root, axes="zyx", storage_options=dict(chunks=(shape[0], shape[0], shape[0])))
# optional rendering settings
root.attrs["omero"] = {
    "channels": [{
        "color": "00FFFF",
        "window": {"start": 0, "end": 20},
        "label": "random",
        "active": True,
    }]
}

It seems the write_image function doesn't work in serial mode, so I'm working on a minimal working example using realistic datasets (the datasets I actually want to convert are 300-400 GB).