seung-lab / cloud-volume

Read and write Neuroglancer datasets programmatically.
https://twitter.com/thundercloudvol
BSD 3-Clause "New" or "Revised" License

No data written to local folder on Windows 11 #635

Open chourroutm opened 1 month ago

chourroutm commented 1 month ago

Hi, I have come up with a script to write chunks of 256^3 voxels into a precomputed segmentation, but no data is actually written to the disk (which is not full), not even the JSON info file. I am wondering whether it is related to using a Windows (W11) workstation (although they got it working in https://github.com/seung-lab/cloud-volume/issues/618).

This is the script I have:

import tifffile as tiff
from tqdm.notebook import tqdm
from cloudvolume import CloudVolume
from cloudvolume.lib import mkdir
import pathlib
import numpy as np

image_files = pathlib.Path("annotated_data").glob("data_labeled_chunk_*.tif")

first_image = tiff.imread("annotated_data\data_labeled_chunk_44_31_13.tif")
img_shape = first_image.shape
dtype = first_image.dtype

print(f"Dataset shape: {img_shape}")
print(f"Dataset dtype: {dtype}")

output_dir = "./data_ngprec/"
output_dir = pathlib.Path(output_dir)
mkdir(output_dir)

output_dir = output_dir.absolute().as_uri()

print(output_dir)

# Create a CloudVolume object for the Neuroglancer precomputed format
info = CloudVolume.create_new_info(
    num_channels = 1,
    layer_type = 'segmentation', # 'image' or 'segmentation'
    data_type = 'uint8', # can pick any popular uint
    encoding = 'raw', # see: https://github.com/seung-lab/cloud-volume/wiki/Compression-Choices
    resolution = [ 7720, 7720, 7720 ], # X,Y,Z values in nanometers
    voxel_offset = [ 0, 0, 0 ], # values X,Y,Z values in voxels
    chunk_size = [ 256, 256, 256 ], # rechunk of image X,Y,Z in voxels
    volume_size = [18709, 18709, 21517], # X,Y,Z size in voxels
)
vol = CloudVolume(
    output_dir,
    info=info,
    progress=False,
    cache=False
)

vol.commit_info()

print("CloudVolume info:")
print(vol.info)

# Write data to the Neuroglancer precomputed format
for chunk_filename in tqdm(image_files, desc="Converting to Neuroglancer format"):
    chunk_data = tiff.imread(chunk_filename).astype(np.uint8)
    ids = list(map(int,chunk_filename.stem.split("_labeled_chunk_")[1].split("_")))
    print(ids)
    chunk_data = chunk_data[..., np.newaxis]
    # Calculate bounds_inf and bounds_sup together
    bounds = [(id_ * shape, id_ * shape + shape) for id_, shape in zip(ids, chunk_data.shape)]

    # Create slices for the first three dimensions
    slices = [slice(start, stop) for start, stop in bounds]

    # Assign the chunk data to the volume
    vol[slices[2], slices[1], slices[0], 1] = chunk_data

This is the output, which confirms the files were found:

Dataset shape: (256, 256, 256)
Dataset dtype: uint8
file:///d:/Matthieu/data_ngprec
CloudVolume info:
{'num_channels': 1, 'type': 'segmentation', 'data_type': 'uint8', 'scales': [{'encoding': 'raw', 'chunk_sizes': [[256, 256, 256]], 'key': '7720_7720_7720', 'resolution': [7720, 7720, 7720], 'voxel_offset': [0, 0, 0], 'size': [18709, 18709, 21517]}]}
Converting to Neuroglancer format: 
 11/? [00:01<00:00,  6.47it/s]
[45, 36, 10]
[45, 36, 9]
[45, 37, 10]
[44, 31, 13]
[44, 32, 12]
[44, 32, 13]
[45, 32, 10]
[45, 32, 11]
[45, 32, 12]
[45, 33, 10]
[45, 33, 9]
william-silversmith commented 1 month ago

Huh, I don't have access to a Windows machine right now, but the first thing that jumps out at me is that Windows paths use \ as the separator.

I'm a little surprised this works: first_image = tiff.imread("annotated_data\data_labeled_chunk_44_31_13.tif"), since the \d is an unescaped backslash rather than the escaped form \\d.

See if this works?

"file://D:\\Matthieu\\data_ngprec"
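(As a side note on the escape-sequence point: the unescaped path happens to work because Python leaves unrecognized escape sequences such as \d in the string literally; a recognized escape like \t would silently corrupt the path instead. A minimal demonstration:)

```python
# Python keeps unrecognized escape sequences such as "\d" as-is
# (newer versions emit a SyntaxWarning/DeprecationWarning for them),
# so this literal happens to equal its raw-string spelling.
literal = "annotated_data\data_labeled_chunk_44_31_13.tif"
raw = r"annotated_data\data_labeled_chunk_44_31_13.tif"
print(literal == raw)  # True: the two spellings are identical
print(len("\d"))       # 2: a backslash plus "d", not one escape character
print(len("\t"))       # 1: "\t" IS a recognized escape (tab)
```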
chourroutm commented 1 month ago

The same code works on a Linux machine, so it does seem to be related to path handling. I can try to investigate that, and keep the issue open in the meantime.

chourroutm commented 1 month ago

> Huh, I don't have access to a windows machine right now, but the first thing that jumps out at me is that windows paths are \ paths.
>
> I'm a little surprised this works: first_image = tiff.imread("annotated_data\data_labeled_chunk_44_31_13.tif") as the \d is not escaping the backslash \\d.
>
> See if this works?
>
> "file://D:\\Matthieu\\data_ngprec"

This did not work, but Windows does understand / as a separator, so the POSIX-style URI produced by the line output_dir = output_dir.absolute().as_uri() is valid on its own.
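(The URI itself can be reproduced on any OS with pathlib's PureWindowsPath, which applies Windows path semantics without needing a Windows host; a minimal sketch, using the path from the log above:)

```python
from pathlib import PureWindowsPath

# PureWindowsPath gives Windows drive-letter semantics on any platform,
# so we can check what .as_uri() produces for this dataset path.
uri = PureWindowsPath(r"D:\Matthieu\data_ngprec").as_uri()
print(uri)  # file:///D:/Matthieu/data_ngprec
```

The drive letter is kept as a path component after the third slash, which is exactly the part a POSIX-only parser has no concept of.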

chourroutm commented 1 month ago

It turns out the files were written to an unexpected path: D:\D\Matthieu\... instead of D:\Matthieu\....
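(One plausible source of the duplicated drive letter, offered as an assumption rather than a trace through cloudvolume/paths.py: parsing the file:// URI yields a path with a leading slash in front of the drive letter, and POSIX-style string handling then treats the drive-letter segment as an ordinary directory name. A small illustration:)

```python
from urllib.parse import urlparse

# The file:// URI parses to a path whose first segment is the drive letter.
parsed = urlparse("file:///d:/Matthieu/data_ngprec")
print(parsed.path)  # /d:/Matthieu/data_ngprec

# A POSIX-minded split sees "d:" as just another directory component;
# handled relative to the current drive on Windows, files can end up
# under a subdirectory named after the drive letter.
print(parsed.path.strip("/").split("/")[0])  # d:
```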

For full support on Windows, it might be interesting to rewrite cloudvolume/paths.py with pathlib.Path instead of posixpath. Would you be interested in a PR for this change?

william-silversmith commented 1 month ago

Hi! That is pretty weird. I would appreciate more contributions for Windows support! Bear in mind that most CloudVolume use is with e.g. gs:// or s3:// paths, which use POSIX semantics, so switching everything to the OS path type would be detrimental.
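(The point above suggests picking the path flavour per protocol rather than globally. A minimal sketch of that idea, with a hypothetical helper name and the Windows flavour hard-coded via PureWindowsPath so the example runs anywhere; a real patch would use pathlib.Path on the local host:)

```python
from pathlib import PurePosixPath, PureWindowsPath

def parent_dir(protocol: str, path: str) -> str:
    """Hypothetical helper: OS path semantics only for local protocols,
    POSIX semantics for cloud ones like gs:// and s3://."""
    if protocol in ("file", "local"):
        # On an actual Windows host this would be pathlib.Path(path).parent.
        return str(PureWindowsPath(path).parent)
    return str(PurePosixPath(path).parent)

print(parent_dir("local", r"D:\Matthieu\data_ngprec"))  # D:\Matthieu
print(parent_dir("gs", "bucket/dataset/layer"))         # bucket/dataset
```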

chourroutm commented 1 month ago

Ah yes, good point! I'll only tweak the handling of "local://" paths, then.