google / tensorstore

Library for reading and writing large multi-dimensional arrays.
https://google.github.io/tensorstore/
Other
1.36k stars 120 forks

Writing local files fails on Windows 11 #160

Open tindiz opened 6 months ago

tindiz commented 6 months ago

Related to #123.

Writing to the local filesystem fails on Windows 11.

Interestingly, this spec does not raise the exception, but it still does not work, because no files are created on the filesystem:

spec = {
    'driver': 'zarr',
    'kvstore': {
        'driver': 'file',
        'path': "/tmp_zarr_new",
    }
}

I also tried other configurations, but this is a minimal example that shows the issue. Please let me know if you need more info.

Here is the full script with further details:

import tensorstore as ts
from importlib.metadata import version

def run_test_1():
    """Fails!"""
    spec = {
        'driver': 'zarr',
        'kvstore': {
            'driver': 'file',
            'path': "tmp_zarr",
        }
    }
    ts.open(
        spec,
        delete_existing=True,
        create=True,
        dtype=ts.float32,
        shape=[10, 2]
    ).result()

def run_test_2():
    """Same spec opened a second time without create; works because the file was already created."""
    spec = {
        'driver': 'zarr',
        'kvstore': {
            'driver': 'file',
            'path': "tmp_zarr",
        }
    }
    ts.open(
        spec,
        dtype=ts.float32,
        shape=[10, 2]
    ).result()

def run_test_3():
    """This somehow works, but nothing is created on the filesystem."""
    spec = {
        'driver': 'zarr',
        'kvstore': {
            'driver': 'file',
            'path': "/tmp_zarr_new",
        }
    }
    ts.open(
        spec,
        delete_existing=True,
        create=True,
        dtype=ts.float32,
        shape=[10, 2]
    ).result()

def run_test_4():
    """Same issue with the n5 driver (code from the tutorial)."""
    dataset = ts.open({
        'driver': 'n5',
        'kvstore': {
            'driver': 'file',
            'path': 'tmp_n5/',
        },
        'metadata': {
            'compression': {
                'type': 'gzip'
            },
            'dataType': 'uint32',
            'dimensions': [1000, 1000],
            'blockSize': [100, 100],
        },
        'create': True,
        'delete_existing': True,
    }).result()

if __name__ == '__main__':
    print(version('tensorstore'))
    try:
        run_test_1()
        print("Test 1 passed!")
    except Exception as e:
        print(e)

    try:
        run_test_2()
        print("Test 2 passed")
    except Exception as e:
        print(e)

    try:
        run_test_3()
        print("Test 3 passed")
    except Exception as e:
        print(e)

    try:
        run_test_4()
        print("Test 4 passed")
    except Exception as e:
        print(e)

Output:

0.1.59
NOT_FOUND: Error opening "zarr" driver: Error writing local file "tmp_zarr/.zarray": Error getting file info: tmp_zarr/.zarray.__lock [OS error: No such file or directory] [source locations='tensorstore/kvstore/file/file_key_value_store.cc:339\ntensorstore/kvstore/kvstore.cc:373\ntensorstore/driver/driver.cc:117'] [tensorstore_spec='{\"context\":{\"cache_pool\":{},\"data_copy_concurrency\":{},\"file_io_concurrency\":{},\"file_io_sync\":true},\"create\":true,\"delete_existing\":true,\"driver\":\"zarr\",\"dtype\":\"float32\",\"kvstore\":{\"driver\":\"file\",\"path\":\"tmp_zarr/\"},\"schema\":{\"domain\":{\"exclusive_max\":[10,2],\"inclusive_min\":[0,0]}},\"transform\":{\"input_exclusive_max\":[[10],[2]],\"input_inclusive_min\":[0,0]}}']
Test 2 passed
Test 3 passed
NOT_FOUND: Error opening "n5" driver: Error writing local file "tmp_n5/attributes.json": Error getting file info: tmp_n5/attributes.json.__lock [OS error: No such file or directory] [source locations='tensorstore/kvstore/file/file_key_value_store.cc:339\ntensorstore/kvstore/kvstore.cc:373\ntensorstore/driver/driver.cc:117'] [tensorstore_spec='{\"context\":{\"cache_pool\":{},\"data_copy_concurrency\":{},\"file_io_concurrency\":{},\"file_io_sync\":true},\"create\":true,\"delete_existing\":true,\"driver\":\"n5\",\"dtype\":\"uint32\",\"kvstore\":{\"driver\":\"file\",\"path\":\"tmp_n5/\"},\"metadata\":{\"blockSize\":[100,100],\"compression\":{\"level\":-1,\"type\":\"gzip\",\"useZlib\":false},\"dataType\":\"uint32\",\"dimensions\":[1000,1000]},\"transform\":{\"input_exclusive_max\":[[1000],[1000]],\"input_inclusive_min\":[0,0]}}']
laramiel commented 6 months ago

I cannot reproduce this.

I installed Python 3.11 and copied your test script into C:\tmp\x\issue_160.py.

Then I ran it like this:

> C:\Python311\python.exe -m pip install tensorstore

> C:\Python311\python.exe issue_160.py
0.1.59
Test 1 passed!
Test 2 passed
Test 3 passed
Test 4 passed

> dir /A /S /B
C:\tmp\x\issue_160.py
C:\tmp\x\tmp_n5
C:\tmp\x\tmp_zarr
C:\tmp\x\tmp_n5\attributes.json
C:\tmp\x\tmp_zarr\.zarray
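To check what a run actually wrote on a machine where `dir` is not available, a portable Python stand-in for `dir /A /S /B` can help. This is a sketch added for convenience, not part of the original report: the helper name `list_tree` is hypothetical, and its output is sorted, so the order may differ from what `dir` prints.

```python
import os
import sys


def list_tree(root: str) -> list[str]:
    """Return every directory and file under ``root`` (hidden ones included),
    roughly what ``dir /A /S /B`` prints on Windows."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk lists dot-files such as ".zarray" like any other file.
        for name in dirnames + filenames:
            entries.append(os.path.join(dirpath, name))
    return sorted(entries)


if __name__ == "__main__":
    # Defaults to the current working directory, where the test script ran.
    for path in list_tree(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```

After running the test script, the listing should include `tmp_zarr/.zarray` and `tmp_n5/attributes.json` if the writes succeeded.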
laramiel commented 6 months ago

It also passes with tensorstore 0.1.60:


C:\tmp\x>C:\Python311\python.exe issue_160.py
0.1.60
Test 1 passed!
Test 2 passed
Test 3 passed
Test 4 passed

C:\tmp\x>dir /a /s /b
C:\tmp\x\issue_160.py
C:\tmp\x\tmp_n5
C:\tmp\x\tmp_zarr
C:\tmp\x\tmp_n5\attributes.json
C:\tmp\x\tmp_zarr\.zarray
laramiel commented 6 months ago

Note: because its path starts with "/", this spec writes to the root of the current drive rather than to the directory you run from. In my case, running in C:\tmp\x, the output is actually created in C:\tmp_zarr_new.

    spec = {
        'driver': 'zarr',
        'kvstore': {
            'driver': 'file',
            'path': "/tmp_zarr_new",
        }
    }
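A minimal sketch of the Windows path rule involved, using the standard library's `pathlib.PureWindowsPath` (which applies Windows semantics on any OS). It assumes the `file` driver hands the kvstore `path` to the operating system largely unchanged; it models Windows path resolution, not tensorstore's own code, and `resolve_on_windows` is a hypothetical helper.

```python
from pathlib import PureWindowsPath


def resolve_on_windows(cwd: str, kvstore_path: str) -> str:
    """Join a kvstore 'path' onto a working directory using Windows rules.

    Assumption: the file driver passes the path to the OS as-is. A path with
    a leading slash but no drive letter is "rooted": it keeps the drive of
    ``cwd`` but discards the rest of ``cwd``.
    """
    return str(PureWindowsPath(cwd) / kvstore_path)


print(resolve_on_windows(r"C:\tmp\x", "tmp_zarr"))       # C:\tmp\x\tmp_zarr
print(resolve_on_windows(r"C:\tmp\x", "/tmp_zarr_new"))  # C:\tmp_zarr_new
```

Under that assumption, the relative path "tmp_zarr" lands next to the script, while "/tmp_zarr_new" lands at the root of drive C:, which matches where the output appeared.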
Fatal705 commented 3 months ago

I have the same problem when writing from inside WSL 2 to a Windows drive (/mnt/c/...). tensorstore version: v0.1.64; Python: 3.10.12.

ValueError: NOT_FOUND: Error opening "n5" driver: Error writing local file "/mnt/c/dataset/ml/attributes.json": [OS error 2: No such file or directory]

The file is created on the drive, and opening it afterwards works, but writing to it again fails:

ValueError: NOT_FOUND: Error writing local file "/mnt/c/dataset/ml/0/1/1/0": [OS error 2: No such file or directory]

laramiel commented 3 months ago

That appears to be a different problem, since under WSL tensorstore is actually running on Linux and accessing the Windows drive through a network filesystem. However, it probably shouldn't fail.

It appears to fail when ::fstat() is called on an open file descriptor on the WSL network filesystem.
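To check whether a given filesystem (for example a Windows drive mounted under /mnt/c in WSL 2) handles that call, one could run a small probe outside tensorstore. This is a hypothetical sketch: the create, `fstat()`, write, rename sequence and the `probe.__lock` file name only mimic the `.__lock` files seen in the earlier Windows errors; they are assumptions, not tensorstore's actual implementation.

```python
import os


def probe_fstat(directory: str) -> os.stat_result:
    """Create a scratch file in ``directory``, fstat() its open descriptor,
    write to it, rename it into place, then clean up.

    Raises OSError (e.g. errno 2, "No such file or directory") if any step
    fails on this filesystem.
    """
    # Hypothetical names, loosely modeled on the ".__lock" files in the errors.
    lock_path = os.path.join(directory, "probe.__lock")
    final_path = os.path.join(directory, "probe")
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        info = os.fstat(fd)  # the kind of call that reportedly fails on /mnt/c
        os.write(fd, b"probe")
    finally:
        os.close(fd)
    os.replace(lock_path, final_path)  # rename, replacing any existing file
    os.remove(final_path)
    return info
```

For example, calling `probe_fstat("/mnt/c/dataset/ml")` inside WSL 2 would either return a `stat_result` or raise an `OSError`; the latter would point at the filesystem rather than at tensorstore.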

laramiel commented 3 months ago

The ::fstat() issue may be fixed if you build from source at or after commit https://github.com/google/tensorstore/commit/52c2dda51fb8225de86e5e08277baa97636438f6

I can test on my WSL instance later.

Edit: that change appears insufficient to solve the problem.