google / tensorstore

Library for reading and writing large multi-dimensional arrays.
https://google.github.io/tensorstore/

Transactional/ACID semantics #150

Open · y4n9squared opened this issue 3 months ago

y4n9squared commented 3 months ago

I have a general question about this sentence in the blog:

> Safety of parallel operations when many machines are accessing the same dataset is achieved through the use of optimistic concurrency, which maintains compatibility with diverse underlying storage layers (including Cloud storage platforms, such as GCS, as well as local filesystems) without significantly impacting performance. TensorStore also provides strong ACID guarantees for all individual operations executing within a single runtime.
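My mental model of the optimistic concurrency described there is a generation-conditioned read-modify-write loop, roughly like this (a minimal sketch using a hypothetical key-value API, not TensorStore's actual interface):

```python
# Sketch of optimistic concurrency as I understand it; `store.read` and
# `store.write` are made-up names for illustration.
def read_modify_write(store, key, update):
    while True:
        value, generation = store.read(key)  # value plus its current generation
        # The conditional write succeeds only if the object is still at
        # `generation`; if another writer got there first, re-read and retry.
        if store.write(key, update(value), if_generation_match=generation):
            return
```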

I created a dummy dataset with the zarr + S3 drivers:

```
2024-04-04 15:33:22        230 ts/yang-test-dataset/.zarray
2024-04-04 16:43:54      48573 ts/yang-test-dataset/0.0.0
2024-04-04 16:43:54      48573 ts/yang-test-dataset/0.0.1
2024-04-04 16:43:54      48573 ts/yang-test-dataset/0.0.2
2024-04-04 16:43:54      48573 ts/yang-test-dataset/0.0.3
2024-04-04 16:43:54      48573 ts/yang-test-dataset/0.0.4
2024-04-04 16:43:54      48573 ts/yang-test-dataset/0.0.5
2024-04-04 16:43:54      48573 ts/yang-test-dataset/0.0.6
2024-04-04 16:43:54      48573 ts/yang-test-dataset/0.0.7
2024-04-04 16:43:54      48573 ts/yang-test-dataset/0.0.8
2024-04-04 16:43:54      48573 ts/yang-test-dataset/0.0.9
```
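For reference, the spec I used to open the dataset looks roughly like this (the bucket name is illustrative, not my real one):

```python
import tensorstore as ts

ds = ts.open({
    'driver': 'zarr',
    'kvstore': {
        'driver': 's3',
        'bucket': 'my-bucket',  # illustrative, not the real bucket name
        'path': 'ts/yang-test-dataset/',
    },
}, open=True).result()
```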

and then created a situation where the next write to chunk 0.0.3 would fail. Running under a transaction

```python
with ts.Transaction() as txn:
    ds.with_transaction(txn)[80:82, 99:102, :] = [[[1], [2], [3]], [[4], [5], [6]]]
```

would throw

```
Traceback (most recent call last):
  File "/home/yang.yang/workspaces/tensorstore/.yang/foo.py", line 33, in <module>
    with ts.Transaction() as txn:
ValueError: PERMISSION_DENIED: Error writing "ts/yang-test-dataset/0.0.3": HTTP response code: 403 with body: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>SK6BWG5ESTC2NVJ6</RequestId><HostId>5z3/QZmVne5TyFJUH0A0swSAtyyhsl47I/z7AjULiGmsj1QAtf3JEA6d/TAuWH/ts1xCHJmVucM=</HostId></Error> [source locations='tensorstore/kvstore/s3/s3_key_value_store.cc:777\ntensorstore/kvstore/kvstore.cc:373']
```

but the S3 bucket after this operation looks like this:

```
2024-04-04 15:33:22        230 ts/yang-test-dataset/.zarray
2024-04-04 17:14:57      48573 ts/yang-test-dataset/0.0.0
2024-04-04 17:14:58      48573 ts/yang-test-dataset/0.0.1
2024-04-04 17:14:58      48573 ts/yang-test-dataset/0.0.2
2024-04-04 16:43:54      48573 ts/yang-test-dataset/0.0.3  <--- not updated
2024-04-04 17:14:57      48573 ts/yang-test-dataset/0.0.4
2024-04-04 17:14:58      48573 ts/yang-test-dataset/0.0.5
2024-04-04 17:14:57      48573 ts/yang-test-dataset/0.0.6
2024-04-04 17:14:58      48573 ts/yang-test-dataset/0.0.7
2024-04-04 17:14:57      48573 ts/yang-test-dataset/0.0.8
2024-04-04 17:14:57      48573 ts/yang-test-dataset/0.0.9
```

So from the perspective of an observer (who may eventually want to load this dataset again), the operation does not appear to be transactional. When the blog says transactional within a single runtime, do you mean that the process's view of `ds` when the context manager exits is transactional, but that otherwise no guarantees are made about the state of the underlying storage?
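For concreteness, here is the same repro with the commit made explicit (a sketch; I believe `Transaction.commit_async` is what the context manager invokes on exit):

```python
txn = ts.Transaction()
ds.with_transaction(txn)[80:82, 99:102, :] = [[[1], [2], [3]], [[4], [5], [6]]]
try:
    txn.commit_async().result()  # what the context manager does on exit
except ValueError:
    # The commit as a whole failed, but chunks other than 0.0.3 have
    # already been rewritten in S3, as the listing above shows.
    raise
```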

If one sets

```python
with ts.Transaction(atomic=True) as txn:
    ...
```

then if a write would span multiple chunks, I see an error

```
ValueError: Cannot read/write "ts/yang-test-dataset/.zarray" and read/write "ts/yang-test-dataset/0.0.0" as single atomic transaction [source locations='tensorstore/internal/cache/kvs_backed_cache.h:221\ntensorstore/internal/cache/async_cache.cc:660\ntensorstore/internal/cache/async_cache.h:383\ntensorstore/internal/cache/chunk_cache.cc:438\ntensorstore/internal/grid_partition.cc:246\ntensorstore/internal/grid_partition.cc:246\ntensorstore/internal/grid_partition.cc:246']
```

I'm guessing this is expected since you have no way of performing a transactional write across multiple S3 objects?
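If it helps, this is a hypothetical (untested) workaround I considered for single-chunk atomic writes: my understanding is that `assume_metadata=True` keeps `.zarray` out of the transaction, leaving only one S3 object involved. The metadata values below are made up for illustration, since `assume_metadata` requires the spec to fully determine the metadata:

```python
ds = ts.open({
    'driver': 'zarr',
    'kvstore': {
        'driver': 's3',
        'bucket': 'my-bucket',  # illustrative
        'path': 'ts/yang-test-dataset/',
    },
    'metadata': {  # hypothetical values, not my real dataset's
        'shape': [100, 110, 1000],
        'chunks': [100, 110, 100],
        'dtype': '<i8',
    },
}, assume_metadata=True).result()

with ts.Transaction(atomic=True) as txn:
    # The region lies entirely within chunk 0.0.0, so the transaction
    # should touch a single S3 object.
    ds.with_transaction(txn)[0:2, 0:2, 0:2] = 0
```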

Lastly, on the topic of "optimistic concurrency and compatibility with GCS/other storage layers", since AFAIK S3 does not support conditional PUTs the way that GCS does, is there a possibility of data loss when using S3?

Thanks in advance!

jbms commented 3 months ago

The S3 support was added recently, but we indeed need to clarify the limitations in the documentation.

S3 lacks conditional write support, and with multiple concurrent writes to the same object it is indeed possible that some writes will be lost.
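To illustrate the difference (a sketch using the google-cloud-storage and boto3 client libraries; the bucket name and payload are illustrative):

```python
import boto3
from google.cloud import storage

new_bytes = b'updated chunk contents'  # illustrative payload

# GCS: the upload fails with HTTP 412 if the object's generation changed
# since we read it, so a concurrent update cannot be silently overwritten.
blob = storage.Client().bucket('my-bucket').blob('ts/yang-test-dataset/0.0.3')
blob.reload()  # fetch the current generation
blob.upload_from_string(new_bytes, if_generation_match=blob.generation)

# S3 (at the time of writing): PutObject unconditionally replaces the
# object, so of two concurrent read-modify-write cycles, one update can
# be silently lost.
boto3.client('s3').put_object(
    Bucket='my-bucket', Key='ts/yang-test-dataset/0.0.3', Body=new_bytes)
```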

There is a strategy for implementing atomic writes on S3 under certain assumptions about timestamps, but it would require a list operation in order to read, which may be costly. When using this strategy with ocdbt, only a single list operation would be needed for the manifest; subsequent reads (using the cached manifest) would be normal read operations, and multi-key atomic transactions could also be supported. (Currently a small amount of work remains to actually support the combination of S3 and multi-key atomic operations with ocdbt.)
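Roughly, the idea is that every write goes to a fresh, time-ordered key (plain PUTs to distinct keys are atomic on S3), and a reader lists the prefix and takes the newest entry. A sketch under those assumptions, not the actual ocdbt implementation; the key format and names are illustrative:

```python
import time
import uuid
import boto3

s3 = boto3.client('s3')
BUCKET = 'my-bucket'  # illustrative

def write_version(prefix: str, data: bytes) -> None:
    # A fixed-width hex timestamp makes keys sort in write order, given
    # the clock assumptions mentioned above; the uuid suffix avoids
    # collisions between concurrent writers.
    key = f'{prefix}.{time.time_ns():016x}.{uuid.uuid4().hex}'
    s3.put_object(Bucket=BUCKET, Key=key, Body=data)

def read_latest(prefix: str) -> bytes:
    # The costly LIST a plain read would need (pagination ignored); with
    # ocdbt only the manifest needs this, and reads against the cached
    # manifest are ordinary GETs.
    resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=prefix + '.')
    latest = max(obj['Key'] for obj in resp['Contents'])
    return s3.get_object(Bucket=BUCKET, Key=latest)['Body'].read()
```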