Hi @will-moore,
My understanding is that when you do `read = ts.open(...).result()`, you're loading the whole thing into memory. I would check if it works without calling `result()`. Otherwise, it might need to be copied in chunks.
Thanks, I think you're right, but I realise I made an error there in my summary. We don't actually call `read.result()`. It's actually:

```python
read = input_config.ts_read()
write = ts.open(write_config).result()
future = write.write(read)
future.result()
```
I've managed to work around the OOM error in https://github.com/ome/ome2024-ngff-challenge/pull/23/files by reading & writing a block at a time:
# read & write a chunk (or shard) at a time:
blocks = shards if shards is not None else chunks
for slice_tuple in chunk_iter(read.shape, blocks):
LOGGER.debug(f"array_location: {slice_tuple}")
future = write[slice_tuple].write(read[slice_tuple])
future.result()
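(`chunk_iter` is a helper in the PR that yields slice tuples covering the array block by block; roughly something like the sketch below, assuming `blocks` is a per-dimension block shape, though the real helper may differ:)

```python
import itertools

def chunk_iter(shape, blocks):
    """Yield tuples of slices that tile an array of `shape` in steps of `blocks`."""
    ranges = [range(0, dim, step) for dim, step in zip(shape, blocks)]
    for starts in itertools.product(*ranges):
        yield tuple(
            slice(start, min(start + step, dim))
            for start, step, dim in zip(starts, blocks, shape)
        )
```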
However, this is kinda slow. So we need to let TensorStore manage this itself (hopefully in parallel), but also limit the memory usage so we don't fail with OOMs.
My assumption was that tensorstore would be able to handle the batching of the reads & writes. @JoOkuma, if you think activating the cache pool will make this possible, happy to give it a try.

@will-moore, I've added parallelism to your PR in https://github.com/ome/ome2024-ngff-challenge/pull/23/commits/c23c196dc0ca49be666c4d8fd78cf9cf53a79008
cc: @jbms in case an RFE issue would be of interest
see also: https://forum.image.sc/t/ome2024-ngff-challenge-memory-issues-with-tensorstore/100636
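(For reference, "activating the cache pool" would presumably mean passing a shared `ts.Context` with an explicit byte limit when opening both stores. A rough sketch only; `read_spec`/`write_spec` and the 4 GB limit are placeholders, not values from the script:)

```python
import tensorstore as ts

# Shared context with an explicit cache budget (~4 GB, purely illustrative).
context = ts.Context({"cache_pool": {"total_bytes_limit": 4_000_000_000}})

# read_spec / write_spec stand in for the script's real input/output specs.
read = ts.open(read_spec, context=context, read=True).result()
write = ts.open(write_spec, context=context, create=True).result()
```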
> My assumption was that tensorstore would be able to handle the batching of the reads & writes.
~I'm not a tensorstore expert, but from my point of view, once you call `future.result()`, you synchronize and disable any batch optimization.~

EDIT: Never mind, I just saw https://github.com/ome/ome2024-ngff-challenge/commit/c23c196dc0ca49be666c4d8fd78cf9cf53a79008, where you delay the `result()` call.
I couldn't find the PR to comment on https://github.com/ome/ome2024-ngff-challenge/commit/c23c196dc0ca49be666c4d8fd78cf9cf53a79008.

I would test if the current approach:
```python
for idx, batch in enumerate(batched(chunk_iter(read.shape, blocks), threads)):
    futures = []
    for slice_tuple in batch:
        future = write[slice_tuple].write(read[slice_tuple])
        LOGGER.info(f"batch {idx}: {slice_tuple} scheduled -- {future}")
        futures.append((slice_tuple, future))
    for slice_tuple, future in futures:
        future.result()
        LOGGER.info(f"batch {idx}: {slice_tuple} completed -- {future}")
```
is faster than:
```python
for idx, batch in enumerate(batched(chunk_iter(read.shape, blocks), threads)):
    with ts.Transaction() as txn:
        for slice_tuple in batch:
            write.with_transaction(txn)[slice_tuple] = read[slice_tuple]
            LOGGER.info(f"batch {idx}: {slice_tuple} scheduled in transaction")
```
:+1: but note that the original implementation did not call `.result()` on the read but passed it to the write in its entirety. My understanding from @d-v-b, though, was that tensorstore didn't automatically do the batched reads.
> `write.with_transaction(txn)[slice_tuple] = read[slice_tuple]`
I'll give it a go.
Looking good so far! Thanks. I'll do a bit more testing before merging.
It does leave me to wonder though if:
```python
with ts.Transaction() as txn:
    for slice_tuple in chunk_iter(read.shape, blocks):
        write.with_transaction(txn)[slice_tuple] = read[slice_tuple]
        LOGGER.info(f"{slice_tuple} scheduled in transaction")
    LOGGER.info("waiting on transaction...")
LOGGER.info("transaction complete")
```
isn't as good or better.
I like it. I wonder if TensorStore manages the memory usage. If that's the case, both `for` loops could be wrapped with the transaction.
Tensorstore does not batch writes outside of transactions --- the previous writeback cache support was removed because it was not commonly used and added complexity.
Tensorstore also does not currently limit parallelism to limit memory usage, but we are working on that.
@laramiel
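(Until such a limit exists, one stop-gap is to cap the number of in-flight chunk copies by hand. A rough sketch along the lines of the loops above, reusing `chunk_iter`; the `max_in_flight` value is an arbitrary illustrative choice, not a TensorStore setting:)

```python
from collections import deque

def copy_bounded(read, write, blocks, max_in_flight=8):
    """Copy `read` into `write` block by block, keeping at most
    `max_in_flight` write futures pending at any time."""
    pending = deque()
    for slice_tuple in chunk_iter(read.shape, blocks):
        if len(pending) >= max_in_flight:
            pending.popleft().result()  # wait for the oldest copy to finish
        pending.append(write[slice_tuple].write(read[slice_tuple]))
    while pending:
        pending.popleft().result()
```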
Thanks for the info, @jbms. I've added plots of memory usage to https://github.com/ome/ome2024-ngff-challenge/pull/23#issuecomment-2298207045, which suggest that the strategy outlined above should work well. Not exactly sure of the optimal number of threads.
@JoOkuma, here's a plot for the version without transactions:
Takes about twice as long.
This issue has been mentioned on Image.sc Forum. There might be relevant details there:
https://forum.image.sc/t/ome2024-ngff-challenge-memory-issues-with-tensorstore/100636/7
Attempting to convert the image at https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0048A/9846151.zarr/0/, which has shape `[1,3,1402,5192,2947]` and an uncompressed 0-resolution size of 128.71 GB, using the branch in https://github.com/ome/ome2024-ngff-challenge/pull/23 at https://github.com/ome/ome2024-ngff-challenge/pull/23/commits/f17a6de9638acb1d9493a34576aa7916e0737393.
This fails with OOM, e.g. initially tried with 3D chunks/shards:
Same with writing 2D shards/chunks...
We start to see some chunk dirs created e.g.
But then the machine hangs and becomes unresponsive (can't ssh into it anymore, etc.).
Also tried with local v2 zarr data as input.
The TensorStore code used is essentially:
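(the exact snippet isn't quoted here, but it is roughly the whole-array copy shown earlier in the thread, where `input_config.ts_read()` and `write_config` are the script's own helper and config:)

```python
read = input_config.ts_read()            # open the source store
write = ts.open(write_config).result()   # open/create the destination store
future = write.write(read)               # copy the entire array in one call
future.result()                          # block until the copy completes
```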
It follows the same pattern as the example at https://medium.com/@TheHaseebHassan/google-ai-tensorstore-for-array-storage-173326bf5a95.
The machine above has 30 GB RAM.