I've pushed the multithreading improvements from @JoOkuma now. Here is the CPU/memory usage for 16 threads:
and 64 threads:
If I use this code (i.e. everything in one transaction):
diff --git a/src/ome2024_ngff_challenge/resave.py b/src/ome2024_ngff_challenge/resave.py
index e4a8fe1..b5ff4e1 100755
--- a/src/ome2024_ngff_challenge/resave.py
+++ b/src/ome2024_ngff_challenge/resave.py
@@ -418,22 +418,19 @@ def convert_array(
# read & write a chunk (or shard) at a time:
blocks = shards if shards is not None else chunks
- for idx, batch in enumerate(Batched(chunk_iter(read.shape, blocks), threads)):
- start = time.time()
- with ts.Transaction() as txn:
- LOGGER.log(5, f"batch {idx:03d}: scheduling transaction size={len(batch)}")
- for slice_tuple in batch:
- write.with_transaction(txn)[slice_tuple] = read[slice_tuple]
- LOGGER.log(
- 5, f"batch {idx:03d}: {slice_tuple} scheduled in transaction"
- )
- LOGGER.log(5, f"batch {idx:03d}: waiting on transaction size={len(batch)}")
- stop = time.time()
- elapsed = stop - start
- avg = float(elapsed) / len(batch)
- LOGGER.debug(
- f"batch {idx:03d}: completed transaction size={len(batch)} in {stop-start:0.2f}s (avg={avg:0.2f})"
- )
+ with ts.Transaction() as txn:
+ for idx, slice_tuple in enumerate(chunk_iter(read.shape, blocks), threads):
+ write.with_transaction(txn)[slice_tuple] = read[slice_tuple]
+ LOGGER.log(
+ 5, f"batch {idx:03d}: {slice_tuple} scheduled in transaction"
+ )
+ LOGGER.log(5, f"batch {idx:03d}: waiting on transaction size={idx}")
+ stop = time.time()
+ elapsed = stop - start
+ avg = float(elapsed) / idx
+ LOGGER.debug(
+ f"batch {idx:03d}: completed transaction size={idx} in {stop-start:0.2f}s (avg={avg:0.2f})"
then this is the usage:
(exited early due to a slight typo in the quick diff above)
Finally, the original code (write.write(read).result()) had this usage:
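For reference, a minimal sketch of that original single-call approach (read_spec and write_spec here are placeholders for the specs resave.py actually builds, not the real code):

import tensorstore as ts

read = ts.open(read_spec).result()    # source array (spec is a placeholder)
write = ts.open(write_spec).result()  # destination array (spec is a placeholder)

# A single call copies the whole array: tensorstore keeps many chunks
# in flight at once, which is what drives the peak memory usage seen above.
write.write(read).result()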
The command for all of these is more or less:

time ome2024-ngff-challenge --log=trace 9846151.zarr/0 9846151.zarr-v3 \
    --output-chunks=1,1,1,472,421 --output-shards=1,1,1,5192,2947 \
    --output-overwrite --output-threads=16
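As a sanity check on those shapes (this snippet is illustrative, not part of the tool): each 1,1,1,5192,2947 shard holds a whole number of 1,1,1,472,421 chunks, since sharding requires the chunk shape to evenly divide the shard shape along every axis.

chunks = (1, 1, 1, 472, 421)
shards = (1, 1, 1, 5192, 2947)

# Each shard dimension must be an integer multiple of the chunk dimension.
assert all(s % c == 0 for s, c in zip(shards, chunks))
print([s // c for s, c in zip(shards, chunks)])  # -> [1, 1, 1, 11, 7], i.e. 77 chunks per shard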
and the graphs are recorded with:
psrecord 52813 --include-children --plot plot-all.png
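(52813 is the PID of the running conversion. As an aside, psrecord can also launch and monitor the command itself rather than attaching to an existing PID, e.g.:

psrecord "ome2024-ngff-challenge --log=trace 9846151.zarr/0 9846151.zarr-v3 ..." --include-children --plot plot-all.png

shown here only as an alternative invocation, not what was actually run.)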
Waiting on a quick review from @will-moore, then I'll get this out as 0.0.6 and move on to the other PRs.
Thanks @joshmoore - Looks good 👍
This pull request has been mentioned on Image.sc Forum. There might be relevant details there:
https://forum.image.sc/t/ome2024-ngff-challenge-memory-issues-with-tensorstore/100636/7
This PR fixes #28 by writing batches of chunks in parallel (based on a configurable number of threads).
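The core idea, as a minimal sketch (batched is paraphrased from the Batched helper referenced in the diff above; chunk_iter, read, write, blocks, and threads are assumed from resave.py's context):

import tensorstore as ts

def batched(iterable, n):
    # Yield successive lists of at most n items.
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == n:
            yield batch
            batch = []
    if batch:
        yield batch

# One transaction per batch of `threads` chunks: writes within a batch
# proceed in parallel, and committing the transaction on exit from the
# `with` block bounds how much data is in flight at any one time.
for batch in batched(chunk_iter(read.shape, blocks), threads):
    with ts.Transaction() as txn:
        for slice_tuple in batch:
            write.with_transaction(txn)[slice_tuple] = read[slice_tuple]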
This also fixes an issue I had when working with larger images: e.g. https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0047A/4496763.zarr, which has shape 4,25,2048,2048. I wanted to use --output-write-details to generate parameters.json using a guess for the shard shape, or even to provide a shard shape explicitly with --output-shards 4,1,2048,2048. However, in both cases guess_shards() is called and fails since the size is above the threshold.

This PR aims to only perform this validation IF we're actually using the guessed shard_shape to convert an image, NOT if we're just generating parameters.json. Also, we don't validate if the user is doing the conversion with a shard_shape provided by --output-shards 4,1,2048,2048 or by --output-read-details=parameters.json.
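A sketch of the intended control flow (the function and argument names here are illustrative, not the actual resave.py API; only guess_shards() is named in the code itself):

def choose_shard_shape(args, shape, chunk_shape):
    # User-provided shard shape: trust it, skip validation.
    if args.output_shards:
        return args.output_shards
    # Replaying a saved parameters.json: also skip validation.
    if args.output_read_details:
        return args.output_read_details.shards
    # Otherwise guess, but only enforce the size threshold when the guess
    # will actually drive a conversion, not when we are merely writing
    # parameters.json via --output-write-details.
    return guess_shards(
        shape, chunk_shape, validate=not args.output_write_details
    )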
To test...