Open gatoniel opened 2 months ago
Comparing gatoniel:main (83b185a) with main (d04df9b)
✅ 13 untouched benchmarks
All modified and coverable lines are covered by tests ✅
Project coverage is 95.03%. Comparing base (d04df9b) to head (83b185a).
great! tests are starting to pass now. I'll take a closer look soon. Were you able to qualitatively confirm that this indeed speeds things up if you read a single subchunk of a massive xy image?
I am currently testing the speed up.
However, the execution time of `to_dask` is increasing linearly with the number of chunks. Is that expected?
are you actually calling `compute()` there? or simply calling `to_dask`? and is that reading the full plane?
If the former (i.e. if you're computing the full plane), then I wouldn't be surprised at all about the time increasing as chunk size decreases, since there are more and more operations to complete. What would be important to verify, though, is that it takes less time to read a single chunk as the chunk size decreases (i.e. just to confirm that it can indeed efficiently access a subset of the data, rather than reading all the data and just cropping it down after the fact).
This is purely executing `to_dask`. It's not doing any reading yet. This is the code:
import timeit

import matplotlib.pyplot as plt
import nd2

# Candidate chunkings of the 1024-pixel Y axis, from 1 chunk down to 64 chunks.
chunks = [
    (1024,),
    (512,) * 2,
    (256,) * 4,
    (128,) * 8,
    (64,) * 16,
    (32,) * 32,
    (16,) * 64,
]
# Tick labels matching the entries of `chunks`.
chunkstr = [
    "(1024,)",
    "(512,)*2",
    "(256,)*4",
    "(128,)*8",
    "(64,)*16",
    "(32,)*32",
    "(16,)*64",
]
file = nd2.ND2File(path)
file.sizes  # {'P': 26, 'Z': 1263, 'C': 3, 'Y': 1024, 'X': 1024}
times = []
for c in chunks:
    start = timeit.default_timer()
    # Only builds the dask array; no pixel data is read here.
    file.to_dask(
        frame_chunks=((3,), c, (1024,))
    )
    times.append(timeit.default_timer() - start)
fig, ax = plt.subplots(1, 1, figsize=[10, 6])
x = [len(c) for c in chunks]
ax.plot(x, times)
ax.set_ylabel("time of `to_dask` in s")
ax.set_xlabel("Chunks in y dimension")
ax.set_xticks(x, chunkstr, rotation=90)
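A rough way to see why the construction time could scale with the number of chunks is to count the blocks the resulting array would contain. The sketch below assumes one block per frame and per sub-frame chunk; that is an assumption about how `to_dask` lays out its graph, not something confirmed in this thread, and `n_blocks` is a hypothetical helper, not part of nd2.

```python
from math import prod

# Coordinate (non-frame) axes, taken from the `file.sizes` comment above.
coord_sizes = {"P": 26, "Z": 1263}
n_frames = prod(coord_sizes.values())  # total number of frames in the file


def n_blocks(frame_chunks, n_frames=n_frames):
    """Count blocks, ASSUMING one block per frame and per (C, Y, X) sub-chunk.

    Hypothetical helper for illustration only; it does not call nd2 or dask.
    """
    chunks_per_frame = prod(len(axis_chunks) for axis_chunks in frame_chunks)
    return n_frames * chunks_per_frame


# Same Y chunkings as in the benchmark; C and X are left as single chunks.
for y_chunks in [(1024,), (512,) * 2, (256,) * 4, (16,) * 64]:
    blocks = n_blocks(((3,), y_chunks, (1024,)))
    print(f"{len(y_chunks):>3} Y chunks -> {blocks} blocks")
```

Under that assumption the block count is proportional to the number of Y chunks, so any fixed per-block cost during graph construction would grow linearly too.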
I made some more simple tests. Still not with the `dask` feature yet.
%%timeit
with nd2.ND2File(path) as file:
    for i in range(500):
        new = file.read_frame(i).copy()
692 ms ± 15.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
with nd2.ND2File(path) as file:
    for i in range(500):
        new = file.read_frame(i)[:128, :128].copy()
78.8 ms ± 1.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Using `np.ravel_multi_index` to go through the first axis:
%%timeit
with nd2.ND2File(path) as file:
    for i in range(file.shape[0]):
        j = np.ravel_multi_index((i, 0), file._coord_shape)
        new = file.read_frame(j).copy()
90.8 ms ± 339 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
with nd2.ND2File(path) as file:
    for i in range(file.shape[0]):
        j = np.ravel_multi_index((i, 0), file._coord_shape)
        new = file.read_frame(j)[:128, :128].copy()
27.8 ms ± 560 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Using `np.ravel_multi_index` to go through the second axis:
%%timeit
with nd2.ND2File(path) as file:
    for i in range(file.shape[1]):
        j = np.ravel_multi_index((0, i), file._coord_shape)
        new = file.read_frame(j).copy()
566 ms ± 11.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
with nd2.ND2File(path) as file:
    for i in range(file.shape[1]):
        j = np.ravel_multi_index((0, i), file._coord_shape)
        new = file.read_frame(j)[:128, :128].copy()
69.9 ms ± 597 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
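To make the access pattern of the two `np.ravel_multi_index` loops explicit, here is a pure-Python sketch of the row-major flattening that `np.ravel_multi_index` performs with its default C order. The coordinate shape `(26, 1263)` is assumed from the `file.sizes` output shown earlier (P, Z); `ravel_index` is a hypothetical helper written for illustration.

```python
def ravel_index(coords, shape):
    """Flatten a coordinate tuple into a row-major (C-order) index.

    Mirrors the default behavior of np.ravel_multi_index; hypothetical helper.
    """
    index = 0
    for coord, size in zip(coords, shape):
        if not 0 <= coord < size:
            raise ValueError(f"coordinate {coord} out of bounds for axis of size {size}")
        index = index * size + coord
    return index


coord_shape = (26, 1263)  # ASSUMED (P, Z), from the file.sizes comment earlier

# Stepping along the first axis skips a whole Z stack between reads.
first_axis = [ravel_index((i, 0), coord_shape) for i in range(3)]
# Stepping along the second axis visits consecutive frames.
second_axis = [ravel_index((0, i), coord_shape) for i in range(3)]

print(first_axis)   # [0, 1263, 2526]
print(second_axis)  # [0, 1, 2]
```

So the first-axis loop requests frames that are 1263 frame indices apart, while the second-axis loop requests consecutive frame indices.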
The cropping on the `read_frame` level works and is always faster. However, jumping within the file itself (1st and 3rd example vs 2nd example) seems to generate a lot of overhead that drastically reduces the speed differences.
I am also going to compare the subindexing of a region with `to_dask` and chunks against using purely `read_frame`, to test how much the `dask` chunking can speed things up. But I didn't have time today...
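The reported timings can be normalized per frame to compare the three access patterns directly. This is only arithmetic on the numbers quoted above; the frame counts 26 and 1263 for the two `np.ravel_multi_index` loops are assumed from the `file.sizes` output shown earlier, on the assumption that the same file was used.

```python
# (frames read, full-frame time in ms, cropped [:128, :128] time in ms)
timings_ms = {
    "first 500 frames": (500, 692.0, 78.8),
    "first axis": (26, 90.8, 27.8),      # frame count ASSUMED from file.sizes (P)
    "second axis": (1263, 566.0, 69.9),  # frame count ASSUMED from file.sizes (Z)
}


def summarize(n_frames, full_ms, crop_ms):
    """Return (ms per full frame, ms per cropped frame, speedup from cropping)."""
    return full_ms / n_frames, crop_ms / n_frames, full_ms / crop_ms


summary = {name: summarize(*values) for name, values in timings_ms.items()}
for name, (full, crop, speedup) in summary.items():
    print(f"{name:>16}: {full:.3f} ms/frame full, {crop:.3f} ms/frame cropped, {speedup:.1f}x")
```

With these assumptions, the first-axis loop has the highest per-frame cost and the smallest gain from cropping, which is consistent with the overhead described above.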
Hi, I tried to implement the sub-frame chunking in the `to_dask` and `_dask_block` functions as mentioned in https://github.com/tlambert03/nd2/issues/85. I also added some new tests. It might need some more comments. Best, Niklas