Closed Mikejmnez closed 1 year ago
Isn't that true for ECCO too, which has a 'face' dimension? I would assume it would not really make a difference for a dataset of that size, though. Also, with ECCO the chunk size does not seem to have caused any issues with the cutout so far, given how it is currently chunked. I can see, though, that with LLC4320 it would create a huge speedup. This looks like a great catch to increase compute speeds!
Yes, the performance of the ECCO dataset has very little sensitivity to this check. LLC4320 with chunks1 (above), which is the original chunking (dataset accessible through SciServer), is also not as sensitive because its chunks are large. But chunk2 (above), which has much better performance on the Ceph cluster (when allowing full depth per chunk), is very sensitive to this (unnecessary) evaluation, as expected, since there are more, smaller horizontal chunks.
Closed by PR #387.
The following evaluations at the beginning of `cutout` with LLC4320 could be circumvented:
where `XRange_in`, `YRange_in` are input arguments into `cutout`. These two lines are necessary checks in most datasets, but are unnecessary for datasets like ECCO, LLC4320 and even DYAMOND because their range is the entire world.
In particular, I found that the evaluations are sensitive to chunk size. For example, consider the following chunking strategies:
I timed the evaluations with LLC4320 data within Sciserver and I get the following results:
Chunk1:
Chunk2:
Solution:
We can define a parameter within the intake catalog that contains the precomputed max and min values of `XG` and `YG`, and have these be defined within `od.parameters`. Then allow `_check_range` (which takes the `od` as an argument) to check whether these are predefined. If not, then compute them. The relevant change would be somewhere here:
Later the max/min values are
An alternative is to check, when 'face' is a dimension of the dataset:
But this conditional may not be appropriate for other datasets with 'face' as a dimension (e.g. ASTE)
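One way to address that caveat might be to pair the 'face' check with an explicit opt-in, so that a 'face' dimension alone never disables the check. This is only a sketch: `needs_range_check` and the `global_coverage` flag are hypothetical names, not part of the existing code, and `dims` stands in for the dataset's dimension names.

```python
def needs_range_check(dims, global_coverage=False):
    """Return True when the XG/YG max/min evaluation should still run.

    dims            -- iterable of dimension names (stand-in for ds.dims)
    global_coverage -- hypothetical flag that a catalog entry would set
                       for datasets known to span the whole globe
    """
    has_face = "face" in set(dims)
    # Skip the evaluation only when both conditions hold; a 'face'
    # dimension by itself is not treated as proof of global coverage.
    return not (has_face and global_coverage)


# Dataset with faces that is flagged as global: skip the evaluation.
skip_case = needs_range_check(["face", "Y", "X", "Z"], global_coverage=True)

# Dataset with faces but no flag (the ASTE-like case): keep the check.
keep_case = needs_range_check(["face", "Y", "X", "Z"])
```

With this shape, datasets like ASTE keep the existing behaviour unless someone deliberately marks them as global.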
Depending on chunk size, circumventing the `maxcheck` and `mincheck` calculation from the grid is a speedup of up to 1 minute in LLC4320.
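The catalog-based solution proposed above might look roughly like the following. It is a minimal sketch under stated assumptions: the `parameters` dict stands in for `od.parameters`, the key names (`XG_range`, `YG_range`) and the helper `get_grid_range` are hypothetical, the numeric ranges are illustrative only, and `compute_from_grid` represents whatever max/min evaluation `_check_range` currently performs.

```python
def get_grid_range(parameters, name, compute_from_grid):
    """Return (min, max) for the grid variable `name`.

    Use precomputed values from `parameters` when present; otherwise
    fall back to `compute_from_grid(name)`, the expensive evaluation.
    """
    key = f"{name}_range"  # hypothetical key, e.g. "XG_range"
    if key in parameters:
        lo, hi = parameters[key]
        return float(lo), float(hi)
    return compute_from_grid(name)


# Stand-in for the costly reduction over the chunked grid; it records
# which variables actually triggered a computation.
calls = []


def slow_compute(name):
    calls.append(name)
    # Illustrative values only, not taken from any real dataset.
    return (-180.0, 180.0) if name == "XG" else (-90.0, 90.0)


params = {"XG_range": (-180.0, 180.0)}  # as if read from the catalog
xg = get_grid_range(params, "XG", slow_compute)  # uses precomputed values
yg = get_grid_range(params, "YG", slow_compute)  # falls back to computing
```

Here only `YG` reaches `slow_compute`, which is the behaviour the proposal is after: datasets whose catalog entry carries the ranges avoid the grid evaluation, and all others are unaffected.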