Mikejmnez closed this PR 1 year ago.
Merging #387 (7ba13f0) into main (4eaa06e) will increase coverage by 0.05%. The diff coverage is 75.00%.
```diff
@@            Coverage Diff             @@
##             main     #387      +/-   ##
==========================================
+ Coverage   94.10%   94.15%   +0.05%
==========================================
  Files           9        9
  Lines        4189     4194       +5
  Branches     1005     1004       -1
==========================================
+ Hits         3942     3949       +7
  Misses        154      154
+ Partials       93       91       -2
```
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 94.15% <75.00%> (+0.05%) | :arrow_up: |
Flags with carried forward coverage won't be shown.
| Files Changed | Coverage Δ | |
|---|---|---|
| oceanspy/subsample.py | 92.62% <ø> (ø) | |
| oceanspy/llc_rearrange.py | 84.77% <50.00%> (+0.25%) | :arrow_up: |
| oceanspy/_ospy_utils.py | 98.24% <100.00%> (+0.05%) | :arrow_up: |
Why is `chunks` removed?
Reasons:
1) It is virtually impossible to know a priori the size of the resulting dataset after an arbitrary cutout, in particular the length of the core dimensions (`X`, `Y`, etc.).
2) Because of 1), it is essentially impossible to know a priori whether a "proposed" chunking strategy (passed as an argument to `llc_rearrange`) will create equal, consistent chunks across the dataset. A consistent chunking strategy divides the length of a core dimension into equal pieces. Because center and corner points (`X` and `Xp1`) are offset by one unit, this amounts to finding the largest common divisor (since we want large chunks), or nearly equal chunk sizes, so that we always have the same number of chunks ("Nx") along each dimension, each with the same number of grid points.
3) Performing an arbitrary cutout can sometimes create inconsistent chunking along core dimensions. There is no immediate error associated with this, but it can creep in later when making computations (though not when plotting). This means that the dataset resulting from a cutout should always be rechunked.
4) Inconsistent chunking can negatively affect the performance of subsequent operations and lead to errors. I can come up with an explicit example if needed.
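Point 3 can be illustrated with a small dask sketch (illustrative only, not code from the PR): slicing an evenly chunked dimension at arbitrary bounds leaves unequal chunks along that dimension.

```python
import dask.array as da

# A core dimension of length 90, evenly chunked into pieces of 30.
x = da.ones(90, chunks=30)
print(x.chunks)  # ((30, 30, 30),)

# An arbitrary cutout (indices 10..74) slices through the chunk boundaries,
# leaving unequal chunks along the same dimension.
y = x[10:75]
print(y.chunks)  # ((20, 30, 15),)
```

Nothing fails at this point, which is exactly why the inconsistency can go unnoticed until a later computation.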
So the only option guaranteed to produce consistent chunking after a cutout via `llc_rearrange` is a single chunk spanning each core dimension (e.g. `X`, `Xp1`, `Y`, `Yp1`, ...). Any subsequent rechunking can be done afterwards, after inspecting the dataset. Keeping `chunks` as an argument would very likely lead to errors.
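The divisor argument above can be made concrete with a short sketch (the helper is hypothetical, not part of oceanspy): since `X` has some length n and `Xp1` has length n + 1, and consecutive integers are coprime, the only chunk size that divides both evenly is 1 — hence the single whole-dimension chunk.

```python
def even_chunk_sizes(n: int) -> set[int]:
    """All chunk sizes that split a dimension of length n into equal pieces."""
    return {d for d in range(1, n + 1) if n % d == 0}

nx = 60  # hypothetical length of the core dimension X after a cutout
# Chunk sizes that evenly chunk both X (length nx) and Xp1 (length nx + 1):
common = even_chunk_sizes(nx) & even_chunk_sizes(nx + 1)
print(common)  # {1}: consecutive lengths share no nontrivial divisor
```

A single chunk per core dimension (sizes nx and nx + 1, one chunk each) sidesteps this entirely.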
Looks good to me. Do these changes work OK with the ECCO and daily_ecco LLC datasets too?
Yes. The changes are independent of the size of the LLC dataset.
@asiddi24 @MaceKuailv @malmans2 @renskegelderloos Does this look OK to you? If so, please approve the PR. Thanks!
This PR addresses #386 .
Improves the performance of oceanspy with LLC4320 by avoiding the computation of max/min range values from the dataset, as these are known a priori.
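The saving can be sketched as follows (illustrative only; the array and numbers are assumptions, not oceanspy code): computing min/max from a lazy array forces a read over the data, whereas ranges known a priori cost nothing.

```python
import dask.array as da

# Stand-in for a large lazy index coordinate (LLC4320 faces are 4320 wide).
X = da.arange(4320, chunks=1080)

# Expensive path: min/max trigger an actual computation over the lazy array.
xmin, xmax = int(X.min().compute()), int(X.max().compute())

# Cheap path: for a fixed LLC grid the index range is known in advance.
xmin_known, xmax_known = 0, 4319
assert (xmin, xmax) == (xmin_known, xmax_known)
```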
Feel free to suggest something different, not approve, etc...
This PR also makes sure that the horizontal chunk size of the dataset resulting from cutout is set by the domain size. Before, it was an option, with the idea of testing different strategies, but we can always rechunk afterwards. This eliminates an unused parameter from llc_rearrange.