Closed danielfromearth closed 1 year ago
@nlenssen2013, @frankinspace, I've made some edits to the remove_duplicate_dims()
function in this branch in order to allow the subsetter to work with TEMPO Ozone proxy data files, which have duplicated dimensions in two of the variables. This change seems to be working well without the need to hold off variable renaming (with rename_dup_vars()
using xarray
) later in the subset pipeline. I'm a bit unsure whether this will work for all cases though. Are there additional files we can do regression tests with, besides the ones used by the automated tests in the tests/data
directory?
@nlenssen2013, @frankinspace, I've made some edits to the
remove_duplicate_dims()
function in this branch in order to allow the subsetter to work with TEMPO Ozone proxy data files, which have duplicated dimensions in two of the variables. This change seems to be working well without the need to hold off variable renaming (withrename_dup_vars()
usingxarray
) later in the subset pipeline. I'm a bit unsure whether this will work for all cases though. Are there additional files we can do regression tests with, besides the ones used by the automated tests in thetests/data
directory?
To my knowledge, GESDISC only has two collections with this duplicate dimension issue in SNDR and Tropomi. Both of those files get tested in unit test.
Hey all, just checking back on this... @nlenssen2013, @jamesfwood, are you still planning to do additional checks with GESDISC or other data products before we move this from the fork into the base repository? (or perhaps you've already done additional checks?)
Hey all, just checking back on this... @nlenssen2013, @jamesfwood, are you still planning to do additional checks with GESDISC or other data products before we move this from the fork into the base repository? (or perhaps you've already done additional checks?)
Yeah, it checks out with our datasets fine, the second function is still needed for one of our collections so the merged branch into develop is solid
Hey all, just checking back on this... @nlenssen2013, @jamesfwood, are you still planning to do additional checks with GESDISC or other data products before we move this from the fork into the base repository? (or perhaps you've already done additional checks?)
Yeah, it checks out with our datasets fine, the second function is still needed for one of our collections so the merged branch into develop is solid
@nlenssen2013, you mean rename_dup_vars()
is still needed? Just double-checking with you, since the latest version of this branch no longer uses that function at all (actually, I was thinking the function could now be deleted too) — is that okay?
If it's okay, I'll just go ahead and squash+merge this. If not, we might need to make a few more tweaks to use that rename_dup_vars()
function again.
Hey all, just checking back on this... @nlenssen2013, @jamesfwood, are you still planning to do additional checks with GESDISC or other data products before we move this from the fork into the base repository? (or perhaps you've already done additional checks?)
Yeah, it checks out with our datasets fine, the second function is still needed for one of our collections so the merged branch into develop is solid
@nlenssen2013, you mean
rename_dup_vars()
is still needed? Just double-checking with you, since the latest version of this branch no longer uses that function at all (actually, I was thinking the function could now be deleted too) — is that okay?If it's okay, I'll just go ahead and squash+merge this. If not, we might need to make a few more tweaks to use that
rename_dup_vars()
function again.
I'm not getting all the pytest tests/ to pass without it currently, not sure if you are. Not a problem for me if it gets merged though - was able to run an individual subset on that collection and it looks good.
@nlenssen2013. hmmm, I just reran using pytest
, and all of the tests in tests/
passed on my laptop from this branch (danielfromearth:feature/currently-no-issue-duplicated-layer-dim). If they aren't passing for you, are you definitely using the same branch and pulled the latest?
Perhaps we merge this into the podaac:feature branch, and then let the automated test suite — run via GitHub actions — serve as the more authoritative check before merging it into develop
.
@nlenssen2013. hmmm, I just reran using
pytest
, and all of the tests intests/
passed on my laptop from this branch (danielfromearth:feature/currently-no-issue-duplicated-layer-dim). If they aren't passing for you, are you definitely using the same branch and pulled the latest?Perhaps we merge this into the podaac:feature branch, and then let the automated test suite — run via GitHub actions — serve as the more authoritative check before merging it into
develop
.
Yep, I'm fine with that
Github Issue: #142
Description
Modified the
remove_duplicate_dims()
function so that the subsetter pipeline does not fail when encountering the duplicated "layer" dimension in TEMPO ozone profile data (variables:support_data/ozone_averaging_kernel
andsupport_data/ozone_noise_correlation_matrix
)Overview of work done
Added checks in
remove_duplicate_dims()
for whether the Dimension and Variable corresponding to the duplicated dimension already exist in the NetCDF. The function directly writes the new variable with no duplicated dimensions rather than keeping it with an altered name that needs to be renamed later in the subsetting procedure.Overview of verification done
Added a new unit test for TEMPO ozone profile data. Checked that all automated tests passed successfully.
Overview of integration done
Explain how this change was integration tested. Provide screenshots or logs if appropriate. An example of this would be a local Harmony deployment.
PR checklist:
See Pull Request Review Checklist for pointers on reviewing this pull request