podaac / l2ss-py

Level 2 subsetter with Harmony integration
https://podaac.github.io/l2ss-py/
Apache License 2.0
11 stars 11 forks source link

change conditional array slice from 1st index to most True values #227

Closed danielfromearth closed 7 months ago

danielfromearth commented 7 months ago

Github Issue: #226

Description

This changes the trimming of the conditional array when subsetting both spatially and temporally, so that valid values in the time variable (for TEMPO data) are not inadvertently lost.

Overview of work done

The logic has been changed. Before, the first index of the missing dimension was selected naively. Now, the conditional array is iterated over to find the row/column that has the most True values and that row/column is selected.

Overview of verification done

Tested this manually with TEMPO data locally. Ran and passed all of the automated tests for the subset module.

Overview of integration done

None.

PR checklist:

See Pull Request Review Checklist for pointers on reviewing this pull request

danielfromearth commented 7 months ago

Hi @jamesfwood, looks like there is an issue with dependency versions that Snyk wants addressed. Can you help with addressing this?

jamesfwood commented 7 months ago

Hi @jamesfwood, looks like there is an issue with dependency versions that Snyk wants addressed. Can you help with addressing this?

Hi @danielfromearth, if you pull in the latest develop into your branch and repush it should fix the issue. It updated pillow version.

If this happens again in the future, you can try to do a poetry update and that will update the dependency versions to fix security issues.

danielfromearth commented 7 months ago

Status update: one of the automated tests (tests/test_subset.py::test_cf_decode_times_sndr) just started failing after updating the poetry dependencies. It appears to fail when trying to access values of the xarray.DataArray for a single variable (__local_solar_time) in the SNDR data. The error raised is OverflowError: Python int too large to convert to C long. The variable's attributes are:

{'valid_range': array([ 0., 24.], dtype=float32),
 'long_name': 'local apparent solar time',
 '_FillValue': 9.96921e+36,
 'coverage_content_type': 'referenceInformation',
 'description': 'local apparent solar time in hours from midnight'}

Also note, this error is raised for the granule, SNDR.AQUA.AIRS.20140110T0305.m06.g031.L2_CLIMCAPS_RET.std.v02_39.G.210131015806.nc, but not this granule: SNDR.J1.CRIMSS.20210224T0100.m06.g011.L2_CLIMCAPS_RET.std.v02_28.G.210331064430.nc.

danielfromearth commented 7 months ago

A further update on this failing test: Looks like the /local_solar_time variable — in the tests/data/SNDR/ granule that is raising the OverflowError — contains only/all null values...

snapshot_sndr_aqua_g031

This seems like it might be a problem with this data file, rather than the subsetter. @jamesfwood, @nlenssen2013, @sliu008, thoughts?

danielfromearth commented 7 months ago

Update: Well, the additional fix I just committed — which extends the xarray.open_dataset()'s mask_and_scale application to more time variables (and including float32 as well as float64) — seems to have worked, as it replaces null values with scaled fill values to avoid the Overflow error. So, perhaps we can ignore the null time values in this test data file for now. Nevertheless, the SNDR team may still want to confirm whether those null values in the local_solar_time array are expected.

danielfromearth commented 7 months ago

Thanks @sliu008 for reviewing this too!