icesat2py / icepyx

Python tools for obtaining and working with ICESat-2 data
https://icepyx.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
213 stars 107 forks source link

IndexError: array index out of range #402

Open ravindraK08 opened 1 year ago

ravindraK08 commented 1 year ago

I am getting an Index error while trying to filter the data based on a polygon extent.

short_name = 'ATL13' spatial_extent = [86.59494887372189, 27.77205176278113, 86.59494887372189, 27.78256576421275, 86.62486847699424, 27.78256576421275, 86.62486847699424, 27.77205176278113, 86.59494887372189, 27.77205176278113] date_range = ['2018-12-22','2022-10-08']

region_a = ipx.Query(short_name, spatial_extent, date_range)

region_a.avail_granules()


AttributeError Traceback (most recent call last) File ~\anaconda3\lib\site-packages\icepyx\core\query.py:955, in Query.avail_granules(self, ids, cycles, tracks, cloud) 954 try: --> 955 self.granules.avail 956 except AttributeError:

AttributeError: 'Granules' object has no attribute 'avail'

During handling of the above exception, another exception occurred:

IndexError Traceback (most recent call last) Input In [100], in <cell line: 8>() 4 date_range = ['2018-12-22','2022-10-08'] 6 region_a = ipx.Query(short_name, spatial_extent, date_range) ----> 8 region_a.avail_granules()

File ~\anaconda3\lib\site-packages\icepyx\core\query.py:957, in Query.avail_granules(self, ids, cycles, tracks, cloud) 955 self.granules.avail 956 except AttributeError: --> 957 self.granules.get_avail(self.CMRparams, self.reqparams, cloud=cloud) 959 if ids or cycles or tracks or cloud: 960 # list of outputs in order of ids, cycles, tracks, cloud 961 return granules.gran_IDs( 962 self.granules.avail, 963 ids=ids, (...) 966 cloud=cloud, 967 )

File ~\anaconda3\lib\site-packages\icepyx\core\query.py:529, in Query.CMRparams(self) 522 kwargs["readable_granule_name[]"] = self._readable_granule_name 524 if self._CMRparams.fmted_keys == {}: 525 self._CMRparams.build_params( 526 product=self.product, 527 version=self._version, 528 extent_type=self._spatial._ext_type, --> 529 spatial_extent=self._spatial.fmt_for_CMR(), 530 **kwargs, 531 ) 533 return self._CMRparams.fmted_keys

File ~\anaconda3\lib\site-packages\icepyx\core\spatial.py:640, in Spatial.fmt_for_CMR(self) 638 # Simplify polygon. The larger the tolerance value, the more simplified the polygon. See Bruce Wallin's function to do this 639 poly = poly.simplify(0.05, preserve_topology=False) --> 640 poly = orient(poly, sign=1.0) 642 # Format dictionary to polygon coordinate pairs for API submission 643 polygon = ( 644 ",".join([str(c) for xy in zip(*poly.exterior.coords.xy) for c in xy]) 645 ).split(",")

File ~\anaconda3\lib\site-packages\shapely\geometry\polygon.py:426, in orient(polygon, sign) 424 rings = [] 425 ring = polygon.exterior --> 426 if signed_area(ring)/s >= 0.0: 427 rings.append(ring) 428 else:

File ~\anaconda3\lib\site-packages\shapely\algorithms\cga.py:7, in signed_area(ring) 3 """Return the signed area enclosed by a ring in linear time using the 4 algorithm at: https://web.archive.org/web/20080209143651/http://cgafaq.info:80/wiki/Polygon_Area 5 """ 6 xs, ys = ring.coords.xy ----> 7 xs.append(xs[1]) 8 ys.append(ys[1]) 9 return sum(xs[i]*(ys[i+1]-ys[i-1]) for i in range(1, len(ring.coords)))/2.0

IndexError: array index out of range

JessicaS11 commented 1 year ago

Hello @ravindraK08! Thanks for noting this error and providing the code that generates it. I was able to reproduce the error, which is stemming from the fact that at some point the geometry is becoming empty. I will try to keep working on figuring out if we can implement a fix.

JessicaS11 commented 1 year ago

Hello @ravindraK08. It turns out that I just needed to set the preserve_topology=True flag (it was set to False) so that during polygon simplification (which is done for submissions to CMR when searching for granules) a valid geometry was created.

Are you available to review this pull request?

ravindraK08 commented 1 year ago

Hello @JessicaS11! Really appreciate you taking the time to respond and address the problem. It does work now, does not throw an error, and displays available granules; however, when attempting to order granules(), nothing is downloaded with the message 'no data within the spatial and/or temporal subset constraints', despite the fact that the data is accessible on Openaltimetry.

JessicaS11 commented 1 year ago

however, when attempting to order granules(), nothing is downloaded with the message 'no data within the spatial and/or temporal subset constraints', despite the fact that the data is accessible on Openaltimetry.

This suggests that there is no data for the parameters you've provided for this dataset. Earthdata Search will not let me zoom in enough to see data today, and the first few potential granules I checked in OpenAltimetry did not have data in this region either (and with no way to search for a time range, I'd have to manually click through all 45 potential granule dates). Can you provide a screenshot and/or a specific granule date where you're seeing data in OpenAltimetry so I can look into it?

ravindraK08 commented 1 year ago

Hi @JessicaS11! The data I get when I filter by location on Openaltimetry is shown below in screenshots.

Screenshot 2023-02-15 110401 Screenshot 2023-02-15 110631 Screenshot 2023-02-15 110721 Screenshot 2023-02-15 110818 Screenshot 2023-02-15 110850 Screenshot 2023-02-15 110926 Screenshot 2023-02-15 111054 Screenshot 2023-02-15 111119

JessicaS11 commented 1 year ago

Thank you @ravindraK08! From looking into these results, none of the granules/data shown here are listed as possible granules in the list from NSIDC, which means the Earthdata Search system (which uses metadata to identify possible granules) is not finding the data. So your region (which is relatively small and near the equator) is encountering a known issue for small polygons submitted to NSIDC. Some more details on this are starting here in this issue discussion. Ideally this should be addressed on the data management side. Other users have had success with submitting a larger bounding box when ordering data, and I will look into adding a warning or implementing a workaround that follows SlideRule's efforts on this.

ravindraK08 commented 1 year ago

Thank you @JessicaS11! I am actually working on a large area. However, NSIDC searches do not include all granules, even when a bounding box for the entire region is submitted. Due to this issue, I am missing a significant amount of data even when submitting on a regional scale. Would you suggest any other way out?

JessicaS11 commented 1 year ago

@ravindraK08 I was hoping to have "another way out" for you by searching for granules based on the specific tracks you need (which can easily be done in icepyx). However, in testing an example to share here, I discovered that there are NO ATL13 granules returned for track 1308, one of the track IDs indicated as having data in your region from your OpenAltimetry screen shots.

The issue turns out to be analogous to a similar one I tried to address a few weeks ago, for which @mikala-nsidc at NSIDC and I were unable to find a resolution. I've reached out to OpenAltimetry and linked them to this issue. Hopefully they can provide some feedback and help us find a resolution. Otherwise, unfortunately, I am at a loss for how to proceed because - icepyx aside - I can't come up with a way to have the correct granules returned as search results by Earthdata Search.

mikala-nsidc commented 1 year ago

@ravindraK08 Hello! In order to get good "hits" from CMR and you'll need to make sure your bounding box crosses both RGT 599 and RGT 1308. So you'll need to make the longitudes of your bounding box wider than currently. I suggest something like 86.50 to 86.70. Similarly for latitudes, I might try something like 27.74 to 27.81. The NSIDC subsetter will only return data if there are sufficient data within the bounding box for those granules that CMR returned as intersecting your bounding box.

mikala-nsidc commented 1 year ago

@ravindraK08 Can you give me a little more information about this: However, NSIDC searches do not include all granules, even when a bounding box for the entire region is submitted.?
Is this because you are not seeing granules with 0599 and 1308 in the filename? ATL13 is a unique data product in that there are actually 4 tracks worth of data in a single file, but only the first of those tracks appear in the filename. So for instance, the granule ATL13_20210823233312_09371201_005_01.h5 contains data for tracks 937, 938, 939, 940.

mikala-nsidc commented 1 year ago

image (1)

mikala-nsidc commented 1 year ago

Track 599 with data on 2022-05-04 (04 May 2022) is contained within granule ATL13_20200504145527_05970701_005_01.h5. Working in OpenAltimetry to find the data in your screenshots above, I was able to get the csv from OA for the bounding box I recommended above. I couldn't get the subsetted HDF5 file, however. I'll check in with our operations team about that.
If the csv doesn't contain the data that you need, a clunky and painful work around would be to download the csv in order to identify the granule you need and then download that whole granule from NSIDC and then subset.

I tried the larger bounding box I recommended above in Earthdata Search and I'm not getting good returns either. Unfortunately the spatial solution for the granule footprints in CMR (and therefore Earthdata Search) is based upon the location of the estimated nadir trace of the orbit path of the satellite. This makes it really "rough" and in order to grab granule ATL13_20200504145527_05970701_005_01.h5, I had to make the width of the bounding box 87 to 85 degrees wide in longitude. I realize this is pretty impractical. My best advice at the moment is to use Open Altimetry to scroll through dates with data, like you have above, then download the csv from a generous region over your area, and then retrieve the granule IDs from the csv. You may have to download full granules and subset manually. Apologies this isn't ideal at the moment! NSIDC doesn't control the spatial solution of the granules - they come "as is" and so it's tricky to combine that with search and subset capabilities at the moment. They are working on an improvement but it could be a while.

JessicaS11 commented 1 year ago

Thanks for helping with this, @mikala-nsidc!

@ravindraK08, unfortunately it looks like right now there is not a great workaround for this. Building on Mikala's notes, I found that if I created a query object using tracks (either by submitting tracks=["599","598","597","596"] or figuring out a priori which track ID was in the granule filenames and submitting just that one, e.g. tracks = ["597"], I was able to at least get all of the granules you know (from OA) have data for your region to be returned in the list of available granules. However, I was unable to find a way to submit a valid order, even with region.order_granules(subset=False) that was not rejected as having no data by NSIDC. You might be able to get around this by combining this approach specifying tracks with a larger bounding box (so you end up with like 16 potential granules instead of ~50), but in the process of testing this I found a bug in the ordering step (which I am going to try and fix, but I have a few other dev deadlines that need to be prioritized right now - if you're interested/able we'd love for you to submit a fix!).

None of these are elegant solutions, but I will note that you can create multiple Query objects with refined search parameters, and I know folks have searched for specific granules by looping through all their sets of inputs and using the .netrc or environment variable authentication to place and download the orders. Please let me know if there's anything I can do to help or if you have ideas we've missed.