Item creation failures - Githubissues

pjhartzell commented 2 years ago

Seeing some item creation failures. I randomly sampled the Planetary Computer's holdings for 1000 NetCDFs and cataloged the error tracebacks in the attached file: log_summary.txt

It looks like there are some inconsistencies in the NetCDF content in up to 10% of the sampled files. I've bundled the example NetCDF files listed in the log_summary in the attached zip file: example_files.zip. The NetCDF files are also publicly available.

m-mohr commented 2 years ago

Thank you for the report. I'm very sorry about the issues, which did not appear in the limited set of examples that I had access to. I've now included all the examples files in the test suite, fixed the issues as described below and will commit to GitHub shortly. Please see below what the (temporary) solutions look like. We should contact NOAA for clarification. Would you do that or shall I?

KeyError: 'flash_frame_time_offset_of_first_event'

Sometimes it seems the frame times are missing. I'll leave them out of the geoparquet now if they are missing. Still, we should ask NOAA whether that is intentional.

IndexError: index exceeds dimension bounds

This means for some variables that reported count (e.g. in event_count) is not reflecting the number of rows in the variables (e.g. event_id). For now, I'm using the count value by default but if it doesn't match I'm falling back to the number of rows that are actually available and raise a warning. Still, we should ask NOAA whether this is intentional and what we should do in such a case.

TypeError: unsupported type for timedelta seconds component: NoneType

It seems for some time offsets a None/NaN/null value is reported. For now, I'm simply writing "null" to the geoparquet file, too. I'm also adding a short description to the column so that users are aware. Still, we should ask NOAA whether that is intentional and/or has any implications.

KeyError: 'GOES_Test'

Some files report GEOS_Test instead of GOES_West/East. The example file is from before the satellite has "finished drifting"

GOES-S was launched March 1, 2018. GOES-17 began drifting to its GOES-West operational location of 137.2 degrees west longitude on October 24, 2018. Drift was completed on November 13, 2018 and nominal operations resumed on November 15, 2018. The satellite was declared GOES-West on February 12, 2019. The change from 137.0 to 137.2 was made for operational efficiency and to minimize impacts from other geostationary satellites.

Your example is from October 2018. I'm not sure how to handle them as the geometry will likely not be the geometry that we have pre-defined for the final position of both satellites. I've added some fallback code, but that only applies for after the drifting period, so not sure how to handle the files beforehand. All code right now relies on the final positions. Something to ask NOAA about?

TypeError: only integer scalar arrays can be converted to a scalar index

This happens when no events/flashed occured in the timeframe. Will be fixed.

pjhartzell commented 2 years ago

Thanks for the quick updates! Yeah, we definitely could have supplied more examples. But this is pretty standard. Odds and ends turn up once the full dataset starts to be processed.

I'm good with the changes. I agree that the items from when the satellite was drifting (GOES_Test) are an issue. Perhaps a warning about potentially bad geometry in the Item description? Something to discuss with @gadomski later today, along with potential contact with NOAA.

m-mohr commented 2 years ago

Shall I re-release or do we wait for the NOAA feedback?

gadomski commented 2 years ago

Let's wait until we get fixes for all the above issues, then release. If @pjhartzell needs to use this package with only some of the fixes, he can pin to a commit.

pjhartzell commented 2 years ago

@gadomski found some docs relating to the first error: KeyError: 'flash_frame_time_offset_of_first_event'. Evidently there was a version change (the version is not indicated anywhere) that added the three variables containing ..._frame_time_offset_... in their names. Older files will have 45 variables; newer files will have 48.

doc href: https://www.noaasis.noaa.gov/pdf/ps-pvr/goes-16/GLM/Full/GOES16_GLM_FullValidation_ProductPerformanceGuide.pdf

relevant snippet from page 15 of the doc:

How to Determine If Data is DO.07.00.00 or a Prior Version: There are no external indications 
in the file that the format has changed. One way to determine which version of the file you are 
working with is to count the number of variables in the file. If it is 45, it is the OLD format, will not 
have the SUVs (but will still have the “_Unsigned” designation for the non-standard unsigned 
integers). If the variable count is 48, it is the DO.07.00.00 version of the file and will have the list 
of SUVs (above) along with the “_Unsigned” designation for the non-standard unsigned integers.

The table above this snippet in the doc lists the names of the variable that were added in version DO.07.00.00.

parquet.py handles the issue. But not sure if you want to change the way it is handled or update the logger warning now that we have some docs on what is going on.

m-mohr commented 2 years ago

Thanks a lot, yeah I will make a small change and make the tests a bit more precise!

m-mohr commented 2 years ago

I've just made the changes. Any news from NOAA regarding the other points? @gadomski

pjhartzell commented 2 years ago

Also in the doc that @gadomski found are warnings about missing _Unsigned attributes on some variables. This is the cause of the TypeError: unsupported type for timedelta seconds component: NoneType errors. There are unsigned integer bytes being incorrectly read as signed integers. Note that the netCDF4 library honors the _Unsigned attribute if it exists.

We might need to check each variable that is supposed to have an _Unsigned attribute to verify it exists. If it doesn't, some modifications are necessary to obtain the correct values. Here's a gist with a notebook that provides a link to a table that lists which variables should have an _Unsigned attribute and two approaches that I've found to correctly read the variable values when the _Unsigned attribute is missing.

pjhartzell commented 2 years ago

I haven't found anything for the IndexError: index exceeds dimension bounds error. I think your method of overriding the provided count with the actual count when they disagree is good for now.

m-mohr commented 2 years ago

Ah... interesting. I checked the values with Panoply and assumed that what I get out there is actually correct and based my fix on this, which it seems it is not because the file is effectively defect due to the missing unsigned indicator. I'll look into this. I'm wondering whether it should be our responsibility to fix defect files? Users of the netCDF files that read it with "normal software" like Panoply will also get nan values and as such may get different results based on whether they use the "fixed" geoparquet or the "defect" netCDF files. That's somehow weird. (I assume normal users would not read these guide books / PDFs from NOAA.)

pjhartzell commented 2 years ago

I assume normal users would not read these guide books / PDFs from NOAA.

Yeah, I agree that this is a hidden bug and most users will struggle to find it and fix it. @gadomski, your thoughts on repairing the netCDF files? I hate to modify source files, but it definitely creates a disconnect between the corrected data in the parquet files and the defective source file.

m-mohr commented 2 years ago

In principle, we could simply provide an option for this. See #18 for details.

m-mohr commented 2 years ago

So going through the issues again, I think we were able to solve them somewhat and there are two that could be sent to NOAA for clarification. Thanks for the help Pete and Preston.

[x] KeyError: 'flash_frame_time_offset_of_first_event'
[x] IndexError: index exceeds dimension bounds - this is something we could still ask NOAA about.
[x] TypeError: unsupported type for timedelta seconds component: NoneType
[x] KeyError: 'GOES_Test' - it is still open whether another bounding box should be used, but on the other hand the bboxes for the other data is also not very precise and NOAA wasn't able to provide anything better yet.
[x] TypeError: only integer scalar arrays can be converted to a scalar index

pjhartzell commented 2 years ago

Agreed that we can still follow up with NOAA about the index exceeding dimension bounds.

Some thoughts on the drift period, the "GOES-Test orbital_slot, and use of bbox for geometry:

I looked in the data holdings and no netCDF files exist for the drift time periods noted and handled in the code. There is no GOES-16 data in 2017 and there is a gap in the GOES-17 data from October 25, 2018 to November 12, 2018 (inclusive). So that logic could be removed.
All GOES-17 files prior to the drift period use the "GOES-Test" orbital slot and have constant values for the lat_field_of_view_bounds and lon_field_of_view_bounds variables:
```
lat = [ 66.56 -66.56]
lon = [-156.06  -22.94]
```
Since we are using the bbox for the geometry, I think we have what we need to produce Items for the GOES-17 data that is in the GOES-Test orbital slot.
All GOES-16 and GOES-17 files after their respective drift periods use the GOES-East and GOES-West orbital slots, respectively.

m-mohr commented 2 years ago

Thanks for verifying with the data holding that the fallback code is not needed. I removed/changed it to use the platform for identifying the actual slot in case it Goes-Test.

So this issue is basically finished except that we could still ask NOAA about the remaining questions, right?

pjhartzell commented 2 years ago

we could still ask NOAA about the remaining questions, right?

I believe the only remaining question is the "IndexError: index exceeds dimension bounds" question. I'll tag @gadomski on whether that's worth following up with NOAA.

pjhartzell commented 2 years ago

I opened a PR with some edits on how I think the Test orbital slot should be handled. Let me know your thoughts.

I'll review the 0.2.0 release PR once we finish this up.

m-mohr commented 2 years ago

Opened a new issue for the issue that reported counts are not reflecting the number of rows in the variables: #22 So now we can close this issue which got a bit hard to follow.

stactools-packages / goes-glm

Item creation failures #17

KeyError: 'flash_frame_time_offset_of_first_event'

IndexError: index exceeds dimension bounds

TypeError: unsupported type for timedelta seconds component: NoneType

KeyError: 'GOES_Test'

TypeError: only integer scalar arrays can be converted to a scalar index