stactools-packages / noaa-mrms-qpe

stactools package for NOAA's Multi-Radar Multi-Sensor Quantitative Precipitation Estimation dataset
Other
2 stars 1 forks source link

COG conversion: Issues with no-data values #3

Closed m-mohr closed 2 years ago

m-mohr commented 2 years ago

Describe the bug When I activate the COG conversion, the no-data values from the GRIB files (-1, -3) are not reflected in the COGs anymore.

Missing value: -1 No coverage: -3 Valid values: >= 0

Expected behavior The main issues here are:

  1. There are two no-data values
  2. The GRIB file doesn't contain details about no-data values and as such GDAL is not aware of them

Possible solutions:

  1. Create two bands (or two files) with a no-data mask
  2. Convert the negative values to a specific no-data value in the COG file and set the no-data value then

I experimented a bit with gdal_edit.py and gdal_calc.py, but did not find a good solution yet.

Example: gdal_calc.py -A MRMS_MultiSensor_QPE_24H_Pass2_00.00_20220530-120000.grib2 --outfile=cog.tif --calc="A*(A>0)" --NoDataValue=0 --format=COG This somewhat works, but also eliminates zeros...

cc @gadomski

m-mohr commented 2 years ago

I've implemented a solution for this issue (including smaller file size), but I'm not 100% sure whether this is a good solution.

I'm basically running this now instead of cogify from stactools:

    call([
        "gdal_calc.py",
        "-A", input_path,
        "--outfile", output_path,
        "--calc", "maximum(A, -1)",
        "--NoDataValue", "-1",
        "--format", "GTIFF", # COG
        "--co", "TILED=YES",
        "--co", "COPY_SRC_OVERVIEWS=YES",
        "--co", "COMPRESS=LZW"
        # "--co", f"TARGET_SRS={reproject_to}"
    ])

Valid values are >= 0 and -1 and -3 are no-data values in the GRIB files. So what I do is simply limit the value range to >= -1 and set the no-data value to -1 with gdal_calc and them generate the COG using gdal_calc, too. I can't use the COG driver for an unknown reason (see below). Is it okay to handle no-data this way?

The other thing is, right now the process is:

  1. extract grib from gzip
  2. reproject and convert to GeoTiff using gdalwarp
  3. convert to COG and do the no-data handling in gdal_calc

I'm wondering whether accessing the files twice is an issue? And/Or Shall I use the cogify function in stactools?

In theory I should also be able do 2+3 in one step by using the COG format and the TARGET_SRS option in gdal_calc, but then I'm getting

ERROR 6: GDALDriver::Create() ... no create method implemented for this format. 'NoneType' object has no attribute 'SetGeoTransform'

which is weird because COG should be able to create files.

gadomski commented 2 years ago

Is it okay to handle no-data this way?

This is a fine approach, but after discussion I think we'd like to split the nodata values into their own "QA/mask" COG as well. So your current approach is fine (though, for float data, can/should we use nan instead of -1 for nodata?), but we should also create a second COG alongside the first, with the -1 and -3 values where present, and nodata (0 or nan probably) elsewhere.

In theory I should also be able do 2+3 in one step by using the COG format and the TARGET_SRS option in gdal_calc, but then I'm getting

ERROR 6: GDALDriver::Create() ... no create method implemented for this format. 'NoneType' object has no attribute 'SetGeoTransform'

which is weird because COG should be able to create files.

Agreed, this is strange, but we've seen the COG driver fall down in other areas before. I'll see if I can take a look.

gadomski commented 2 years ago

@m-mohr I just looked into this a bit more, and I'm not seeing the same issues. Steps:

  1. Download a file from https://mrms.ncep.noaa.gov/data/2D/MultiSensor_QPE_01H_Pass1/
  2. Decompress
  3. gdalinfo -stats MRMS_MultiSensor_QPE_01H_Pass1_00.00_20220607-150000.grib2

Output:

Driver: GRIB/GRIdded Binary (.grb, .grb2)
Files: MRMS_MultiSensor_QPE_01H_Pass1_00.00_20220607-150000.grib2
       MRMS_MultiSensor_QPE_01H_Pass1_00.00_20220607-150000.grib2.aux.xml
Size is 7000, 3500
Coordinate System is:
GEOGCRS["Coordinate System imported from GRIB file",
    DATUM["unnamed",
        ELLIPSOID["Spheroid imported from GRIB file",6378160,298.253916296469,
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433,
            ID["EPSG",9122]]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]]]
Data axis to CRS axis mapping: 2,1
Origin = (-129.999999999857124,54.999999999857103)
Pixel Size = (0.009999999714245,-0.009999999714204)
Corner Coordinates:
Upper Left  (-130.0000000,  55.0000000) (130d 0' 0.00"W, 55d 0' 0.00"N)
Lower Left  (-130.0000000,  20.0000010) (130d 0' 0.00"W, 20d 0' 0.00"N)
Upper Right ( -60.0000020,  55.0000000) ( 60d 0' 0.01"W, 55d 0' 0.00"N)
Lower Right ( -60.0000020,  20.0000010) ( 60d 0' 0.01"W, 20d 0' 0.00"N)
Center      ( -95.0000010,  37.5000005) ( 95d 0' 0.00"W, 37d30' 0.00"N)
Band 1 Block=7000x1 Type=Float64, ColorInterp=Undefined
  Description = 0[m] GPML="Specific altitude above mean sea level"
  Min=0.000 Max=74.700 
  Minimum=0.000, Maximum=74.700, Mean=0.081, StdDev=0.816
  Metadata:
    GRIB_COMMENT=Multi-sensor accumulation 1-hour (1-hour latency) [mm]
    GRIB_DISCIPLINE=209
    GRIB_ELEMENT=MultiSensor_QPE_01H_Pass1
    GRIB_FORECAST_SECONDS=0
    GRIB_IDS=CENTER=161(US-OAR) SUBCENTER=0 MASTER_TABLE=255 LOCAL_TABLE=1 SIGNF_REF_TIME=3(Observation_time) REF_TIME=2022-06-07T15:00:00Z PROD_STATUS=2(Research) TYPE=7(Processed_radar_observations)
    GRIB_PDS_PDTN=0
    GRIB_PDS_TEMPLATE_ASSEMBLED_VALUES=6 30 8 0 97 0 0 0 0 102 0 0 255 1 0
    GRIB_PDS_TEMPLATE_NUMBERS=6 30 8 0 97 0 0 0 0 0 0 0 0 102 0 0 0 0 0 255 1 0 0 0 0
    GRIB_REF_TIME=1654614000
    GRIB_SHORT_NAME=0-GPML
    GRIB_UNIT=[mm]
    GRIB_VALID_TIME=1654614000
    STATISTICS_MAXIMUM=74.700004577637
    STATISTICS_MEAN=0.081450768779566
    STATISTICS_MINIMUM=0
    STATISTICS_STDDEV=0.81550677276342
    STATISTICS_VALID_PERCENT=100

There's no nodata value, but there also isn't anything below 0. I wonder if we're looking at the same data?

m-mohr commented 2 years ago

@gadomski Yes, for your example (CONUS) there's no nodata. GUAM has -3 (no coverage) nodata values, for example. The covered area around GUAM has a somewhat circle-ish shape and everything around it is set to -3.

Also, the files never have an actual no-data value reported in the metadata. You can only find these values online in the manual.

gadomski commented 2 years ago

GUAM has -3 (no coverage) nodata values, for example. The covered area around GUAM has a somewhat circle-ish shape and everything around it is set to -3.

Hm, I'm not seeing any 2D QPE data for Guam at all, e.g. https://mrms.ncep.noaa.gov/data/2D/GUAM/MultiSensor_QPE_24H_Pass2/. From this chart (provenance unknown), I'm wondering if we can only expect Alaska, Hawaii, and CONUS: image

Also, the files never have an actual no-data value reported in the metadata. You can only find these values online in the manual.

Based on communications with NOAA folks, these nodata values may be for the "radar only" products, not the Q3MS (which I think we're using). I'm not entirely sure, their naming schemes seem inconsistent.

m-mohr commented 2 years ago

@gadomski Weird, I have been downloading data from there some days ago. I was able to download example data for all 5 areas, there is an example for each area in the test data files. Here's the example for GUAM I downloaded recently: https://github.com/stactools-packages/noaa-mrms-qpe/blob/main/tests/GUAM/MRMS_MultiSensor_QPE_01H_Pass1_00.00_20220601-120000.grib2.gz

Also, it doesn't specify a no-data value limitation/constraint in the online docs so I couldn't know. Thus, tha package is written in a way that it expects these -1 and -3 nodata values generally in any product. If there are limitations we can surely adopt the package accordingly, but then this should be documented by them somewhere (publicly).

gadomski commented 2 years ago

Yup, I can confirm the presence of -3 values in that file (not -1 but maybe there's just none of those). I think I need to do some more reading to figure out the processing streams for these files. Very strange that there wasn't anything in https://mrms.ncep.noaa.gov/data/2D/GUAM/MultiSensor_QPE_24H_Pass2/ ... if you do find data in that or a similar directory, can you let me know? Thanks.

m-mohr commented 2 years ago

@gadomski The CARIB files also have no-data values due to being single-sensor. There I also only find -3 right now, but I don't know for which cases -1 can occur.

I'd assume there's some kind of issue with GUAM right now and they will apprear again eventually, they have been there for two weeks (end of May / beginning of June) when I was working on this package.

gadomski commented 2 years ago

CARIB

What's this?

m-mohr commented 2 years ago

Caribbean Islands, in the folder CARIB (instead of GUAM)

gadomski commented 2 years ago

Thanks, missed that.

gadomski commented 2 years ago

Ok, I've looked into this is a bit more, and unsurprisingly the story is a little mixed. I checked on the latest data for all of the regions we're interested in for all the intervals and pass numbers, looks like only CARIB and GUAM are using negative values for NODATA. Those values change when you get to the 24 hour product (-3 for the 1 hour, -999 for the 24 hour).

I think the next steps are:

Full table below, and I've got the notebook in a branch here.

Name Interval Pass number Location Min Max Unique negative values Nodata values
0 MultiSensor_QPE_01H_Pass1 01H 1 CONUS 0 95.6 [] (None,)
1 MultiSensor_QPE_01H_Pass1 01H 1 ALASKA 0 7.8 [] (None,)
2 MultiSensor_QPE_01H_Pass1 01H 1 CARIB 0 18.7 ['-3.0'] (None,)
3 MultiSensor_QPE_01H_Pass1 01H 1 GUAM 0 4 ['-3.0'] (None,)
4 MultiSensor_QPE_01H_Pass1 01H 1 HAWAII 0 5 [] (None,)
5 MultiSensor_QPE_01H_Pass2 01H 2 CONUS 0 44.4 [] (None,)
6 MultiSensor_QPE_01H_Pass2 01H 2 ALASKA 0 7.8 [] (None,)
7 MultiSensor_QPE_01H_Pass2 01H 2 CARIB 0 18.7 ['-3.0'] (None,)
8 MultiSensor_QPE_01H_Pass2 01H 2 GUAM 0 6.6 ['-3.0'] (None,)
9 MultiSensor_QPE_01H_Pass2 01H 2 HAWAII 0 5.2 [] (None,)
10 MultiSensor_QPE_24H_Pass2 24H 2 CONUS 0 120 [] (None,)
11 MultiSensor_QPE_24H_Pass2 24H 2 ALASKA 0 125.7 [] (None,)
12 MultiSensor_QPE_24H_Pass2 24H 2 CARIB 0 92.5 ['-999.0'] (None,)
13 MultiSensor_QPE_24H_Pass2 24H 2 GUAM 0 22 ['-999.0'] (None,)
14 MultiSensor_QPE_24H_Pass2 24H 2 HAWAII 0 40.6 [] (None,)
pjhartzell commented 2 years ago

Per the MRMS table, all data products in the summary table above should be -1 = Missing and -3 = No Coverage. The -999 values do not agree with the reference. Perhaps Tom Augspurger can get some information on this from a NOAA contact? He has mentioned this a few times for VIIRS data.

MS's preference has been to create new COGs with the original nodata values preserved when we encounter datasets with multiple nodata values, so I'd lean in that direction. If we can't get an answer on the -999 value, I would preserve it in the nodata COG and use a class description to note the uncertainty on the meaning.

It's likely a black hole, but perhaps a ping to the email contact on this page?

My 2 cents. 🙂

m-mohr commented 2 years ago

Interesting, I had not seen the -999 values before. I guess we clarify this through the e-mail conversation you cc'ed me in? @gadomski

gadomski commented 2 years ago

Yes, NOAA is looking into the discrepancies from their side.

m-mohr commented 2 years ago

Decision for now:

Expose no-data values for GRIB through the classification extension. For COGs use the raster extension and normalize no-data values (i.e. all negative values) to -3 (or -1). Don't expose no-data values for the real multi-sensor data, only add no-data values for GUAM and CARIB to the metadata and document clearly that they are not really no-data values.

m-mohr commented 2 years ago

The last commits have added the following:

See the examples folder for details