microsoft / PlanetaryComputer

Issues, discussions, and information about the Microsoft Planetary Computer
https://planetarycomputer.microsoft.com/
MIT License
185 stars 8 forks source link

Potential issues in STAC items for USGS 3DEP-seamless dataset #126

Open vlandau opened 2 years ago

vlandau commented 2 years ago

I'm noticing inconsistencies with the USGS 3DEP elevation data when trying to load tiles into DataArrays via stackstac. I posted an issue in the stackstac repo but I'm starting to think the issue might be in the item entries in the STAC itself.

I'm encountering an issue where stackstac throws an error (which appears to be because it can't find the resolution in the item). Specifying a resolution the the stackstac.stack call manually fixes that issue, but then I end up empty (0-length) indices for band and time, and a few other attributes are missing as well.

Any idea what might be going on? MWEs are below.

To create the wonky DataArray:

import planetary_computer as pc
import stackstac
from pystac_client import Client as stac_client

bbox = (-104.79075, 30.86168, -104.77074999999999, 30.881680000000003)

# Open the catalog with pystac_client  
catalog = stac_client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

# Search the catalog
search = catalog.search(collections=["3dep-seamless"], bbox=bbox)
items = list(search.items())

# We only want the high res layers
items_high_res = [
    pc.sign(item).to_dict()
    for item in items
    if item.properties["gsd"] == 10
]

# Lazily initialize our data layers
elevation = stackstac.stack(
    items_high_res,
    bounds=bbox,
    chunksize=64,
    resolution=(9.2592593e-05, 9.2592593e-05) # This is the native resolution, hard coded 
                                # as a workaround for a bug
).squeeze()

elevation

Output:

>>> elevation
<xarray.DataArray 'stackstac-2f63e84b98241f78db41c110675eccdc' (time: 0,
                                                                band: 0,
                                                                y: 217, x: 217)>
dask.array<fetch_raster_window, shape=(0, 0, 217, 217), dtype=float64, chunksize=(0, 0, 64, 64), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 
    id       (time) float64 
  * band     (band) float64 
  * x        (x) float64 -104.8 -104.8 -104.8 -104.8 ... -104.8 -104.8 -104.8
  * y        (y) float64 30.88 30.88 30.88 30.88 ... 30.86 30.86 30.86 30.86
    epsg     int64 5498
Attributes:
    spec:        RasterSpec(epsg=5498, bounds=(-104.790833794413, 30.86166680...
    crs:         epsg:5498
    transform:   | 0.00, 0.00,-104.79|\n| 0.00,-0.00, 30.88|\n| 0.00, 0.00, 1...
    resolution:  9.2592593e-05

To create a normal DataArray with a different bbox

import planetary_computer as pc
import stackstac
from pystac_client import Client as stac_client

## This bbox line is the only difference from the above code block
bbox = (-97.73474, 33.785180000000004, -97.71473999999999, 33.80518)

# open the catalog with pystac_client  
catalog = stac_client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

# Search the catalog
search = catalog.search(collections=["3dep-seamless"], bbox=bbox)
items = list(search.items())

# We only want the high res layers
items_high_res = [
    pc.sign(item).to_dict()
    for item in items
    if item.properties["gsd"] == 10
]

# Lazily initialize our data layers
elevation = stackstac.stack(
    items_high_res,
    bounds=bbox,
    chunksize=64,
    resolution=(9.2592593e-05, 9.2592593e-05) # This is the native resolution, hard coded 
                                # as a workaround for a bug
).squeeze()

elevation

output:

>>> elevation
<xarray.DataArray 'stackstac-bb3aa93860305fb871c9f6e04bdf9be2' (y: 217, x: 217)>
dask.array<getitem, shape=(217, 217), dtype=float64, chunksize=(64, 64), chunktype=numpy.ndarray>
Coordinates: (12/15)
    time             datetime64[ns] 2019-11-25
    id               <U10 'n34w098-13'
    band             <U4 'data'
  * x                (x) float64 -97.73 -97.73 -97.73 ... -97.72 -97.71 -97.71
  * y                (y) float64 33.81 33.81 33.81 33.8 ... 33.79 33.79 33.79
    proj:shape       object {10812}
    ...               ...
    gsd              int64 10
    start_datetime   <U20 '2018-03-09T00:00:00Z'
    threedep:region  <U7 'n30w100'
    title            <U40 'USGS 1/3 arc-second n34w098 1 x 1 degree'
    description      <U1849 'This tile of the 3D Elevation Program (3DEP) sea...
    epsg             int64 5498
Attributes:
    spec:        RasterSpec(epsg=5498, bounds=(-97.734815244848, 33.785092741...
    crs:         epsg:5498
    transform:   | 0.00, 0.00,-97.73|\n| 0.00,-0.00, 33.81|\n| 0.00, 0.00, 1.00|
    resolution:  9.2592593e-05
TomAugspurger commented 2 years ago

Thanks for the report. I'll take a closer look when I get a chance, but a couple quick notes:

  1. I'm not super familiar with which properties of the STAC item stackstac uses for resolution, but presumably it's using proj:transform?
  2. Maybe it makes sense to get the links for a "good" and "bad" STAC item and check the
    • STAC metadata for each
    • The values in the COGs for each (using e.g. gdalinfo)
vlandau commented 2 years ago

Thanks @TomAugspurger. I'll get the links for each STAC item and will take a look. I'll also post those links here in case you get a change to look at it as well. Will report back here shortly with those links.

vlandau commented 2 years ago

Here's are the urls for the "bad" stac item:

Here's are the urls for the "good" item:

Currently downloading these to inspect with GDAL but it's taking a while.

vlandau commented 2 years ago

Good item:

⟩ gdalinfo data/USGS_13_n34w098.tiff
Driver: GTiff/GeoTIFF
Files: data/USGS_13_n34w098.tiff
Size is 10812, 10812
Coordinate System is:
GEOGCRS["NAD83",
    DATUM["North American Datum 1983",
        ELLIPSOID["GRS 1980",6378137,298.257222101004,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4269]]
Data axis to CRS axis mapping: 2,1
Origin = (-98.000555556293307,34.000555555795103)
Pixel Size = (0.000092592592692,-0.000092592592692)
Metadata:
  AREA_OR_POINT=Area
  BandDefinitionKeyword=*
  DataType=*
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=BAND
  PREDICTOR=3
Corner Coordinates:
Upper Left  ( -98.0005556,  34.0005556) ( 98d 0' 2.00"W, 34d 0' 2.00"N)
Lower Left  ( -98.0005556,  32.9994444) ( 98d 0' 2.00"W, 32d59'58.00"N)
Upper Right ( -96.9994444,  34.0005556) ( 96d59'58.00"W, 34d 0' 2.00"N)
Lower Right ( -96.9994444,  32.9994444) ( 96d59'58.00"W, 32d59'58.00"N)
Center      ( -97.5000000,  33.5000000) ( 97d30' 0.00"W, 33d30' 0.00"N)
Band 1 Block=256x256 Type=Float32, ColorInterp=Gray
  Description = Layer_1
  Min=142.573 Max=403.063 
  Minimum=142.573, Maximum=403.063, Mean=262.266, StdDev=44.176
  NoData Value=-9999
  Overviews: 5406x5406, 2703x2703, 1352x1352, 676x676, 338x338
  Metadata:
    LAYER_TYPE=athematic
    RepresentationType=*
    STATISTICS_MAXIMUM=403.06253051758
    STATISTICS_MEAN=262.2663938277
    STATISTICS_MINIMUM=142.57325744629
    STATISTICS_STDDEV=44.175693340491
    STATISTICS_VALID_PERCENT=100

"Bad" item:

⟩ gdalinfo data/USGS_13_n31w103.tiff                                                                                                (soil_moisture) 
Driver: GTiff/GeoTIFF
Files: data/USGS_13_n31w103.tiff
Size is 10812, 10812
Coordinate System is:
GEOGCRS["NAD83",
    DATUM["North American Datum 1983",
        ELLIPSOID["GRS 1980",6378137,298.257222101004,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4269]]
Data axis to CRS axis mapping: 2,1
Origin = (-103.000555555555565,31.000555555555557)
Pixel Size = (0.000092592592593,-0.000092592592593)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=BAND
  PREDICTOR=3
Corner Coordinates:
Upper Left  (-103.0005556,  31.0005556) (103d 0' 2.00"W, 31d 0' 2.00"N)
Lower Left  (-103.0005556,  29.9994444) (103d 0' 2.00"W, 29d59'58.00"N)
Upper Right (-101.9994444,  31.0005556) (101d59'58.00"W, 31d 0' 2.00"N)
Lower Right (-101.9994444,  29.9994444) (101d59'58.00"W, 29d59'58.00"N)
Center      (-102.5000000,  30.5000000) (102d30' 0.00"W, 30d30' 0.00"N)
Band 1 Block=256x256 Type=Float32, ColorInterp=Gray
  Description = Layer_1
  Min=560.307 Max=1660.885 
  Minimum=560.307, Maximum=1660.885, Mean=962.446, StdDev=176.971
  NoData Value=-999999
  Overviews: 5406x5406, 2703x2703, 1352x1352, 676x676, 338x338
  Metadata:
    LAYER_TYPE=athematic
    STATISTICS_MAXIMUM=1660.8853759766
    STATISTICS_MEAN=962.4457942364
    STATISTICS_MINIMUM=560.30670166016
    STATISTICS_STDDEV=176.97062422337
    STATISTICS_VALID_PERCENT=100

Nothing seems funny here. I'll look into the actual stac metadata now -- I didn't initially see any differences after comparing the outputs of to_dict(), but I'll take a closer look.

vlandau commented 2 years ago

Ah, looks like the transform entry in the bad item is just wrong. The math also doesn't make sense for the bounding box, transform, and dimensions (ROWxCOL).

transform for "bad" item (item.properties["transform"]):

'proj:transform': [1e-05,
     0.0,
     -103.0005555556,
     0.0,
     -1e-05,
     31.00055555556,
     0.0,
     0.0,
     1.0]

transform, and dimensions (ROWxCOL)

transform for "good" item ( (item.properties["transform"]):

'proj:transform': [9.2592593e-05,
     0.0,
     -98.0005555563,
     0.0,
     -9.2592593e-05,
     34.0005555558,
     0.0,
     0.0,
     1.0]

You'll notice that these don't line up with the pixel sizes that were output from gdalinfo above, so it looks like the data are correct, but the stac metadata is not.

There are several items where this problem is arising. Based on an admittedly superficial look, all of the problem items also seem to be dated 2013.

TomAugspurger commented 2 years ago

Thanks for tracking that down. We'll look into it.

TomAugspurger commented 1 year ago

Sorry for failing to update the status here. In https://github.com/stactools-packages/threedep/pull/12, we fixed the upstream stactools package used to generate these items.

We'll need to pull that into our pipeline and regenerate all these items.