ncss-tech / soilDB

soilDB: Simplified Access to National Cooperative Soil Survey Databases
http://ncss-tech.github.io/soilDB/
GNU General Public License v3.0
81 stars 20 forks source link

test mukey.wcs against local calls to crop() on mukey grid #307

Closed dylanbeaudette closed 11 months ago

brownag commented 1 year ago

Have you looked at this / thought about how to test?

I would like to make sure #306 doesn't introduce any unexpected artefacts--so might be worth compare current master against fix304 branch

dylanbeaudette commented 1 year ago

Initial tests suggest that the two methods give ever so slightly different grid topology, but identical values.

> dim(mu.wcs)
[1] 260 382   1
> dim(mu)
[1] 259 382   1

> ext(mu.wcs)
SpatExtent : 473836.578491825, 485296.578491825, 1772737.35324757, 1780537.35324757 (xmin, xmax, ymin, ymax)
> ext(mu)
SpatExtent : 473835, 485295, 1772745, 1780515 (xmin, xmax, ymin, ymax)
brownag commented 11 months ago

@dylanbeaudette can you speak to what the internal extent of the whole grid stored in the WCS is? Basically I want to know if the entire WCS source is shifted, or if is this just a result of how cropping works? It seems odd to me that the WCS would return a result that doesn't conform to whatever the internal grid is... which suggests to me that the topology may not actually be identical to the original TIFF files.

I think this may be fixable on the R side if we decide on the authoritative grid topology for each whole source. Since we already store a variety of metadata about each layer, it should be trivial to add a few more elements. In addition to native CRS and resolution we need a number of rows/columns and extent--with that we can re-align any grid with terra::align() and an assumption that the authoritative grid is the "correct" one.

As far as the dimension differences, I think that in general we can't expect crop() and the WCS window to produce identical results. There is rounding/flooring that needs to happen when the input extent derived from a vector has variable dimensions that do not conform to whole grid cells. {terra} provides several different snapping algorithms in crop() which affects this result regardless of rounding method used, the default is snap="near". I have not investigated whether similar options are available for the WCS.

If we force alignment of the grid, nodata would be properly filled in for the case of a smaller grid projected to the larger dimensions. So, while the individual result dimensions will sometimes differ from WCS v.s. {terra}, the data and grid topology can be made to be identical to the source and each other.

I imagine that conforming is less of an issue for ISSR800... but probably there should be an authoritative grid for everything, even if it is not an "official" product.

Will open a PR to demo for the CONUS mukey sources, then will need to get the same info for Hawaii, Puerto Rico, etc.

brownag commented 11 months ago

@dylanbeaudette Can you post the extent and number of rows/columns for rasters you uploaded to SoilWeb for Hawaii, Puerto Rico, and ISSR800?

EDIT: got ISSR800 from soil properties download page GeoTIFFs

x <- terra::rast("https://soilmap2-1.lawr.ucdavis.edu/800m_grids/rasters/caco3_kg_sq_m.tif")
x
#> class       : SpatRaster 
#> dimensions  : 3621, 5770, 1  (nrow, ncol, nlyr)
#> resolution  : 800, 800  (x, y)
#> extent      : -2357200, 2258800, 276400, 3173200  (xmin, xmax, ymin, ymax)
#> coord. ref. : NAD83 / Conus Albers (EPSG:5070) 
#> source      : caco3_kg_sq_m.tif 
#> name        : caco3_kg_sq_m
dylanbeaudette commented 11 months ago

As far as I can tell, the gNATSGO and gSSURGO mukey grids have not been modified relative to the source on Box.

Local files as downloaded from Box:

class       : SpatRaster 
dimensions  : 96754, 153999, 1  (nrow, ncol, nlyr)
resolution  : 30, 30  (x, y)
extent      : -2356155, 2263815, 270015, 3172635  (xmin, xmax, ymin, ymax)
coord. ref. : NAD83 / Conus Albers (EPSG:5070) 
source      : gNATSGO-mukey.tif 
name        : gNATSGO-mukey 
min value   :         52288 
max value   :       3349994 

class       : SpatRaster 
dimensions  : 96754, 153999, 1  (nrow, ncol, nlyr)
resolution  : 30, 30  (x, y)
extent      : -2356155, 2263815, 270015, 3172635  (xmin, xmax, ymin, ymax)
coord. ref. : NAD83 / Conus Albers (EPSG:5070) 
source      : gSSURGO-mukey.tif 
name        : gSSURGO-mukey 
min value   :         52231 
max value   :       3350239

Files on SoilWeb, and output from gdalinfo.

gNATSGO:

Size is 153999, 96754
Coordinate System is:
PROJCS["NAD_1983_Contiguous_USA_Albers",
    GEOGCS["NAD83",
        DATUM["North_American_Datum_1983",
            SPHEROID["GRS 1980",6378137,298.2572221010042,
                AUTHORITY["EPSG","7019"]],
            AUTHORITY["EPSG","6269"]],
        PRIMEM["Greenwich",0],
        UNIT["degree",0.0174532925199433],
        AUTHORITY["EPSG","4269"]],
    PROJECTION["Albers_Conic_Equal_Area"],
    PARAMETER["standard_parallel_1",29.5],
    PARAMETER["standard_parallel_2",45.5],
    PARAMETER["latitude_of_center",23],
    PARAMETER["longitude_of_center",-96],
    PARAMETER["false_easting",0],
    PARAMETER["false_northing",0],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]],
    AUTHORITY["EPSG","5070"]]
Origin = (-2356155.000000000000000,3172635.000000000000000)
Pixel Size = (30.000000000000000,-30.000000000000000)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (-2356155.000, 3172635.000) (127d53'17.14"W, 47d57'30.54"N)
Lower Left  (-2356155.000,  270015.000) (118d44'16.51"W, 22d52'42.36"N)
Upper Right ( 2263815.000, 3172635.000) ( 65d16'29.27"W, 48d13'46.10"N)
Lower Right ( 2263815.000,  270015.000) ( 74d 7'17.21"W, 23d 4'33.93"N)
Center      (  -46170.000, 1721325.000) ( 96d32' 4.49"W, 38d31'14.76"N)
Band 1 Block=128x128 Type=UInt32, ColorInterp=Gray
  NoData Value=2147483647

gSSURGO:

Size is 153999, 96754
Coordinate System is:
PROJCS["NAD_1983_Contiguous_USA_Albers",
    GEOGCS["NAD83",
        DATUM["North_American_Datum_1983",
            SPHEROID["GRS 1980",6378137,298.2572221010042,
                AUTHORITY["EPSG","7019"]],
            AUTHORITY["EPSG","6269"]],
        PRIMEM["Greenwich",0],
        UNIT["degree",0.0174532925199433],
        AUTHORITY["EPSG","4269"]],
    PROJECTION["Albers_Conic_Equal_Area"],
    PARAMETER["standard_parallel_1",29.5],
    PARAMETER["standard_parallel_2",45.5],
    PARAMETER["latitude_of_center",23],
    PARAMETER["longitude_of_center",-96],
    PARAMETER["false_easting",0],
    PARAMETER["false_northing",0],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]],
    AUTHORITY["EPSG","5070"]]
Origin = (-2356155.000000000000000,3172635.000000000000000)
Pixel Size = (30.000000000000000,-30.000000000000000)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (-2356155.000, 3172635.000) (127d53'17.14"W, 47d57'30.54"N)
Lower Left  (-2356155.000,  270015.000) (118d44'16.51"W, 22d52'42.36"N)
Upper Right ( 2263815.000, 3172635.000) ( 65d16'29.27"W, 48d13'46.10"N)
Lower Right ( 2263815.000,  270015.000) ( 74d 7'17.21"W, 23d 4'33.93"N)
Center      (  -46170.000, 1721325.000) ( 96d32' 4.49"W, 38d31'14.76"N)
Band 1 Block=128x128 Type=UInt32, ColorInterp=Gray
  Min=52231.000 Max=3350239.000
  Minimum=52231.000, Maximum=3350239.000, Mean=1019199.009, StdDev=999530.613
  NoData Value=2147483647
  Metadata:
    RepresentationType=THEMATIC
    STATISTICS_COVARIANCES=999061446534.1403
    STATISTICS_MAXIMUM=3350239
    STATISTICS_MEAN=1019199.0090442
    STATISTICS_MINIMUM=52231
    STATISTICS_SKIPFACTORX=1
    STATISTICS_SKIPFACTORY=1
    STATISTICS_STDDEV=999530.61310504
dylanbeaudette commented 11 months ago

Note that the HI and PR grids use their own, local CRS.

HI:

Size is 17193, 12440
Coordinate System is:
PROJCS["NAD83(PA11) / Hawaii zone 1",
    GEOGCS["NAD83(PA11)",
        DATUM["NAD83_National_Spatial_Reference_System_PA11",
            SPHEROID["GRS 1980",6378137,298.257222101,
                AUTHORITY["EPSG","7019"]],
            AUTHORITY["EPSG","1117"]],
        PRIMEM["Greenwich",0,
            AUTHORITY["EPSG","8901"]],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","6322"]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",18.83333333333333],
    PARAMETER["central_meridian",-155.5],
    PARAMETER["scale_factor",0.999966667],
    PARAMETER["false_easting",500000],
    PARAMETER["false_northing",0],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]],
    AXIS["X",EAST],
    AXIS["Y",NORTH],
    AUTHORITY["EPSG","6628"]]
Origin = (56991.644909009395633,381785.136670718959067)
Pixel Size = (29.999999999999996,-30.000000000000000)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (   56991.645,  381785.137) (159d47'39.12"W, 22d13'30.48"N)
Lower Left  (   56991.645,    8585.137) (159d42' 4.13"W, 18d51'48.22"N)
Upper Right (  572781.645,  381785.137) (154d47'37.57"W, 22d16'49.36"N)
Lower Right (  572781.645,    8585.137) (154d48'32.85"W, 18d54'34.60"N)
Center      (  314886.645,  195185.137) (157d16'31.63"W, 20d35'14.85"N)
Band 1 Block=256x256 Type=UInt32, ColorInterp=Gray
  Description = mukey_int
  Min=467795.000 Max=3172064.000
  Minimum=467795.000, Maximum=3172064.000, Mean=-9999.000, StdDev=-9999.000
  NoData Value=4294967295
  Metadata:
    STATISTICS_MAXIMUM=3172064
    STATISTICS_MEAN=-9999
    STATISTICS_MINIMUM=467795
    STATISTICS_STDDEV=-9999

PR:

Size is 9608, 2229
Coordinate System is:
PROJCS["NAD83 / Puerto Rico & Virgin Is.",
    GEOGCS["NAD83",
        DATUM["North_American_Datum_1983",
            SPHEROID["GRS 1980",6378137,298.257222101,
                AUTHORITY["EPSG","7019"]],
            TOWGS84[0,0,0,0,0,0,0],
            AUTHORITY["EPSG","6269"]],
        PRIMEM["Greenwich",0,
            AUTHORITY["EPSG","8901"]],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4269"]],
    PROJECTION["Lambert_Conformal_Conic_2SP"],
    PARAMETER["standard_parallel_1",18.43333333333333],
    PARAMETER["standard_parallel_2",18.03333333333333],
    PARAMETER["latitude_of_origin",17.83333333333333],
    PARAMETER["central_meridian",-66.43333333333334],
    PARAMETER["false_easting",200000],
    PARAMETER["false_northing",200000],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]],
    AXIS["X",EAST],
    AXIS["Y",NORTH],
    AUTHORITY["EPSG","32161"]]
Origin = (39904.225262861582451,275685.704392688930966)
Pixel Size = (30.000000000000000,-30.000000000000000)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (   39904.225,  275685.704) ( 67d56'58.07"W, 18d30'40.16"N)
Lower Left  (   39904.225,  208815.704) ( 67d56'39.26"W, 17d54'25.26"N)
Upper Right (  328144.225,  275685.704) ( 65d13'11.21"W, 18d30'47.90"N)
Lower Right (  328144.225,  208815.704) ( 65d13'26.26"W, 17d54'32.98"N)
Center      (  184024.225,  242250.704) ( 66d35' 3.73"W, 18d12'54.03"N)
Band 1 Block=256x256 Type=UInt32, ColorInterp=Gray
  Description = mukey_int
  Min=326380.000 Max=3349465.000
  Minimum=326380.000, Maximum=3349465.000, Mean=-9999.000, StdDev=-9999.000
  NoData Value=4294967295
  Metadata:
    STATISTICS_MAXIMUM=3349465
    STATISTICS_MEAN=-9999
    STATISTICS_MINIMUM=326380
    STATISTICS_STDDEV=-9999
brownag commented 11 months ago

As far as I can tell, the gNATSGO and gSSURGO mukey grids have not been modified relative to the source on Box.

OK. Well, then it seems WCS result/cropping appears to not respect the source grid.

I think this approach to align to an authoritative reference makes sense, so we should implement it for the remaining sources outside CONUS.

dylanbeaudette commented 11 months ago

I suspect that the WCS is performing some alignment / cropping outside of our control, possibly a precision issue. I'll look into it some more this week.

Related, but not yet a fix or alternative. As I continue to move content over to the new SoilWeb servers, I'll convert all TIFFs to COG and we can test parallel access: WCS and vsicurl. I don't know if this is efficient, but seems to work as expected:

u <- '/vsicurl/https://soilmap2-1.lawr.ucdavis.edu/wcs-files/gSSURGO-mukey.tif'
x <- rast(u)

bb <- 'POLYGON((-118.6609 36.4820,-118.6609 36.5972,-118.3979 36.5972,-118.3979 36.4820,-118.6609 36.4820))'

a <- vect(wkt, crs = 'epsg:4326')
a <- project(a, 'epsg:5070')

z <- crop(x, a)
z <- as.factor(z)

plot(z)
lines(a)

This appears to return data perfectly aligned with the source data, and closely follows the first example in the WCS tutorial.

There are many features of the WCS that I prefer, especially limits on maximum result sizes, arbitrary resolution, etc.. The COG approach is cleaner, and I don't know if it can scale as efficiently as the WCS.

Either way, having wrapper functions in soilDB seems like a good idea.

brownag commented 11 months ago

Related, but not yet a fix or alternative. As I continue to move content over to the new SoilWeb servers, I'll convert all TIFFs to COG and we can test parallel access: WCS and vsicurl. I don't know if this is efficient, but seems to work as expected:

Sure... IMO the cloud-optimized geotiff is far superior to all the stuff we are doing to make WCS work... but I guess I assumed the reason we didn't do COG from the beginning is because once it is out in the wild, possibly with a wrapper function, people are going to hammer it with requests. I have used COG for very large areas at fine resolution and I would guess it is at least as efficient as WCS in terms of time to download an area of interest. But I am not sure that is something you wanted the SoilWeb servers to be handling.

dylanbeaudette commented 11 months ago

Agreed on all points.

dylanbeaudette commented 11 months ago

OK, I can confirm that HI and PR (custom) SSURGO grids were accidentally created with non-integer extent. These were based on a BBOX around MLRA for those areas. I'll re-make those grids using integer extent, and push to WCS.

updated grid systems.

HI

Corner Coordinates:
Upper Left  (   56992.000,  381815.000) (159d47'39.14"W, 22d13'31.44"N)
Lower Left  (   56992.000,    8585.000) (159d42' 4.12"W, 18d51'48.22"N)
Upper Right (  572782.000,  381815.000) (154d47'37.55"W, 22d16'50.33"N)
Lower Right (  572782.000,    8585.000) (154d48'32.84"W, 18d54'34.59"N)
Center      (  314887.000,  195200.000) (157d16'31.63"W, 20d35'15.33"N)

PR

Corner Coordinates:
Upper Left  (   39905.000,  275685.000) ( 67d56'58.04"W, 18d30'40.14"N)
Lower Left  (   39905.000,  208815.000) ( 67d56'39.23"W, 17d54'25.23"N)
Upper Right (  328145.000,  275685.000) ( 65d13'11.18"W, 18d30'47.88"N)
Lower Right (  328145.000,  208815.000) ( 65d13'26.24"W, 17d54'32.95"N)
Center      (  184025.000,  242250.000) ( 66d35' 3.70"W, 18d12'54.01"N)
dylanbeaudette commented 11 months ago

OK, new HI and PR grids created, and pushed to the WCS servers. Here are the source grid specs:

HI

class       : SpatRaster 
dimensions  : 12441, 17193, 1  (nrow, ncol, nlyr)
resolution  : 30, 30  (x, y)
extent      : 56992, 572782, 8585, 381815  (xmin, xmax, ymin, ymax)
coord. ref. : NAD83(PA11) / Hawaii zone 1 (EPSG:6628) 
source      : hi-mukey-grid.tif 
name        : mukey_int 
min value   :    467795 
max value   :   3172064 

PR

class       : SpatRaster 
dimensions  : 2229, 9608, 1  (nrow, ncol, nlyr)
resolution  : 30, 30  (x, y)
extent      : 39905, 328145, 208815, 275685  (xmin, xmax, ymin, ymax)
coord. ref. : NAD83 / Puerto Rico & Virgin Is. (EPSG:32161) 
source      : pr-mukey-grid.tif 
name        : mukey_int 
min value   :    326380 
max value   :   3349465