Closed dylanbeaudette closed 11 months ago
Initial tests suggest that the two methods give ever so slightly different grid topology, but identical values.
> dim(mu.wcs)
[1] 260 382 1
> dim(mu)
[1] 259 382 1
> ext(mu.wcs)
SpatExtent : 473836.578491825, 485296.578491825, 1772737.35324757, 1780537.35324757 (xmin, xmax, ymin, ymax)
> ext(mu)
SpatExtent : 473835, 485295, 1772745, 1780515 (xmin, xmax, ymin, ymax)
@dylanbeaudette can you speak to what the internal extent of the whole grid stored in the WCS is? Basically I want to know if the entire WCS source is shifted, or if is this just a result of how cropping works? It seems odd to me that the WCS would return a result that doesn't conform to whatever the internal grid is... which suggests to me that the topology may not actually be identical to the original TIFF files.
I think this may be fixable on the R side if we decide on the authoritative grid topology for each whole source. Since we already store a variety of metadata about each layer, it should be trivial to add a few more elements. In addition to native CRS and resolution we need a number of rows/columns and extent--with that we can re-align any grid with terra::align()
and an assumption that the authoritative grid is the "correct" one.
As far as the dimension differences, I think that in general we can't expect crop()
and the WCS window to produce identical results. There is rounding/flooring that needs to happen when the input extent derived from a vector has variable dimensions that do not conform to whole grid cells. {terra} provides several different snapping algorithms in crop()
which affects this result regardless of rounding method used, the default is snap="near"
. I have not investigated whether similar options are available for the WCS.
If we force alignment of the grid, nodata would be properly filled in for the case of a smaller grid projected to the larger dimensions. So, while the individual result dimensions will sometimes differ from WCS v.s. {terra}, the data and grid topology can be made to be identical to the source and each other.
I imagine that conforming is less of an issue for ISSR800... but probably there should be an authoritative grid for everything, even if it is not an "official" product.
Will open a PR to demo for the CONUS mukey sources, then will need to get the same info for Hawaii, Puerto Rico, etc.
@dylanbeaudette Can you post the extent and number of rows/columns for rasters you uploaded to SoilWeb for Hawaii, Puerto Rico, and ISSR800?
EDIT: got ISSR800 from soil properties download page GeoTIFFs
x <- terra::rast("https://soilmap2-1.lawr.ucdavis.edu/800m_grids/rasters/caco3_kg_sq_m.tif")
x
#> class : SpatRaster
#> dimensions : 3621, 5770, 1 (nrow, ncol, nlyr)
#> resolution : 800, 800 (x, y)
#> extent : -2357200, 2258800, 276400, 3173200 (xmin, xmax, ymin, ymax)
#> coord. ref. : NAD83 / Conus Albers (EPSG:5070)
#> source : caco3_kg_sq_m.tif
#> name : caco3_kg_sq_m
As far as I can tell, the gNATSGO and gSSURGO mukey grids have not been modified relative to the source on Box.
Local files as downloaded from Box:
class : SpatRaster
dimensions : 96754, 153999, 1 (nrow, ncol, nlyr)
resolution : 30, 30 (x, y)
extent : -2356155, 2263815, 270015, 3172635 (xmin, xmax, ymin, ymax)
coord. ref. : NAD83 / Conus Albers (EPSG:5070)
source : gNATSGO-mukey.tif
name : gNATSGO-mukey
min value : 52288
max value : 3349994
class : SpatRaster
dimensions : 96754, 153999, 1 (nrow, ncol, nlyr)
resolution : 30, 30 (x, y)
extent : -2356155, 2263815, 270015, 3172635 (xmin, xmax, ymin, ymax)
coord. ref. : NAD83 / Conus Albers (EPSG:5070)
source : gSSURGO-mukey.tif
name : gSSURGO-mukey
min value : 52231
max value : 3350239
Files on SoilWeb, and output from gdalinfo.
gNATSGO:
Size is 153999, 96754
Coordinate System is:
PROJCS["NAD_1983_Contiguous_USA_Albers",
GEOGCS["NAD83",
DATUM["North_American_Datum_1983",
SPHEROID["GRS 1980",6378137,298.2572221010042,
AUTHORITY["EPSG","7019"]],
AUTHORITY["EPSG","6269"]],
PRIMEM["Greenwich",0],
UNIT["degree",0.0174532925199433],
AUTHORITY["EPSG","4269"]],
PROJECTION["Albers_Conic_Equal_Area"],
PARAMETER["standard_parallel_1",29.5],
PARAMETER["standard_parallel_2",45.5],
PARAMETER["latitude_of_center",23],
PARAMETER["longitude_of_center",-96],
PARAMETER["false_easting",0],
PARAMETER["false_northing",0],
UNIT["metre",1,
AUTHORITY["EPSG","9001"]],
AUTHORITY["EPSG","5070"]]
Origin = (-2356155.000000000000000,3172635.000000000000000)
Pixel Size = (30.000000000000000,-30.000000000000000)
Metadata:
AREA_OR_POINT=Area
Image Structure Metadata:
COMPRESSION=LZW
INTERLEAVE=BAND
Corner Coordinates:
Upper Left (-2356155.000, 3172635.000) (127d53'17.14"W, 47d57'30.54"N)
Lower Left (-2356155.000, 270015.000) (118d44'16.51"W, 22d52'42.36"N)
Upper Right ( 2263815.000, 3172635.000) ( 65d16'29.27"W, 48d13'46.10"N)
Lower Right ( 2263815.000, 270015.000) ( 74d 7'17.21"W, 23d 4'33.93"N)
Center ( -46170.000, 1721325.000) ( 96d32' 4.49"W, 38d31'14.76"N)
Band 1 Block=128x128 Type=UInt32, ColorInterp=Gray
NoData Value=2147483647
gSSURGO:
Size is 153999, 96754
Coordinate System is:
PROJCS["NAD_1983_Contiguous_USA_Albers",
GEOGCS["NAD83",
DATUM["North_American_Datum_1983",
SPHEROID["GRS 1980",6378137,298.2572221010042,
AUTHORITY["EPSG","7019"]],
AUTHORITY["EPSG","6269"]],
PRIMEM["Greenwich",0],
UNIT["degree",0.0174532925199433],
AUTHORITY["EPSG","4269"]],
PROJECTION["Albers_Conic_Equal_Area"],
PARAMETER["standard_parallel_1",29.5],
PARAMETER["standard_parallel_2",45.5],
PARAMETER["latitude_of_center",23],
PARAMETER["longitude_of_center",-96],
PARAMETER["false_easting",0],
PARAMETER["false_northing",0],
UNIT["metre",1,
AUTHORITY["EPSG","9001"]],
AUTHORITY["EPSG","5070"]]
Origin = (-2356155.000000000000000,3172635.000000000000000)
Pixel Size = (30.000000000000000,-30.000000000000000)
Metadata:
AREA_OR_POINT=Area
Image Structure Metadata:
COMPRESSION=LZW
INTERLEAVE=BAND
Corner Coordinates:
Upper Left (-2356155.000, 3172635.000) (127d53'17.14"W, 47d57'30.54"N)
Lower Left (-2356155.000, 270015.000) (118d44'16.51"W, 22d52'42.36"N)
Upper Right ( 2263815.000, 3172635.000) ( 65d16'29.27"W, 48d13'46.10"N)
Lower Right ( 2263815.000, 270015.000) ( 74d 7'17.21"W, 23d 4'33.93"N)
Center ( -46170.000, 1721325.000) ( 96d32' 4.49"W, 38d31'14.76"N)
Band 1 Block=128x128 Type=UInt32, ColorInterp=Gray
Min=52231.000 Max=3350239.000
Minimum=52231.000, Maximum=3350239.000, Mean=1019199.009, StdDev=999530.613
NoData Value=2147483647
Metadata:
RepresentationType=THEMATIC
STATISTICS_COVARIANCES=999061446534.1403
STATISTICS_MAXIMUM=3350239
STATISTICS_MEAN=1019199.0090442
STATISTICS_MINIMUM=52231
STATISTICS_SKIPFACTORX=1
STATISTICS_SKIPFACTORY=1
STATISTICS_STDDEV=999530.61310504
Note that the HI and PR grids use their own, local CRS.
HI:
Size is 17193, 12440
Coordinate System is:
PROJCS["NAD83(PA11) / Hawaii zone 1",
GEOGCS["NAD83(PA11)",
DATUM["NAD83_National_Spatial_Reference_System_PA11",
SPHEROID["GRS 1980",6378137,298.257222101,
AUTHORITY["EPSG","7019"]],
AUTHORITY["EPSG","1117"]],
PRIMEM["Greenwich",0,
AUTHORITY["EPSG","8901"]],
UNIT["degree",0.0174532925199433,
AUTHORITY["EPSG","9122"]],
AUTHORITY["EPSG","6322"]],
PROJECTION["Transverse_Mercator"],
PARAMETER["latitude_of_origin",18.83333333333333],
PARAMETER["central_meridian",-155.5],
PARAMETER["scale_factor",0.999966667],
PARAMETER["false_easting",500000],
PARAMETER["false_northing",0],
UNIT["metre",1,
AUTHORITY["EPSG","9001"]],
AXIS["X",EAST],
AXIS["Y",NORTH],
AUTHORITY["EPSG","6628"]]
Origin = (56991.644909009395633,381785.136670718959067)
Pixel Size = (29.999999999999996,-30.000000000000000)
Metadata:
AREA_OR_POINT=Area
Image Structure Metadata:
COMPRESSION=LZW
INTERLEAVE=BAND
Corner Coordinates:
Upper Left ( 56991.645, 381785.137) (159d47'39.12"W, 22d13'30.48"N)
Lower Left ( 56991.645, 8585.137) (159d42' 4.13"W, 18d51'48.22"N)
Upper Right ( 572781.645, 381785.137) (154d47'37.57"W, 22d16'49.36"N)
Lower Right ( 572781.645, 8585.137) (154d48'32.85"W, 18d54'34.60"N)
Center ( 314886.645, 195185.137) (157d16'31.63"W, 20d35'14.85"N)
Band 1 Block=256x256 Type=UInt32, ColorInterp=Gray
Description = mukey_int
Min=467795.000 Max=3172064.000
Minimum=467795.000, Maximum=3172064.000, Mean=-9999.000, StdDev=-9999.000
NoData Value=4294967295
Metadata:
STATISTICS_MAXIMUM=3172064
STATISTICS_MEAN=-9999
STATISTICS_MINIMUM=467795
STATISTICS_STDDEV=-9999
PR:
Size is 9608, 2229
Coordinate System is:
PROJCS["NAD83 / Puerto Rico & Virgin Is.",
GEOGCS["NAD83",
DATUM["North_American_Datum_1983",
SPHEROID["GRS 1980",6378137,298.257222101,
AUTHORITY["EPSG","7019"]],
TOWGS84[0,0,0,0,0,0,0],
AUTHORITY["EPSG","6269"]],
PRIMEM["Greenwich",0,
AUTHORITY["EPSG","8901"]],
UNIT["degree",0.0174532925199433,
AUTHORITY["EPSG","9122"]],
AUTHORITY["EPSG","4269"]],
PROJECTION["Lambert_Conformal_Conic_2SP"],
PARAMETER["standard_parallel_1",18.43333333333333],
PARAMETER["standard_parallel_2",18.03333333333333],
PARAMETER["latitude_of_origin",17.83333333333333],
PARAMETER["central_meridian",-66.43333333333334],
PARAMETER["false_easting",200000],
PARAMETER["false_northing",200000],
UNIT["metre",1,
AUTHORITY["EPSG","9001"]],
AXIS["X",EAST],
AXIS["Y",NORTH],
AUTHORITY["EPSG","32161"]]
Origin = (39904.225262861582451,275685.704392688930966)
Pixel Size = (30.000000000000000,-30.000000000000000)
Metadata:
AREA_OR_POINT=Area
Image Structure Metadata:
COMPRESSION=LZW
INTERLEAVE=BAND
Corner Coordinates:
Upper Left ( 39904.225, 275685.704) ( 67d56'58.07"W, 18d30'40.16"N)
Lower Left ( 39904.225, 208815.704) ( 67d56'39.26"W, 17d54'25.26"N)
Upper Right ( 328144.225, 275685.704) ( 65d13'11.21"W, 18d30'47.90"N)
Lower Right ( 328144.225, 208815.704) ( 65d13'26.26"W, 17d54'32.98"N)
Center ( 184024.225, 242250.704) ( 66d35' 3.73"W, 18d12'54.03"N)
Band 1 Block=256x256 Type=UInt32, ColorInterp=Gray
Description = mukey_int
Min=326380.000 Max=3349465.000
Minimum=326380.000, Maximum=3349465.000, Mean=-9999.000, StdDev=-9999.000
NoData Value=4294967295
Metadata:
STATISTICS_MAXIMUM=3349465
STATISTICS_MEAN=-9999
STATISTICS_MINIMUM=326380
STATISTICS_STDDEV=-9999
As far as I can tell, the gNATSGO and gSSURGO mukey grids have not been modified relative to the source on Box.
OK. Well, then it seems WCS result/cropping appears to not respect the source grid.
I think this approach to align to an authoritative reference makes sense, so we should implement it for the remaining sources outside CONUS.
I suspect that the WCS is performing some alignment / cropping outside of our control, possibly a precision issue. I'll look into it some more this week.
Related, but not yet a fix or alternative. As I continue to move content over to the new SoilWeb servers, I'll convert all TIFFs to COG and we can test parallel access: WCS and vsicurl. I don't know if this is efficient, but seems to work as expected:
u <- '/vsicurl/https://soilmap2-1.lawr.ucdavis.edu/wcs-files/gSSURGO-mukey.tif'
x <- rast(u)
bb <- 'POLYGON((-118.6609 36.4820,-118.6609 36.5972,-118.3979 36.5972,-118.3979 36.4820,-118.6609 36.4820))'
a <- vect(wkt, crs = 'epsg:4326')
a <- project(a, 'epsg:5070')
z <- crop(x, a)
z <- as.factor(z)
plot(z)
lines(a)
This appears to return data perfectly aligned with the source data, and closely follows the first example in the WCS tutorial.
There are many features of the WCS that I prefer, especially limits on maximum result sizes, arbitrary resolution, etc.. The COG approach is cleaner, and I don't know if it can scale as efficiently as the WCS.
Either way, having wrapper functions in soilDB seems like a good idea.
Related, but not yet a fix or alternative. As I continue to move content over to the new SoilWeb servers, I'll convert all TIFFs to COG and we can test parallel access: WCS and vsicurl. I don't know if this is efficient, but seems to work as expected:
Sure... IMO the cloud-optimized geotiff is far superior to all the stuff we are doing to make WCS work... but I guess I assumed the reason we didn't do COG from the beginning is because once it is out in the wild, possibly with a wrapper function, people are going to hammer it with requests. I have used COG for very large areas at fine resolution and I would guess it is at least as efficient as WCS in terms of time to download an area of interest. But I am not sure that is something you wanted the SoilWeb servers to be handling.
Agreed on all points.
OK, I can confirm that HI and PR (custom) SSURGO grids were accidentally created with non-integer extent. These were based on a BBOX around MLRA for those areas. I'll re-make those grids using integer extent, and push to WCS.
updated grid systems.
HI
Corner Coordinates:
Upper Left ( 56992.000, 381815.000) (159d47'39.14"W, 22d13'31.44"N)
Lower Left ( 56992.000, 8585.000) (159d42' 4.12"W, 18d51'48.22"N)
Upper Right ( 572782.000, 381815.000) (154d47'37.55"W, 22d16'50.33"N)
Lower Right ( 572782.000, 8585.000) (154d48'32.84"W, 18d54'34.59"N)
Center ( 314887.000, 195200.000) (157d16'31.63"W, 20d35'15.33"N)
PR
Corner Coordinates:
Upper Left ( 39905.000, 275685.000) ( 67d56'58.04"W, 18d30'40.14"N)
Lower Left ( 39905.000, 208815.000) ( 67d56'39.23"W, 17d54'25.23"N)
Upper Right ( 328145.000, 275685.000) ( 65d13'11.18"W, 18d30'47.88"N)
Lower Right ( 328145.000, 208815.000) ( 65d13'26.24"W, 17d54'32.95"N)
Center ( 184025.000, 242250.000) ( 66d35' 3.70"W, 18d12'54.01"N)
OK, new HI and PR grids created, and pushed to the WCS servers. Here are the source grid specs:
HI
class : SpatRaster
dimensions : 12441, 17193, 1 (nrow, ncol, nlyr)
resolution : 30, 30 (x, y)
extent : 56992, 572782, 8585, 381815 (xmin, xmax, ymin, ymax)
coord. ref. : NAD83(PA11) / Hawaii zone 1 (EPSG:6628)
source : hi-mukey-grid.tif
name : mukey_int
min value : 467795
max value : 3172064
PR
class : SpatRaster
dimensions : 2229, 9608, 1 (nrow, ncol, nlyr)
resolution : 30, 30 (x, y)
extent : 39905, 328145, 208815, 275685 (xmin, xmax, ymin, ymax)
coord. ref. : NAD83 / Puerto Rico & Virgin Is. (EPSG:32161)
source : pr-mukey-grid.tif
name : mukey_int
min value : 326380
max value : 3349465
Have you looked at this / thought about how to test?
I would like to make sure #306 doesn't introduce any unexpected artefacts--so might be worth compare current master against fix304 branch