mikejohnson51 / opendap.catalog

Flexible backend for getting data from Web and local NetCDF resources into R
https://mikejohnson51.github.io/opendap.catalog
Other
7 stars 2 forks source link

Quality Checks on Catalog + new resources #18

Open mikejohnson51 opened 2 years ago

mikejohnson51 commented 2 years ago

Issues found by @rmcd-mscb. Thanks!

Needed Corrections

Datasets identified in ClimateR and geoknife here are the missing data in the opendap catalog

New Datasets of value:

http://thredds.northwestknowledge.net:8080/thredds/catalog/NWCSC_INTEGRATED_SCENARIOS_ALL_CLIMATE/catalog.html

Particularly the cfsv2 forecasts and the bssd-nnme daily and monthly.

I will edit this entry with progress.

rmcd-mscb commented 2 years ago

Mike - looking at maca_day data. I couple of cases I've looked at: agg_macav2metdata_huss_CNRM-CM5_r1i1p1_historical_1950_2005_CONUS_daily agg_macav2metdata_huss_GFDL-ESM2G_r1i1p1_historical_1950_2005_CONUS_daily

point to grid_id 175, which states that lons are in interval (looking at X1,Xn) -180 - 0, but the lons are actually in interval 0-360. The global attributes of the opendap data are wrong BTW (sent an email to Katherine Hegewisch). Also curious these are not using the standard gridmet grid...

Also with these 2 datasets I think the toptobottom code maybe wrong, if I understand it correctly. First value of lon is 25.06 and the last is 49.39 So I would guess the toptobottom code would be False in this case?

Thanks Mike - If I misunderstood something here I apologize ahead of time :)

mikejohnson51 commented 2 years ago

Hey @rmcd-mscb, thanks!

So I took a look and believe you raise two issues:

I believe all are giving the expected behavior for what I "want" but it might not be whats best for you/the community (issues one). Here are some details:

Validate Resource is "correct"

library(opendap.catalog)
library(terra)
library(dplyr)

ex = params[grepl('huss_CNRM-CM5_r1i1p1_historical', params$URL),]

data = dap(URL = ex$URL,
    AOI = AOI::aoi_get(state= "FL"),
    startDate = "2000-10-29")
#> source:   http://thredds.northwestknowledge.net:8080/thredds/dodsC/agg... 
#> varname(s):
#>    > specific_humidity [kg kg-1] (Daily Mean Near-Surface Specific Humidity)
#> ==================================================
#> diminsions:  185, 144, 1 (names: lon,lat,time)
#> resolution:  0.042, 0.042, 1 days
#> extent:      -87.67, -79.96, 25.04, 31.04 (xmin, xmax, ymin, ymax)
#> crs:         +proj=longlat +a=6378137 +f=0.00335281066474748 +p...
#> time:        2000-10-29 to 2000-10-29
#> ==================================================
#> values: 26,640 (vars*X*Y*T)

plot(data$specific_humidity)

Created on 2022-06-13 by the reprex package (v2.0.1)

Things look good!

Build Mini-catalog

The catalog is built by "intelligently" (if I flatter myself) running this code over grouped resources (saves the need to pull data more then needed). We can test in on just this file:

glimpse(read_dap_file(ex$URL, "test"))
#> Rows: 1
#> Columns: 21
#> $ id          <chr> "test"
#> $ varname     <chr> "specific_humidity"
#> $ X_name      <chr> "lon"
#> $ Y_name      <chr> "lat"
#> $ T_name      <chr> "time"
#> $ units       <chr> "kg kg-1"
#> $ long_name   <chr> "Daily Mean Near-Surface Specific Humidity"
#> $ URL         <chr> "http://thredds.northwestknowledge.net:8080/thredds/dodsC/…
#> $ duration    <chr> "1950-01-01/2005-12-31"
#> $ interval    <chr> "1 days"
#> $ nT          <int> 20454
#> $ proj        <chr> "+proj=longlat +a=6378137 +f=0.00335281066474748 +pm=0 +no…
#> $ X1          <dbl> -124.7722
#> $ Xn          <dbl> -67.06476
#> $ Y1          <dbl> 25.06308
#> $ Yn          <dbl> 49.39602
#> $ resX        <dbl> 0.04166599
#> $ resY        <dbl> 0.041666
#> $ ncols       <dbl> 1386
#> $ nrows       <dbl> 585
#> $ toptobottom <lgl> TRUE

Created on 2022-06-13 by the reprex package (v2.0.1)

We have the same information as reported in the catalog, and that produce "correct" data.

Reason

Deep in the reasoning, I have defined that all degree based coordinates are set to -180 / 180 coordinates. The notion of "degree" is implied from the CRS.

https://github.com/mikejohnson51/opendap.catalog/blob/263b6bfdffd1ea7363ece2bbc88fcb939051249e/R/utils.R#L152

This makes it easier to identify indices without needing to worry about ever extracting coordinates.

Toptobottom

Here is a brief "highlight" section on what top to bottom means in the context of the catalog with some examples. This is enough content that it is worth a package vignette but I am a little short on time pre AGU next week :)

toptobottom is set based on this condition:

yy =  var.get.nc(nc, Y_name)

yy[1]
yy[length(yy)]
if (yy[1] > yy[length(yy)]) {
  toptobottom <- FALSE
} else {
  toptobottom <- TRUE
}

Where Y_name is the Y-coordinate dimension. When toptobottom = TRUE the array read from the opendap (or NetCDF file!) is flipped using terra::flip. terra::flip defaults to flipping vertically (rows). So if toptobottom = TRUE (yy[1] <= yy[length(yy)]) the incoming data is flipped along the rows.

What does this mean?

#vector of values
vals = c(1:4)
#> [1] 1 2 3 4

#structure as matrix
dim(vals) = c(2,2)
#>      [,1] [,2]
#> [1,]    1    3
#> [2,]    2    4

#convert to spatial object
vals = rast(vals)
#> class       : SpatRaster 
#> dimensions  : 2, 2, 1  (nrow, ncol, nlyr)
#> resolution  : 1, 1  (x, y)
#> extent      : 0, 2, 0, 2  (xmin, xmax, ymin, ymax)
#> coord. ref. :  
#> source      : memory 
#> name        : lyr.1 
#> min value   :     1 
#> max value   :     4

#plot raw and flipped varients
rast(list(Raw = vals, Flipped = flip(vals))) |> plot()

Created on 2022-06-13 by the reprex package (v2.0.1)

3 examples from our resources:

Here three example datasets were chosen to highlight that the logic works on Northern Hemisphere, Southern Hemisphere and Global grids (this was as much of a question for myself!).

MACA (Northern Hemisphere)

ex = params[grepl("maca", params$URL),][1,]
g =  filter(grids, grid_id == ex$grid_id)

data1 = dap(URL = ex$URL, startDate = "2000-01-01")
#> source:   http://thredds.northwestknowledge.net:8080/thredds/dodsC/agg... 
#> varname(s):
#>    > specific_humidity [kg kg-1] (Daily Mean Near-Surface Specific Humidity)
#> ==================================================
#> diminsions:  1386, 585, 1 (names: lon,lat,time)
#> resolution:  0.042, 0.042, 1 days
#> extent:      -124.79, -67.04, 25.04, 49.42 (xmin, xmax, ymin, ymax)
#> crs:         +proj=longlat +a=6378137 +f=0.00335281066474748 +p...
#> time:        2000-01-01 to 2000-01-01
#> ==================================================
#> values: 810,810 (vars*X*Y*T)

par(mfrow = c(2,2))
plot(data1$specific_humidity, main = "MACA via Catalog")
plot(aoi_get(state= "conus")$geometry, add = TRUE)
plot(var.get.nc(open.nc(ex$URL), g$Y_name), main = g$toptobottom)
plot(flip(data1$specific_humidity), main = "MACA Unflipped")

Created on 2022-06-14 by the reprex package (v2.0.1)

Terraclim (Global)

ex = params[grepl("terraclim", params$URL),][1,]

data1 = dap(URL = ex$URL,
           AOI = AOI::aoi_get(country= "Chile"),
           startDate = "2000-10-29")
#> source:   http://thredds.northwestknowledge.net:8080/thredds/dodsC/agg... 
#> varname(s):
#>    > aet [mm] (water_evaporation_amount)
#> ==================================================
#> diminsions:  209, 914, 1 (names: lon,lat,time)
#> resolution:  0.042, 0.042, 1 months
#> extent:      -75.67, -66.96, -55.62, -17.54 (xmin, xmax, ymin, ymax)
#> crs:         +proj=longlat +a=6378137 +f=0.00335281066474748 +p...
#> time:        2000-11-01 to 2000-11-01
#> ==================================================
#> values: 191,026 (vars*X*Y*T)

data2 = dap(URL = ex$URL,
           AOI = AOI::aoi_get(country= "Kenya"),
           startDate = "2000-10-29")
#> source:   http://thredds.northwestknowledge.net:8080/thredds/dodsC/agg... 
#> varname(s):
#>    > aet [mm] (water_evaporation_amount)
#> ==================================================
#> diminsions:  192, 246, 1 (names: lon,lat,time)
#> resolution:  0.042, 0.042, 1 months
#> extent:      33.88, 41.87, -4.71, 5.54 (xmin, xmax, ymin, ymax)
#> crs:         +proj=longlat +a=6378137 +f=0.00335281066474748 +p...
#> time:        2000-11-01 to 2000-11-01
#> ==================================================
#> values: 47,232 (vars*X*Y*T)

data3 = dap(URL = ex$URL,
           AOI = AOI::aoi_get(country= "Netherlands"),
           startDate = "2000-10-29")
#> source:   http://thredds.northwestknowledge.net:8080/thredds/dodsC/agg... 
#> varname(s):
#>    > aet [mm] (water_evaporation_amount)
#> ==================================================
#> diminsions:  92, 66, 1 (names: lon,lat,time)
#> resolution:  0.042, 0.042, 1 months
#> extent:      3.29, 7.12, 50.79, 53.54 (xmin, xmax, ymin, ymax)
#> crs:         +proj=longlat +a=6378137 +f=0.00335281066474748 +p...
#> time:        2000-11-01 to 2000-11-01
#> ==================================================
#> values: 6,072 (vars*X*Y*T)

g = dplyr::filter(grids, grid_id == ex$grid_id)

par(mfrow = c(2,2))
plot(data1$aet, main = "Southern")
plot( AOI::aoi_get(country= "Chile")$geometry, add = TRUE)
plot(data2$aet, main = "Split Equator")
plot( AOI::aoi_get(country= "Kenya")$geometry, add = TRUE)
plot(data3$aet, main = "Northern")
plot( AOI::aoi_get(country= "Netherlands")$geometry, add = TRUE)
plot(var.get.nc(open.nc(ex$URL), g$Y_name), main = g$toptobottom)

Created on 2022-06-14 by the reprex package (v2.0.1)

WRF-Samoa (Southern Hemisphere)

library(opendap.catalog)
#> 
#> Attaching package: 'opendap.catalog'
#> The following object is masked from 'package:base':
#> 
#>     search
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(AOI)
library(terra)
#> terra 1.5.34
library(terra)
library(RNetCDF)

ex = params[grepl("wrf_samoa", params$URL),][1,]
g = dplyr::filter(grids, grid_id == ex$grid_id)
as = sf::read_sf('/Users/mjohnson/Downloads/tl_2017_60_place/tl_2017_60_place.shp')

data1 = dap(URL = ex$URL, AOI = as, startDate = "2013-10-29 00:00", varname = "Tair")
#> source:   https://pae-paha.pacioos.hawaii.edu/erddap/griddap/wrf_samoa 
#> varname(s):
#>    > Tair [Celsius] (air temperature at 2m)
#> ==================================================
#> diminsions:  67, 90, 1 (names: longitude,latitude,time)
#> resolution:  0.028, 0.027, 1 hours
#> extent:      -171.16, -169.36, -14.43, -11.93 (xmin, xmax, ymin, ymax)
#> crs:         +proj=longlat +a=6378137 +f=0.00335281066474748 +p...
#> time:        2013-10-29 to 2013-10-29
#> ==================================================
#> values: 6,030 (vars*X*Y*T)

par(mfrow = c(1,2))
plot(data1[[1]], main = "American Samoa")
plot(as$geometry, add = TRUE)
plot(var.get.nc(open.nc(ex$URL), g$Y_name), main = g$toptobottom)

Created on 2022-06-14 by the reprex package (v2.0.1)

Hope this helps!