Open asinghvi17 opened 6 days ago
What's in the spec? I thought the grid_mapping variable was a key that linked to a shared crs string for the whole dataset.
Edited after reading the spec
What happens if that crs
variable doesn't exist?
The spec does say it's supposed to be a variable: https://cfconventions.org/cf-conventions/cf-conventions.html#grid-mappings-and-projections
but this particular dataset does not have that variable (though it does have a global proj4 field). It shouldn't error on load at least.
Here's an overview of the dataset, for reference:
julia> zg = zopen("reference://nwm_reanalysis.json", "r")
ZarrGroup at ReferenceStore with 2939545 references
and path
Variables: time elevation qBtmVertRunoff q_lateral qSfcLatRunoff streamflow feature_id velocity latitude qBucket order longitude
julia> zg["velocity"].attrs
Dict{String, Any} with 10 entries:
"missing_value" => -999900
"coordinates" => "longitude latitude"
"units" => "m s-1"
"add_offset" => 0.0
"_ARRAY_DIMENSIONS" => Any["time", "feature_id"]
"long_name" => "River Velocity"
"grid_mapping" => "crs"
"scale_factor" => 0.01
"_Netcdf4Dimid" => 0
"valid_range" => Any[0, 5000000]
julia> zg.attrs
Dict{String, Any} with 19 entries:
"TITLE" => "OUTPUT FROM WRF-Hydro v5.2.0-beta2"
"code_version" => "v5.2.0-beta2"
"stream_order_output" => 1
"proj4" => "+proj=lcc +units=m +a=6370000.0 +b=6370000.0 +lat_1=30.0 +lat_2=60.0 +lat_0=40.0 +lon_0=-97.0 +x_0=0 +y_0=0 +k_0=1.0 +…
"cdm_datatype" => "Station"
"model_initialization_time" => "1979-02-01_00:00:00"
"model_output_valid_time" => "1979-02-01_01:00:00"
"dev_channelBucket_only" => 0
"model_configuration" => "retrospective"
"featureType" => "timeSeries"
"dev_channel_only" => 0
"Conventions" => "CF-1.6"
⋮ => ⋮
julia> zg.attrs |> println
Dict{String, Any}("TITLE" => "OUTPUT FROM WRF-Hydro v5.2.0-beta2", "code_version" => "v5.2.0-beta2", "stream_order_output" => 1, "proj4" => "+proj=lcc +units=m +a=6370000.0 +b=6370000.0 +lat_1=30.0 +lat_2=60.0 +lat_0=40.0 +lon_0=-97.0 +x_0=0 +y_0=0 +k_0=1.0 +nadgrids=@", "cdm_datatype" => "Station", "model_initialization_time" => "1979-02-01_00:00:00", "model_output_valid_time" => "1979-02-01_01:00:00", "dev_channelBucket_only" => 0, "model_configuration" => "retrospective", "featureType" => "timeSeries", "dev_channel_only" => 0, "Conventions" => "CF-1.6", "model_total_valid_times" => 1416, "station_dimension" => "feature_id", "dev_NOAH_TIMESTEP" => 3600, "dev_OVRTSWCRT" => 1, "_NCProperties" => "version=2,netcdf=4.7.4,hdf5=1.10.7,", "model_output_type" => "channel_rt", "dev" => "dev_ prefix indicates development/internal meta data")
and here's an overview from ZarrDatasets:
julia> zd = ZarrDataset(zg)
Dataset:
Group: root
Dimensions
time = 367439
feature_id = 2776738
Variables
time (367439)
Datatype: Dates.DateTime (Int32)
Dimensions: time
Attributes:
units = minutes since 1970-01-01 00:00:00 UTC
NAME = time
long_name = valid output time
_Netcdf4Dimid = 1
standard_name = time
valid_min = 4777980
valid_max = 4862880
elevation (2776738 × 367439)
Datatype: Union{Missing, Float32} (Float32)
Dimensions: feature_id × time
Attributes:
units = meters
coordinates = longitude latitude
long_name = Feature Elevation
_Netcdf4Dimid = 0
standard_name = Elevation
_FillValue = 0.0
qBtmVertRunoff (2776738 × 367439)
Datatype: Union{Missing, Float64} (Int32)
Dimensions: feature_id × time
Attributes:
missing_value = -9999000
coordinates = longitude latitude
units = m3
add_offset = 0.0
long_name = Runoff from bottom of soil to bucket
grid_mapping = crs
scale_factor = 0.0010000000474974513
_Netcdf4Dimid = 0
_FillValue = 0
valid_range = Any[0, 20000000]
q_lateral (2776738 × 367439)
Datatype: Union{Missing, Float64} (Int32)
Dimensions: feature_id × time
Attributes:
missing_value = -99990
coordinates = longitude latitude
units = m3 s-1
add_offset = 0.0
long_name = Runoff into channel reach
grid_mapping = crs
scale_factor = 0.10000000149011612
_Netcdf4Dimid = 0
_FillValue = 0
valid_range = Any[0, 500000]
qSfcLatRunoff (2776738 × 367439)
Datatype: Union{Missing, Float64} (Int32)
Dimensions: feature_id × time
Attributes:
missing_value = -999900000
coordinates = longitude latitude
units = m3 s-1
add_offset = 0.0
long_name = Runoff from terrain routing
grid_mapping = crs
scale_factor = 9.999999747378752e-6
_Netcdf4Dimid = 0
_FillValue = 0
valid_range = Any[0, 2000000000]
streamflow (2776738 × 367439)
Datatype: Union{Missing, Float64} (Int32)
Dimensions: feature_id × time
Attributes:
missing_value = -999900
coordinates = longitude latitude
units = m3 s-1
add_offset = 0.0
long_name = River Flow
grid_mapping = crs
scale_factor = 0.009999999776482582
_Netcdf4Dimid = 0
_FillValue = 0
valid_range = Any[0, 5000000]
feature_id (2776738)
Datatype: Int32 (Int32)
Dimensions: feature_id
Attributes:
cf_role = timeseries_id
long_name = Reach ID
_Netcdf4Dimid = 0
comment = NHDPlusv2 ComIDs within CONUS, arbitrary Reach IDs outside of CONUS
NAME = feature_id
velocity (2776738 × 367439)
Datatype: Union{Missing, Float64} (Int32)
Dimensions: feature_id × time
Attributes:
missing_value = -999900
coordinates = longitude latitude
units = m s-1
add_offset = 0.0
long_name = River Velocity
grid_mapping = crs
scale_factor = 0.009999999776482582
_Netcdf4Dimid = 0
_FillValue = 0
valid_range = Any[0, 5000000]
latitude (2776738)
Datatype: Union{Missing, Float32} (Float32)
Dimensions: feature_id
Attributes:
units = degrees_north
long_name = Feature latitude
_Netcdf4Dimid = 0
standard_name = latitude
_FillValue = 0.0
qBucket (2776738 × 367439)
Datatype: Union{Missing, Float64} (Int32)
Dimensions: feature_id × time
Attributes:
missing_value = -999900000
coordinates = longitude latitude
units = m3 s-1
add_offset = 0.0
long_name = Flux from gw bucket
grid_mapping = crs
scale_factor = 9.999999747378752e-6
_Netcdf4Dimid = 0
_FillValue = 0
valid_range = Any[0, 2000000000]
order (2776738 × 367439)
Datatype: Union{Missing, Int32} (Int32)
Dimensions: feature_id × time
Attributes:
coordinates = longitude latitude
long_name = Streamflow Order
_Netcdf4Dimid = 0
standard_name = order
_FillValue = 0
longitude (2776738)
Datatype: Union{Missing, Float32} (Float32)
Dimensions: feature_id
Attributes:
units = degrees_east
long_name = Feature longitude
_Netcdf4Dimid = 0
standard_name = longitude
_FillValue = 0.0
Global attributes
TITLE = OUTPUT FROM WRF-Hydro v5.2.0-beta2
code_version = v5.2.0-beta2
stream_order_output = 1
proj4 = +proj=lcc +units=m +a=6370000.0 +b=6370000.0 +lat_1=30.0 +lat_2=60.0 +lat_0=40.0 +lon_0=-97.0 +x_0=0 +y_0=0 +k_0=1.0 +nadgrids=@
cdm_datatype = Station
model_initialization_time = 1979-02-01_00:00:00
model_output_valid_time = 1979-02-01_01:00:00
dev_channelBucket_only = 0
model_configuration = retrospective
featureType = timeSeries
dev_channel_only = 0
Conventions = CF-1.6
model_total_valid_times = 1416
station_dimension = feature_id
dev_NOAH_TIMESTEP = 3600
dev_OVRTSWCRT = 1
_NCProperties = version=2,netcdf=4.7.4,hdf5=1.10.7,
model_output_type = channel_rt
dev = dev_ prefix indicates development/internal meta data
Hm I thought grid_mapping = crs
meant there was a field called "crs" that held the crs.
Maybe there's a convention to name them e.g. proj4
etc to identify the type? I'm not sure this stuff is fully official. We need somewhere in writing what this should be.
This is on the
cf
branch.I tried to load a raster where
grid_mapping => "crs"
, but there's nocrs
field in the dataset, only aproj4
field. It also looks like the current code tries to load a variable in the dataset? somehow? Maybe that's a typo.Ended up hacking around it by simply replacing https://github.com/rafaqz/Rasters.jl/blob/c2258400942ae680e4706153f4189d1bc9ff419b/src/sources/commondatamodel.jl#L165-L173 with a set of checks on possible crs strings, and it doesn't error if it can't find it, only warns.
Is this a solution that makes sense, or should we just not do it? It doesn't actually read the CRS still.