rod-glover opened this issue 7 years ago.
It is possible that `generate_climos` can be tweaked to handle this case. A code inspection of `generate_climos` yields the following observations relevant to processing streamflow output files:
- `gc` selects the temporal subset with `cdo seldate`, which:
  - drops dimension `nc_chars`
  - renames dimension `nv` to `bnds`
  - renames dimension `outlet` to `x`
  - drops the attribute `_ChunkSizes` from all variables with dimension `outlet`/`x`, but retains it for `time` and `streamflow`, with an altered value for `time` (1024 -> 524288)
  - drops variable `outlet_name(outlet)`
  - retains all other variables in an apparently correct state
- `gc` forms the means using `cdo ymonmean` (etc.), which:
  - can tell the difference between variables that depend on time and those that do not, so it takes the mean only of `streamflow`
  - drops dimension `bnds` (formerly `nv`)
  - drops variable `time_bnds(time, bnds)`, which may be OK, since `gc` replaces that variable anyway to reflect the bounds for climatological means
  - retains all other variables in an apparently correct state
- Not sure what concatenating intervals (via `cdo copy`) would do. I think it would be OK. In any case, since we no longer want 17-point chronologies, this is not relevant.
- Converting longitudes (`--convert-longitudes`) depends on `cf.lon_var`, which fails for streamflow files. This is presumably fixable.
- Units conversion for `pr` variables is irrelevant.
- `gc` will split out all dependent variables if the `--split-variables` option is set. This should be changed to split out only time-dependent variables. In the particular case of streamflows this is unnecessary in practice, since there is only one time-dependent variable, so we could get away with unsetting `--split-variables`. I'd prefer to do this right, which calls for a change to `nchelpers` so that it returns a list of time-dependent variables. Whether I actually do that will depend on how complicated it looks; I'm thinking not very, since `nchelpers` can already identify the time dimension (or rather the time variable, but that is very close).
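The `seldate` and `ymonmean` observations above amount to comparing a file's structure before and after a `cdo` call. Here is a minimal sketch of that comparison, assuming each file has already been summarized as plain dictionaries of dimension sizes and per-variable dimension tuples (with `netCDF4`, these could be read from `Dataset.dimensions` and each variable's `.dimensions`). The function name and the same-size rename heuristic are hypothetical, not part of `generate_climos` or `nchelpers`.

```python
from typing import Dict, List, Tuple

# Hypothetical file summary:
#   {"dims": {dim_name: size}, "vars": {var_name: (dim_name, ...)}}
Summary = Dict[str, Dict]


def compare_summaries(before: Summary, after: Summary) -> Dict[str, List]:
    """Report how a cdo operator changed a file's dimensions and variables."""
    dims_before, dims_after = before["dims"], after["dims"]
    dropped = [d for d in dims_before if d not in dims_after]
    added = [d for d in dims_after if d not in dims_before]

    # Heuristic: pair each vanished dimension with an unclaimed new dimension
    # of the same size and report the pair as a probable rename (outlet -> x).
    renamed: List[Tuple[str, str]] = []
    claimed = set()
    for old in dropped:
        for new in added:
            if new not in claimed and dims_before[old] == dims_after[new]:
                renamed.append((old, new))
                claimed.add(new)
                break
    renamed_old = {old for old, _ in renamed}

    return {
        "renamed_dims": renamed,
        "dropped_dims": [d for d in dropped if d not in renamed_old],
        "added_dims": [d for d in added if d not in claimed],
        "dropped_vars": [v for v in before["vars"] if v not in after["vars"]],
    }
```

For a file like the one inspected here, this should report `outlet -> x` and `nv -> bnds` as probable renames, `nc_chars` as a dropped dimension, and `outlet_name` as a dropped variable. Matching by size can misfire when two dimensions happen to have the same length, so the output is a hint, not proof.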
Proposed fixes to the above issues:

Every `cdo` operator apparently does the following things:
- drops dimension `nc_chars`
- renames dimension `nv` to `bnds`
- renames dimension `outlet` to `x`
- drops the attribute `_ChunkSizes` from all variables with dimension `outlet`/`x`, but retains it for `time` and `streamflow`, with an altered value for `time` (1024 -> 524288)
- drops variable `outlet_name(outlet)`

`cdo` has a bug that prevents it from recognizing this variable: when the variable is specified in a `cdo select` command, it complains that it can't find a variable of that name. Weird. Direct inspection of the file using the Python `netCDF4` package shows that the variable is defined like all the others; Panoply agrees. WTF?

Therefore, corrections to these issues must be applied after all `cdo` operators have been applied. The corrections are:
- Rename dimension `x` to `outlet` (`netCDF4.Dataset.renameDimension`).
- `nc_chars`, `outlet_name`: Create the new dimension `nc_chars` and the variable `outlet_name(outlet, nc_chars)`, and copy its values from the input file.
- (?) Copy the attribute `_ChunkSizes` back onto all variables with dimension `outlet`.
- Modify the creation/updating of the time bounds variable as needed to work in this context.
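A minimal sketch of the first three corrections using the `netCDF4` package, assuming `src` is the original input file and `dst` is the final `cdo` output opened in append mode (e.g. `netCDF4.Dataset(outfile, "a")`). The function name and its defaults are hypothetical, it has not been tried against real streamflow files, and the time bounds change is not covered.

```python
# Assumes src and dst behave like netCDF4.Dataset objects; only their
# dimensions/variables mappings and methods are used, so nothing is imported.

def restore_streamflow_structure(src, dst, cdo_dim="x", outlet_dim="outlet",
                                 name_var="outlet_name"):
    # 1. Undo cdo's rename of the outlet dimension. This runs first so that
    #    the variable recreated below attaches to the right dimension.
    if cdo_dim in dst.dimensions and outlet_dim not in dst.dimensions:
        dst.renameDimension(cdo_dim, outlet_dim)

    # 2. Recreate the character variable cdo dropped, plus any dimension
    #    (such as nc_chars) that disappeared with it, then copy its values.
    if name_var in src.variables and name_var not in dst.variables:
        src_var = src.variables[name_var]
        for dim in src_var.dimensions:
            if dim not in dst.dimensions:
                dst.createDimension(dim, len(src.dimensions[dim]))
        new_var = dst.createVariable(name_var, src_var.dtype, src_var.dimensions)
        # _FillValue can only be set when a variable is created, so skip it.
        attrs = {a: src_var.getncattr(a) for a in src_var.ncattrs()
                 if a != "_FillValue"}
        new_var.setncatts(attrs)
        new_var[:] = src_var[:]

    # 3. Copy _ChunkSizes back onto variables defined over the outlet
    #    dimension, leaving any value cdo already kept untouched.
    for name, var in dst.variables.items():
        if outlet_dim not in var.dimensions or name not in src.variables:
            continue
        src_var = src.variables[name]
        if "_ChunkSizes" in src_var.ncattrs() and "_ChunkSizes" not in var.ncattrs():
            var.setncattr("_ChunkSizes", src_var.getncattr("_ChunkSizes"))
```

Because `cdo` also changes the value of `_ChunkSizes` on `time`, step 3 deliberately does not overwrite attributes that survived; whether those altered values matter is still an open question here.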
Other fixes:
- Time-dependent variables: Extend `nchelpers.CFDataset.dependent_varnames` so that it can return the names of variables that depend on a specified set of dimensions (specifically, time).
- Convert longitudes: Fix `nchelpers.CFDataset.lon_var`.
- When splitting, all non-time-dependent variables must be included in each split file, otherwise they are dropped. So the split command looks like `cdo select,name={all non-time-dependent vars},{time-dependent var}` for each time-dependent var.
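A minimal sketch of building those split commands, assuming the variables have been summarized as a mapping from name to dimension tuple. The `time_dim in dims` test is roughly what an extended `dependent_varnames` would need to provide, but the function below, its `skip` parameter, and the output file naming are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Sequence, Tuple


def split_commands(var_dims: Dict[str, Tuple[str, ...]],
                   infile: str,
                   time_dim: str = "time",
                   skip: Sequence[str] = ("time", "time_bnds")) -> List[List[str]]:
    """Build one `cdo select,name=...` command per time-dependent variable.

    Each command also names every variable that does not depend on time, so
    those variables are carried into each split file instead of dropped.
    Variables listed in `skip` (coordinates, bounds, anything cdo cannot see)
    are left out of every name list.
    """
    candidates = {n: d for n, d in var_dims.items() if n not in skip}
    time_vars = [n for n, d in candidates.items() if time_dim in d]
    static_vars = [n for n, d in candidates.items() if time_dim not in d]

    src = Path(infile)
    commands = []
    for var in time_vars:
        names = ",".join(static_vars + [var])
        # Hypothetical naming scheme for the split output files.
        outfile = src.with_name(f"{var}_{src.name}")
        commands.append(["cdo", f"select,name={names}", infile, str(outfile)])
    return commands
```

Given that `cdo select` reportedly cannot find `outlet_name`, that variable would probably have to go in `skip` and be restored afterwards along with the other post-`cdo` corrections.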
Starting to look into this issue.
Some rather old discussions of problems people were having with `cdo copy` renaming and removing variables seem to indicate that CDO ignores variables it thinks don't actually describe the data, and renames variables it thinks aren't compliant with the CF standards.
The CF Standards contain guidelines on how to represent "discrete geometries" (for example, stations with associated timeseries) and sets of "station variables". I thought perhaps CDO wasn't able to understand that the streamflow files represent discrete geometries, and that might be why it was renaming and deleting things, so I wrote a script to make the minor modifications needed to bring our files up to the standard (add `cf_role` attributes, etc.).
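For reference, the kind of modification described above might look like the following minimal sketch, assuming a `netCDF4.Dataset` opened in append mode. `featureType = "timeSeries"` and `cf_role = "timeseries_id"` come from the CF discrete sampling geometry conventions; the function name and the default variable names (`streamflow`, `lat lon`) are assumptions about these files, not taken from the actual script.

```python
def add_timeseries_attributes(ds, id_var="outlet_name",
                              data_vars=("streamflow",),
                              coordinates="lat lon"):
    """Add CF discrete-sampling-geometry (timeSeries) metadata to `ds`.

    `ds` is assumed to be a netCDF4.Dataset opened in append ("a") mode.
    The default variable names are assumptions about the streamflow files.
    """
    # Global attribute declaring which kind of discrete geometry the file holds.
    ds.setncattr("featureType", "timeSeries")
    # Marks the variable whose values identify each time series (each outlet).
    ds.variables[id_var].setncattr("cf_role", "timeseries_id")
    # Links each data variable to its per-station coordinate variables.
    for name in data_vars:
        ds.variables[name].setncattr("coordinates", coordinates)
```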
Unfortunately, this doesn't actually seem to matter. CDO still renames `outlet` to `x` and completely drops `outlet_name`.
Much newer discussion indicates that more recent versions of CDO don't ever rename dimensions. Perhaps they also understand the (relatively recently developed) CF standards for discrete geometries? Unfortunately, the prebuilt cdo versions available on Ubuntu right now are all 2+ years old, and I don't think making "build CDO from source" a development requirement on this project is justifiable.
Summary: Modifying data files to see if CDO stops deleting dimensions when everything is perfectly CF-Standards-compliant seems to have been a dead end. It looks like correcting the data after all the CDO operations are complete is the way to go.
Upon further investigation, CDO ignores any variable of type `character` (a netCDF "classic" string). CDO deletes the `nc_chars` axis because no variable (that it cares about, anyway) is using it.
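That rule is simple enough to check before running `cdo`. A minimal sketch, assuming the file has been summarized as a mapping from variable name to (netCDF type name, dimension tuple); the function name and the `CHAR_TYPES` set are hypothetical (with `netCDF4`, a character variable shows up with `var.dtype.kind == "S"`).

```python
from typing import Dict, List, Set, Tuple

# Hypothetical summary: variable name -> (netCDF type name, dimension names).
VarInfo = Dict[str, Tuple[str, Tuple[str, ...]]]

CHAR_TYPES = {"char", "S1"}


def predict_cdo_drops(variables: VarInfo) -> Tuple[List[str], Set[str]]:
    """Predict what cdo will silently drop from a file.

    Returns the character-typed variables (which cdo ignores) and the
    dimensions used only by those variables (which cdo then deletes).
    """
    char_vars = [name for name, (nc_type, _) in variables.items()
                 if nc_type in CHAR_TYPES]
    char_dims = {d for name in char_vars for d in variables[name][1]}
    other_dims = {d for name, (_, dims) in variables.items()
                  if name not in char_vars for d in dims}
    return char_vars, char_dims - other_dims
```

For a streamflow file this would flag `outlet_name` and `nc_chars`, while `outlet` survives because `streamflow` also uses it.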
So it looks like there are two categories of CDO weirdness we should be able to detect and correct after the calculations are done:
- renaming dimensions (e.g. `outlet` to `x`)
- ignoring `character` variables and deleting axes that are only used by them

Plus whatever the issue with `_ChunkSizes` is; none of my test files seem to have `_ChunkSizes` attributes, so I haven't really done any testing of it.
Currently we can form climatological means from files containing variables defined over spatiotemporal grids, such as the outputs of GCMs, but not from streamflow output files.
Streamflow, however, is not defined on a grid. The streamflow at a given spatial location, called an outlet, is a time series at that location. The outlets do not form a uniform grid; instead they are distributed essentially at random. Outlets are addressed by an outlet index, with several dependent variables defining the spatial location, name, and streamflow at each outlet.
We need to handle this case too.
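To make that layout concrete: streamflow is effectively a 2-D array indexed by (time, outlet), and a climatological mean is computed independently for each outlet's time series. A minimal pure-Python sketch of a multi-year monthly mean (what `cdo ymonmean` computes: the mean of all timesteps falling in each calendar month) over that layout; the function name and list-of-lists input are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Sequence


def monthly_climatology(times: Sequence[date],
                        streamflow: Sequence[Sequence[float]]) -> Dict[int, List[float]]:
    """Multi-year monthly mean for each outlet.

    `streamflow[t][i]` is the flow at time index t for outlet index i; there
    is no spatial grid, just one time series per outlet. Assumes at least
    one timestep.
    """
    n_outlets = len(streamflow[0])
    sums = defaultdict(lambda: [0.0] * n_outlets)  # month -> per-outlet sums
    counts = defaultdict(int)                      # month -> timestep count
    for t, values in zip(times, streamflow):
        month_sums = sums[t.month]
        for i, v in enumerate(values):
            month_sums[i] += v
        counts[t.month] += 1
    return {m: [s / counts[m] for s in sums[m]] for m in sorted(sums)}
```

The outlet index plays the role that (lat, lon) plays for gridded GCM output, so the averaging itself does not care whether the locations form a grid; the difficulty is only in getting the surrounding metadata through `cdo` intact.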