Closed by rarutter 7 years ago
Had this issue listed as part of the IO-refactor milestone, but removing for the time being. We will address this later.
Restarting work on defining output NetCDF files.
Helene mentioned that there is a risk of data loss when using NCOs on NetCDF files larger than 12GB (she's pretty certain of that number).
Fewer files. More complex dimension/variable structure.
Guarantees small files. Could make it simpler for adding new variables in the future.
Companion variables:
  Lat/lon
  year/month/day. The human must manually use this to analyze the variable files based on the timestep specified in the variable file name.
  Layer type (x,y,time,layer)
  Time since disturbance (age, etc)
Single file per variable, with the finest resolution specified.
Need output files from each stage. Not PR. DONE
Needs setting for whether or not to output NetCDF at all. Per-stage option is implemented, excluding PR.
Dimension order. NetCDF examples tend to have time first (necessary, since it's unlimited), followed by other dimensions (in our case would be layer, pft, pftpart), and then y and x. There seems to be no specific NetCDF standard requirement for this, though, and since all variables have time,y,x, it's more logical to list those first. Correction: the CF conventions say to list extra dimensions before the spatiotemporal dimensions. Unlimited dimensions must still be first (I think), so for VegC, dimensions would be time, PFTpart, PFT, y, x. See discussion below.
_FillValue is incredibly important! Pick a value that is not at all feasible for the variable. There are sometimes issues if you specify both a fill value and a missing value and the two are not the same. Default to always specifying the fill value.
For some variables (e.g. soil carbon in unfrozen layers), compute on the fly based on actual layer properties. Keep in mind that ALD would mess this up, as top layers may be frozen with thawed layers underneath (in the fall especially).
PFT compartment should be specifiable by itself, without PFT. For example, looking at solely root info. DONE
Restrict layer outputs to layers that are relevant (i.e. no snow layers for Soil C).
For now, output simply first and last snow days in a year. At some point, need to implement a smarter method to tell when the snow is actually sticking (~a week)
In changing from time-major to space-major, how does that change file writing/access? If space-major needs x and y to be the unlimited dimensions, that changes the entire structure. Needs testing.
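The first/last snow day note above can be sketched as follows. This is only an illustration, assuming a daily snow depth series is available; the function name and the `min_run` parameter are hypothetical and not taken from the model code. Setting `min_run=1` gives the simple first and last snow days, while `min_run=7` approximates snow that sticks for about a week.

```python
from typing import Optional, Sequence, Tuple


def first_last_snow_days(
    snow_depth: Sequence[float], min_run: int = 1
) -> Optional[Tuple[int, int]]:
    """Return (first, last) 0-based day indices with snow on the ground.

    Only runs of at least `min_run` consecutive snowy days are counted.
    Returns None if no run qualifies.
    """
    first = last = None
    run_start = None
    # A trailing 0.0 sentinel closes a run that lasts until the end of the series.
    for day, depth in enumerate(list(snow_depth) + [0.0]):
        if depth > 0.0:
            if run_start is None:
                run_start = day
        elif run_start is not None:
            if day - run_start >= min_run:
                if first is None:
                    first = run_start
                last = day - 1
            run_start = None
    return None if first is None else (first, last)
```

The sketch treats the input as a plain sequence, so whether it covers a calendar year or a snow year is up to the caller.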
a. Think about outputting companion variables like layer type for layer outputs or pft type for pft outputs
b. Think about locking wrong combinations of dimensions for each variable (e.g. monthly ALD means nothing... why not?)
c. Think about making that list flexible in length so that one can add variables if needed. Done for file creation. Writing data to the file must still be written by hand.
d. Think about variable names being consistent with code variable names. Hah, good luck
e. Think about grouping the outputs by veg/soil bgc/env annual/monthly
f. Think about how to merge outputs from single grid cells into multiple grid cells files (parallelization)
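Items b and c above could be handled with a small lookup table, sketched below. Everything here is hypothetical: the `OUTPUT_SPECS` registry, its entries, and `check_request` are illustrative names, and the allowed combinations are placeholders rather than decisions made in this thread (whether monthly ALD is meaningful is still an open question).

```python
# Hypothetical registry: variable name -> time resolutions and extra dimensions
# that make sense for it. The entries are illustrative, not the model's list.
OUTPUT_SPECS = {
    "ALD": {"timesteps": {"yearly"}, "extra_dims": set()},
    "GPP": {"timesteps": {"yearly", "monthly"}, "extra_dims": {"pft"}},
    "VEGC": {"timesteps": {"yearly", "monthly"}, "extra_dims": {"pft", "pftpart"}},
}


def check_request(name, timestep, extra_dims=()):
    """Return a list of problems with an output request (empty if it is valid)."""
    spec = OUTPUT_SPECS.get(name)
    if spec is None:
        return [f"unknown variable: {name}"]
    problems = []
    if timestep not in spec["timesteps"]:
        problems.append(f"{name} cannot be output at {timestep} resolution")
    for dim in extra_dims:
        if dim not in spec["extra_dims"]:
            problems.append(f"{name} cannot be output by {dim}")
    return problems
```

With this layout, adding a variable means adding one entry to the table, and an invalid request such as `check_request("ALD", "monthly")` is reported before any file is created.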
re: dimension order, I agree that there is no NetCDF-specific standard for order, but the CF conventions call for (time, z, y, x), and recommend that if there are more dimensions, they should be added to the left: http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#dimensions
Ah, so they do (of course the COARDS standard has them completely opposite...). It looks like that's because they expect longitude to be the fastest changing. No reason not to stick with that, I think. I'll get those switched around.
Thanks Tobey for checking this. Having time, y, x as the first dimensions, with the others (PFT, layer, and compartment) following, makes a lot of sense.
Helene
Wait, just to make sure we are on the same page here: I am seeing both the COARDS and CF standards calling for (time, z, y, x). The CF standards are an extension of the COARDS standards, so I would be surprised if they disagreed here.
CF: http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#dimensions
COARDS: http://ferret.wrc.noaa.gov/noaa_coop/coop_cdf_profile.html
Also both standards indicate that any further dimensions should be to the left of the spatiotemporal dimensions, so something like this for PFT specific GPP:
float gpp(pft, time, y, x);
And this might be a definition for a per-pft, per compartment variable:
float VegC(compartment, pft, time, y, x);
I guess for a soil variable, we should use the depth dimension, so something like this:
float temperature(time, depth, y, x);
Does this match what you both have been thinking, or am I reading those documents incorrectly?
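To make the ordering in the declarations above concrete, here is a minimal sketch that places any extra dimensions to the left of the spatiotemporal ones, in the T, Z, Y, X order the CF conventions recommend. The helper names and the `unlimited_first` switch are assumptions for illustration, not existing code; the switch covers the case where the unlimited time dimension has to come first.

```python
SPATIOTEMPORAL = ("time", "depth", "y", "x")  # CF-recommended relative order: T, Z, Y, X


def cf_dimension_order(dims, unlimited_first=False):
    """Order dimension names with non-spatiotemporal ones to the left of T, Z, Y, X.

    With unlimited_first=True, "time" is moved to the front instead, for
    file formats that need the record dimension to come first.
    """
    extras = [d for d in dims if d not in SPATIOTEMPORAL]
    core = [d for d in SPATIOTEMPORAL if d in dims]
    ordered = extras + core
    if unlimited_first and "time" in ordered:
        ordered.remove("time")
        ordered.insert(0, "time")
    return tuple(ordered)


def cdl_declaration(name, dims, dtype="float", unlimited_first=False):
    """Format a CDL-style variable declaration using the ordering above."""
    ordered = cf_dimension_order(dims, unlimited_first=unlimited_first)
    return f"{dtype} {name}({', '.join(ordered)});"
```

For example, `cdl_declaration("VegC", ["x", "y", "time", "compartment", "pft"])` reproduces the VegC line above, and passing `unlimited_first=True` moves `time` to the front while leaving the other dimensions in place.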
Those examples show what I understood from the first link (cfconventions dimensions). I followed the link from there to http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#coards-relationship which indicates an ordering of lon, lat, vertical, and time. It does seem odd that CF and COARDS wouldn't agree, so perhaps I'm not reading it right?
Huh, that is strange. I agree, in the link you just sent describing the relationship between COARDS and CF, it says that the old COARDS standard was "lon, lat, vertical, time", but if you go to the COARDS webpage, they specify "time, height, lat, lon"...
Going back to one of your first comments, re: time necessarily being first due to it being unlimited, I am not really sure how to handle that in cases where we have more than 4 dimensions. Any idea?
I haven't had time to read through the documentation yet, but one thing we might have to consider as well is the unlimited nature of some of those dimensions. Unlimited dimensions need to be listed first. But the thing is that these unlimited dimensions might change depending on whether we run the time loop or the space loop first...
Helene
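On the unlimited-dimension question, a quick check like the one below may help when comparing layouts. It assumes the classic NetCDF format rules as I understand them (at most one unlimited dimension, and it must be the first dimension of any variable that uses it); the netCDF-4 format relaxes both, so verify against the NetCDF documentation for whichever format is chosen. The function name is hypothetical.

```python
def classic_format_problems(variables, unlimited):
    """Check variable dimension tuples against the assumed classic-format rules.

    variables: mapping of variable name -> tuple of dimension names
    unlimited: collection of dimension names declared unlimited
    """
    problems = []
    if len(unlimited) > 1:
        problems.append("classic format allows only one unlimited dimension")
    for name, dims in variables.items():
        for position, dim in enumerate(dims):
            if dim in unlimited and position != 0:
                problems.append(f"{name}: unlimited dimension '{dim}' is not first")
    return problems
```

Running it on a time-major layout and on a space-major one (x and y unlimited) shows which variable definitions would need the netCDF-4 format or a different dimension order.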
Most of the points in this thread are addressed by PR #257. The rest have been moved to Issue #258.
Standardize use of _FillValue.
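One way to standardize this is to build every variable's attributes through a single helper, sketched below. The `-9999.0` sentinel and both function names are assumptions for illustration; the point is only that `_FillValue` is always set and that `missing_value`, if written at all, can never differ from it.

```python
FILL_VALUE = -9999.0  # assumed sentinel; pick something infeasible for every variable


def fill_attributes(fill_value=FILL_VALUE, with_missing_value=False):
    """Build variable attributes so _FillValue is always set.

    missing_value is optional, and when present it is forced to equal
    _FillValue so the two can never disagree.
    """
    attrs = {"_FillValue": fill_value}
    if with_missing_value:
        attrs["missing_value"] = fill_value
    return attrs


def mask_fill(values, fill_value=FILL_VALUE):
    """Replace fill values with None so they are not mistaken for data."""
    return [None if v == fill_value else v for v in values]
```

The same dictionary can then be passed to whatever code writes the variable attributes, and `mask_fill` shows the matching step on the reading side.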