Closed bekozi closed 7 years ago
I added an image with catchments and streams, along with graphs of (real) streamflow and ET data from August. The highlighted stream and the graph lines use the same cornflower blue as in the simple geometry image to the left. The simple geom images use similar colors to the USGS and NOAA logos in the top left, so I used orange for the catchments to be similar to the UT logo in the top right. Orange is also a complementary color to blue so it makes the image pop. I suggest using a color other than black for the leader lines, graph outlines, and map outline, but that can come later.
How does that look for the middle box? What else do you want in there? Another example, some text, or make the existing example bigger?
What else do you need help with? I was thinking of filling in text for "What is a simple geometry" next.
I just attempted to work up the CDL example. This seem reasonable? Totally draft, please help make better.
The use of 'instance' should be explained somewhere else on the slide. I have a hard time describing 'instance' versus 'element'. Maybe... "each simple feature is an instance that we describe with element variables"
How does that look for the middle box? What else do you want in there? Another example, some text, or make the existing example bigger?
@twhiteaker: Thanks for putting the image together. It looks good, and I agree the color combination is nice. I think we should add a point example (stream gauge, infrastructure). With that, this should be sufficient for a real-world example. Is it possible to put values on the streamflow and evapotranspiration plots? Makes it look more "realistic". Not necessary if difficult.
What else do you need help with? I was thinking of filling in text for "What is a simple geometry" next.
@twhiteaker: Yes, please tackle the simple geometry section. I will work on the text in the other sections.
I just attempted to work up the CDL example. This seem reasonable? Totally draft, please help make better.
@dblodgett-usgs: Graphical layout is excellent with the highlighting and arrows + boxes. I thought we were going to link the CDL example with hydrologic catchment data used in the data graphic? No big deal as this looks sufficient as an explanatory graphic. I think we should try and work in a multi-geometry example here as this is a confusing bit. Is that simple for you to do with your graphic? I noticed the geom_type was multipolygon in the CDL anyway.
We should stick the full Bull Creek CDL with data variables below the hydrologic catchment graphic in any case. I'll add that. It will also fill out the space I expect.
The use of 'instance' should be explained somewhere else on the slide. I have a hard time describing 'instance' versus 'element'. Maybe... "each simple feature is an instance that we describe with element variables"
I'm not sure we should adopt the DSG lingo if we are trying to move towards OGC. I think instance ~= feature and element ~= data. Is this true? But yeah, a paragraph on relationships to DSG makes sense.
Are we trying to move toward OGC? I would much rather focus on CF adoption. Yeah. Instances are the features, so any variable defined on the instance dimension only is feature attribute data. Elements are those variables defined on temporal or other element dimensions as well as the instance dimension.
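A minimal sketch of that distinction, assuming dimension names like those in the demo CDL (`instance`, `char`) plus a hypothetical `time` dimension. The variable names are illustrative only and are not part of any spec draft. A variable whose only non-string dimension is the instance dimension is treated as feature attribute data; one that also has another dimension is treated as an element variable.

```python
def classify(dimensions, instance_dim="instance", string_dims=("char",)):
    """Classify a variable by its dimensions (hypothetical helper).

    String-length dimensions such as 'char' are ignored so that an
    identifier like instance_name(instance, char) still counts as
    instance-only data.
    """
    dims = [d for d in dimensions if d not in string_dims]
    if instance_dim not in dims:
        return "neither"
    return "instance attribute" if dims == [instance_dim] else "element"


# Illustrative variables; names and dimensions are assumptions.
variables = {
    "instance_name": ("instance", "char"),
    "area_sq_km": ("instance",),
    "streamflow": ("instance", "time"),
    "time": ("time",),
}

for name, dims in variables.items():
    print(name, "->", classify(dims))
```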
I'll work up a multipolygon example if you guys think it's not going to be too confusing. I was sticking to a hole just to make it a simple demonstration of the contiguous_ragged_dimension details.
Are we trying to move toward OGC? I would much rather focus on CF adoption.
I think we should provide crosswalks with OGC at all times at the very least. CF adoption is most important but that doesn't mean compromising unnecessarily - using instance/element is not that much of a compromise.
Yeah. Instances are the features, so any variable defined on the instance dimension only is feature attribute data. Elements are those variables defined on temporal or other element dimensions as well as the instance dimension.
I have a couple questions regarding the use of instance identifiers. We can talk about them at a later date, but I am trying to wrap my head around this approach.
How does this work with multiple geometries in a single NetCDF group? Are there multiple variables with the instance_id attribute? It's fine when the geometry count is constant across coordinate index variables, but what happens when there are, say, four catchments with seven gauges? All geometry coordinate index variables will require a unique identifier variable?
I'll work up a multipolygon example if you guys think it's not going to be too confusing. I was sticking to a hole just to make it a simple demonstration of the contiguous_ragged_dimension details.
I'm looking more closely at your example. I just now noticed that it is for a holed polygon and not a polygon on green background. :no_mouth: With that in mind, I don't think we need a multi-polygon example since you are demonstrating the use of break values.
A couple other questions:
Are instance identifier variables always strings? Can they be integer data types?
The indexing is one-based. This should be indicated on the coordinate index variable with start_index=1. Python is zero-based. R is one-based.
How does this work with multiple geometries in a single NetCDF group? Are there multiple variables with the instance_id attribute? It's fine when the geometry count is constant across coordinate index variables, but what happens when there are, say, four catchments with seven gauges? All geometry coordinate index variables will require a unique identifier variable?
By my understanding of the DSG spec, this would be handled by putting things in separate files. The reason for this is that the 'featureType' attribute is global. Having multiple featureTypes present in the same file would add quite a bit of complexity. For the purposes of our CF 1.* compatible proposal, I think we should probably stick to that model. That said, I do think putting the 'geometry_type' attribute in the coordinate_index rather than the global attributes is a good idea to allow for multiple geometry_types in a single file (e.g. watershed polygons and their associated outlet locations).
Are instance identifier variables always strings? Can they be integer data types?
They don't HAVE to be strings. This is a common practice though and is a nice way to embed identifiers of any type generically rather than requiring coercion to int or some other data type. The code I've worked up uses strings but I'm hoping to loosen that up as I work forward.
The indexing is one-based. This should be indicated on the coordinate index variable with start_index=1. Python is zero-based. R is one-based.
Good call. I missed this in the spec draft. Will add it.
While I'm thinking about it, for the contiguous ragged array indexing, I found it helpful to make a distinction between things that index into the contiguous_ragged_dimension and things that index into the coordinate dimension. In my code I've named variables with 'ind' for things in the contiguous ragged dimension and 'coord' in the coordinate dimension. My naming was arbitrary. My point here is that the distinction between the two kinds of indexing is really critical and is pretty easy to stumble over unless you are really explicit about how you talk about them. Just something to think about in the poster.
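A small sketch of the two kinds of indexing, assuming the one-based `start_index` and `-1` multipart break value used in the demo CDL. The data and the `lookup` helper are made up for illustration. A position in the contiguous ragged dimension ('ind') selects an entry of `coordinate_index`; that entry is either a break value or a pointer ('coord') into the coordinate dimension.

```python
MULTIPART_BREAK = -1  # assumed break value, as in the demo CDL
START_INDEX = 1       # assumed one-based indexing

# Hypothetical geometry: two three-node parts separated by a break value.
coordinate_index = [1, 2, 3, MULTIPART_BREAK, 4, 5, 6]
x = [0.0, 1.0, 0.0, 5.0, 6.0, 5.0]
y = [0.0, 0.0, 1.0, 5.0, 5.0, 6.0]


def lookup(ind):
    """Translate a ragged-dimension position into an (x, y) pair.

    Returns None when the ragged position holds a break value.
    """
    coord = coordinate_index[ind - START_INDEX]  # value stored at 'ind'
    if coord == MULTIPART_BREAK:
        return None
    return (x[coord - START_INDEX], y[coord - START_INDEX])


# Ragged position 5 points at coordinate 4, because the break value
# at ragged position 4 occupies a slot without consuming a coordinate.
print(lookup(4))  # None
print(lookup(5))  # (5.0, 5.0)
```

Once a break value appears, the two index spaces drift apart, which is why naming them differently in code helps.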
@dblodgett-usgs what do you think of adding x- and y-axes to the CDL example geometry so that users can more easily find the coordinates described in the CDL?
Good idea. But not super easy to do. I just screenshot a GIS rendering of the shapefile. Could we hack it in by hand?
If you send me the shapefile I could do this in ArcGIS.
Here it is.
Added axis labels to the graphs. Is the font size (10) too small? I didn't think the labels were important enough to make them as big as other fonts on the poster.
Thanks for the explanations, @dblodgett-usgs.
By my understanding of the DSG spec, this would be handled by putting things in separate files. The reason for this is that the 'featureType' attribute is global. Having multiple featureTypes present in the same file would add quite a bit of complexity. For the purposes of our CF 1.* compatible proposal, I think we should probably stick to that model. That said, I do think putting the 'geometry_type' attribute in the coordinate_index rather than the global attributes is a good idea to allow for multiple geometry_types in a single file (e.g. watershed polygons and their associated outlet locations).
I would really, really, really like to make the spec compatible with different geometry counts per data/element variable. It's probably best to avoid the issue with the poster directly provided the point examples have the same count as the polygons and stream segments. If we do not want to propose this, we should at least make sure that it can be proposed during the next "version". It may be as easy as adding an instance_dimension/geom_dimension to the coordinate index variable.
And, yes, definitely keep the geometry type out of the global attributes.
While I'm thinking about it, for the contiguous ragged array indexing, I found it helpful to make a distinction between things that index into the contiguous_ragged_dimension and things that index into the coordinate dimension. In my code I've named variables with 'ind' for things in the contiguous ragged dimension and 'coord' in the coordinate dimension. My naming was arbitrary. My point here is that the distinction between the two kinds of indexing is really critical and is pretty easy to stumble over unless you are really explicit about how you talk about them. Just something to think about in the poster.
Interesting to think about. The Python code uses an object to translate in and out of CRAs and mostly relies on variable-length unless reading/writing.
@dblodgett-usgs I added coordinate grid to the CDL polygon screenshot. How does it look?
Added axis labels to the graphs. Is the font size (10) too small? I didn't think the labels were important enough to make them as big as other fonts on the poster.
Small font is fine. There are ways to read it if someone is desperate. Looks like real data now. Thanks! :smile:
FYI, Grid labels in CDL polygon screenshot are Arial 24, RGB (110, 110, 110)
Added simple geometry text. I talk about "features" in there, so we may want to harmonize that with whatever you all decide to use for feature/instance/element.
There's a bullet about what multiparts are for. Not sure if this is important enough to get a bullet, but I think it's something the CF-metadata readers were a little confused about.
Added some fake points to the catchment/river screenshot. Soil moisture is from NLDAS.
Since the graphs look like real data now, I added the data source just under the graph title.
:+1:
Added Bull Creek CDL to the poster. It captures multiple geometries with time-varying data variables. It differs slightly from @dblodgett-usgs's CDL which uses instance identifiers. I think it's okay to have the different approaches. We can use these examples when deciding on draft spec. It's open for editing now of course.
@dblodgett-usgs: Were you planning to add text for the CRA v. VLen? I think you moved your CDL graphic around a bit.
P.S. Does anyone know the CF standard names for streamflow, evapotranspiration, and soil moisture?
I think it would be helpful to keep the Bull Creek example 'conceptual' and give a more basic NetCDF-3 example on the poster. I've got a list of things to comment about the Bull Creek CDL, but not sure that's worth providing right now since this is a NetCDF-4 VLEN example.
I could draft the text for CRA/VLEN, but think one of you might be better. I'm not familiar with VLEN at all since I'm focused on a CF1.* spec addition.
The Bull Creek CDL has upstream node of river segments instead of the three fake soil moisture stations. I would also remove the GNIS_Name and AreaSqKm variables to simplify things. Ah, and then there's Dave's comment above about using a more basic example.
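I suggest:
If you're going to include a Bull Creek CDL, make it just for streamflow for river lines. The idea is to keep the example simple enough that folks can grasp it quickly so they can discuss it with us (Dave) while looking at the poster. The Bull Creek example shows how simple geometry can represent features associated with the data variables, which the simple green polygon example on the right doesn't show.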
I think adding a section in the top left briefly summarizing what we're doing would be a nice lead-in to the story that unfolds naturally from top left to bottom right. Otherwise, the poster doesn't seem to inform the user of what we're doing until bottom middle. This would require some shuffling of the sections around.
@twhiteaker I'd be happy to do multipolygon for the example. I'll make you the shapefile and switch the CDL in a bit. Should we not do hole then?
Either a hole or a multipart is good for demonstrating break values. A second geometry is good for demonstrating the coordinate index stop. Let's start with a hole and a second geometry. If there's room on the poster and time, I might play with adding a second part to the first geometry. I don't think it would make the CDL too complex, but I don't think it's vital either.
OK if we leave the hole off? It doesn't really add anything. Here's what I have so far:
netcdf demoPoly {
dimensions:
    char = 1 ;
    instance = 2 ;
    coordinate_index = 10 ;
    coordinates = 10 ;
variables:
    char instance_name(instance, char) ;
        instance_name:units = "unknown" ;
        instance_name:standard_name = "instance_id" ;
    int coordinate_index(coordinate_index) ;
        coordinate_index:long_name = "ragged index for coordinates and geometry break values" ;
        coordinate_index:geom_coordinates = "x y" ;
        coordinate_index:multipart_break_value = -1 ;
        coordinate_index:start_index = 1 ;
        coordinate_index:hole_break_value = -2 ;
        coordinate_index:outer_ring_order = "anticlockwise" ;
        coordinate_index:closure_convention = "last_node_equals_first" ;
        coordinate_index:geom_type = "multipolygon" ;
    int coordinate_index_stop(instance) ;
        coordinate_index_stop:long_name = "index for last coordinate in each instance geometry" ;
        coordinate_index_stop:contiguous_ragged_dimension = "coordinate_index" ;
    double x(coordinates) ;
        x:units = "degrees_east" ;
        x:standard_name = "geometry x node" ;
    double y(coordinates) ;
        y:units = "degrees_north" ;
        y:standard_name = "geometry y node" ;
// global attributes:
        :Conventions = "CF-1.8" ;
data:
    instance_name =
        "1",
        "2" ;
    coordinate_index = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ;
    coordinate_index_stop = 5, 10 ;
    x = 35, 10, 15, 30, 35, 30, 10, 20, 30, 30 ;
    y = 25, 20, 25, 30, 25, 10, 15, 20, 20, 10 ;
}
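For anyone reading along, here is a rough sketch of how a reader could split the draft data above into one node list per instance. It assumes the one-based `start_index` and the "stop on the last index" meaning from the draft `long_name`. The `split_instances` helper is hypothetical and is not taken from the R or Python reference code.

```python
# Data copied from the draft CDL above.
coordinate_index = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
coordinate_index_stop = [5, 10]
x = [35, 10, 15, 30, 35, 30, 10, 20, 30, 30]
y = [25, 20, 25, 30, 25, 10, 15, 20, 20, 10]


def split_instances(index, stops, xs, ys, start_index=1):
    """Return one list of (x, y) nodes per instance (hypothetical helper)."""
    instances = []
    begin = 0  # zero-based position in `index` where the next instance starts
    for stop in stops:
        # An inclusive, one-based stop becomes an exclusive Python slice end.
        end = stop - start_index + 1
        nodes = [(xs[i - start_index], ys[i - start_index])
                 for i in index[begin:end]]
        instances.append(nodes)
        begin = end
    return instances


polygons = split_instances(coordinate_index, coordinate_index_stop, x, y)
for name, nodes in zip(["1", "2"], polygons):
    print(name, nodes)
```

Each resulting ring starts and ends on the same node, which matches the `last_node_equals_first` closure convention in the draft.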
I just drafted text for the CRA/VLEN section. PLEASE edit and cut it down... I think it's covering all the bases... maybe too many bases?
The Bull Creek CDL has upstream node of river segments instead of the three fake soil moisture stations.
@twhiteaker: I'll modify the CDL as this will demonstrate varying geometry lengths.
If you're going to include a Bull Creek CDL, make it just for streamflow for river lines. The idea is to keep the example simple enough that folks can grasp it quickly so they can discuss it with us (Dave) while looking at the poster. The Bull Creek example shows how simple geometry can represent features associated with the data variables, which the simple green polygon example on the right doesn't show.
I modified the CDL to link up with the graphic. I removed the identifiers but left the data variables in. I think the example is relatively straightforward now. Let me know if you think it is still cluttered.
I think adding a section in the top left briefly summarizing what we're doing would be a nice lead-in to the story that unfolds naturally from top left to bottom right. Otherwise, the poster doesn't seem to inform the user of what we're doing until bottom middle. This would require some shuffling of the sections around.
I agree the organization is a little wonky, and we need an introduction. Let's get the content set and then do a reorg.
I just drafted text for the CRA/VLEN section. PLEASE edit and cut it down... I think it's covering all the bases... maybe too many bases?
Thanks, @dblodgett-usgs. I edited the text down a bit. Good overview.
@bekozi you could simplify the Bull Creek CDL by removing the break value attributes since there aren't any for this case. The colors and white space to separate point/line/polygon are great! Can we drop the _NCProperties global attribute?
@dblodgett-usgs If we're going to mention break values in the text and have those attributes show up in CDL, I think we should illustrate their usage. If we don't illustrate their usage, I think we can leave them out of the text to simplify the story. I do think showing multiple geometries is more important than showing multiparts or holes.
Consider using a color other than pure black for the text highlighting. I guess it depends on your printer, but I've seen some printers lay down too much ink for big black areas so that the paper looks shiny, or it's too wet and crumples, or the black bleeds.
@bekozi you could simplify the Bull Creek CDL by removing the break value attributes since there aren't any for this case. The colors and white space to separate point/line/polygon are great! Can we drop the _NCProperties global attribute?
Done. _NCProperties is now added by default when using netCDF4-python. Not sure if that's a Python or NetCDF thing. Either way, easy to drop.
Here's the grid for Dave's latest geometries, though including holes or multis is still under discussion.
@dblodgett-usgs In the contiguous section, I combined the first three bullets into a single bullet because I think they are making a single point which is "Different coordinate counts present a challenge for efficient storage in netCDF." I combined the last bullet and sub-bullet into a single bullet (I think a lone sub-bullet looks...lonely). I replaced continuous with contiguous.
@bekozi Streamflow standard name is water_volume_transport_in_river_channel. The soil moisture data I used was for the 0-100cm layer, so the standard name is moisture_content_of_soil_layer. There is no standard name that I'm aware of for evapotranspiration.
Thanks. I added standard names and units.
Oh man... I totally forgot that the standard names got in there... I had a hand in that too!
Didn't make the connection that we need multiple geometries AND a multipart. I'll add that now.
The pure black highlighting was for lack of any other ideas. Happy to change.
Better?
netcdf demoPoly {
dimensions:
    char = 1 ;
    instance = 2 ;
    coordinate_index = 16 ;
    coordinates = 15 ;
variables:
    char instance_name(instance, char) ;
        instance_name:units = "unknown" ;
        instance_name:standard_name = "instance_id" ;
    int coordinate_index(coordinate_index) ;
        coordinate_index:long_name = "ragged index for coordinates and geometry break values" ;
        coordinate_index:geom_coordinates = "x y" ;
        coordinate_index:multipart_break_value = -1 ;
        coordinate_index:start_index = 1 ;
        coordinate_index:hole_break_value = -2 ;
        coordinate_index:outer_ring_order = "anticlockwise" ;
        coordinate_index:closure_convention = "last_node_equals_first" ;
        coordinate_index:geom_type = "multipolygon" ;
    int coordinate_index_stop(instance) ;
        coordinate_index_stop:long_name = "index for last coordinate in each instance geometry" ;
        coordinate_index_stop:contiguous_ragged_dimension = "coordinate_index" ;
    double x(coordinates) ;
        x:units = "degrees_east" ;
        x:standard_name = "geometry x node" ;
    double y(coordinates) ;
        y:units = "degrees_north" ;
        y:standard_name = "geometry y node" ;
// global attributes:
        :Conventions = "CF-1.8" ;
data:
    instance_name =
        "1",
        "2" ;
    coordinate_index = 1, 2, 3, 4, 5, -2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 ;
    coordinate_index_stop = 11, 16 ;
    x = 35, 26, 25, 30, 35, 22, 22, 15, 10, 22, 30, 10, 20, 30, 30 ;
    y = 25, 23, 28, 30, 25, 22, 27, 25, 20, 22, 10, 15, 20, 20, 10 ;
}
Looking good. Shouldn't the spaces in the node standard names be replaced with underscores?
Ahhh yeah. I've got some updates to do in the R reference implementation that I haven't gotten to yet.
I'd delete instance_name:units attribute.
From our VLEN in NetCDF 3 wiki page, there's this quote about the stop index:
the stop index (1 past the last index) for each VLEN chunk.
This is different than Dave's example, which puts the stop value at the last index instead of 1 past the last index, given the coordinate_index:start_index value of 1. This was clearly intentional given this attribute: coordinate_index_stop:long_name = "index for last coordinate in each instance geometry"
The CRA examples follow Dave's convention (one based, stop on last index), whereas the readme seems to be zero based and stopping one past the last index.
I don't care which way we do it, but we had better be clear and consistent about it.
...Well, maybe I do care. Stopping one past the last index is more Python friendly, but I like stopping at the last index for human readability which I think is more in line with CF.
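To make the two conventions concrete, here is a hedged sketch that turns stop values into zero-based, half-open Python slices under either reading. The `to_slices` helper and its parameters are hypothetical. The stop values 11 and 16 are the ones from the CDL example above.

```python
def to_slices(stops, start_index=1, stop_is_inclusive=True):
    """Convert per-instance stop values to zero-based, half-open slices."""
    slices = []
    begin = 0
    for stop in stops:
        end = stop - start_index + (1 if stop_is_inclusive else 0)
        slices.append(slice(begin, end))
        begin = end
    return slices


# One-based, stop on the last index (the convention in the CDL example).
one_based = to_slices([11, 16], start_index=1, stop_is_inclusive=True)

# Zero-based, stop one past the last index (as the wiki quote describes).
zero_based = to_slices([11, 16], start_index=0, stop_is_inclusive=False)

print(one_based)
print(zero_based)
```

In this sketch the stored stop values come out the same under both readings, since a one-based last index equals a zero-based one-past-the-last index. What differs is the values inside coordinate_index that point into x and y, so the start_index attribute still has to be stated explicitly.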
Grid for two poly one multi.
@dblodgett-usgs change -2 to -1 in your data.
Jeez I'm lazy. Ok, here's what I'm suggesting. Also, I took out the hole break value since no holes. And I fixed the coordinates so that they were all anticlockwise...by hand, so hopefully it's right.
netcdf demoPoly {
dimensions:
    char = 1 ;
    instance = 2 ;
    coordinate_index = 16 ;
    coordinates = 15 ;
variables:
    char instance_name(instance, char) ;
        instance_name:standard_name = "instance_id" ;
    int coordinate_index(coordinate_index) ;
        coordinate_index:long_name = "ragged index for coordinates and geometry break values" ;
        coordinate_index:geom_coordinates = "x y" ;
        coordinate_index:multipart_break_value = -1 ;
        coordinate_index:start_index = 1 ;
        coordinate_index:outer_ring_order = "anticlockwise" ;
        coordinate_index:closure_convention = "last_node_equals_first" ;
        coordinate_index:geom_type = "multipolygon" ;
    int coordinate_index_stop(instance) ;
        coordinate_index_stop:long_name = "index for last coordinate in each instance geometry" ;
        coordinate_index_stop:contiguous_ragged_dimension = "coordinate_index" ;
    double x(coordinates) ;
        x:units = "degrees_east" ;
        x:standard_name = "geometry_x_node" ;
    double y(coordinates) ;
        y:units = "degrees_north" ;
        y:standard_name = "geometry_y_node" ;
// global attributes:
        :Conventions = "CF-1.8" ;
data:
    instance_name =
        "1",
        "2" ;
    coordinate_index = 1, 2, 3, 4, 5, -1, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 ;
    coordinate_index_stop = 11, 16 ;
    x = 35, 30, 25, 26, 35, 22, 22, 15, 10, 22, 30, 30, 20, 10, 30 ;
    y = 25, 30, 28, 23, 25, 22, 27, 25, 20, 22, 10, 20, 20, 15, 10 ;
}
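Since the ring order was fixed by hand, a quick check may be useful. The sketch below decodes the suggested data into instances and parts, splitting on the multipart break value, and then uses the shoelace formula to confirm each ring is closed and anticlockwise. The `decode` and `signed_area` helpers are illustrative and are not from the reference implementations.

```python
MULTIPART_BREAK = -1
START_INDEX = 1

# Data copied from the suggested CDL above.
coordinate_index = [1, 2, 3, 4, 5, -1, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
coordinate_index_stop = [11, 16]
x = [35, 30, 25, 26, 35, 22, 22, 15, 10, 22, 30, 30, 20, 10, 30]
y = [25, 30, 28, 23, 25, 22, 27, 25, 20, 22, 10, 20, 20, 15, 10]


def decode(index, stops, xs, ys):
    """Return one list of parts per instance; each part is a list of (x, y)."""
    instances = []
    begin = 0
    for stop in stops:
        end = stop - START_INDEX + 1  # inclusive stop -> exclusive slice end
        parts = [[]]
        for value in index[begin:end]:
            if value == MULTIPART_BREAK:
                parts.append([])  # a break value starts a new part
            else:
                parts[-1].append((xs[value - START_INDEX],
                                  ys[value - START_INDEX]))
        instances.append(parts)
        begin = end
    return instances


def signed_area(ring):
    """Shoelace formula: positive for anticlockwise, negative for clockwise."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        total += x0 * y1 - x1 * y0
    return total / 2.0


geometries = decode(coordinate_index, coordinate_index_stop, x, y)
for parts in geometries:
    for ring in parts:
        assert ring[0] == ring[-1], "ring is not closed"
        assert signed_area(ring) > 0, "ring is not anticlockwise"
```

With this data the first instance decodes to two parts and the second to one, and all three rings pass both checks.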
After playing with several colors for highlighting, I didn't find any of them harmonious with the rest of the poster. Color might also be confusing since we use color to associate sections of CDL with features in the map in the other CDL example. In the end I just lightened the black a bit.
Oops... @twhiteaker - My code is fine, the WKT I created that gets read in is encoded wrong. The second polygon is encoded as a hole that is outside the first polygon!
Template is up: https://docs.google.com/drawings/d/1zwJTWQ9uOkuLxTnDNBdKLlVWDUFoIQ2UcE89P9UzppI/edit.
Please edit as you see fit. I am impressed with Google Drawings for this sort of thing. If we can't get things quite aligned, we can export and refine. Otherwise, I recommend we just continue to use this. Google has PDF and SVG export options.