twhiteaker / CFGeom

CF Convention for Representing Simple Geometry Types
MIT License

AGU 2016 Poster #40

Closed bekozi closed 7 years ago

bekozi commented 7 years ago

Template is up: https://docs.google.com/drawings/d/1zwJTWQ9uOkuLxTnDNBdKLlVWDUFoIQ2UcE89P9UzppI/edit.

Please edit as you see fit. I am impressed with Google Drawings for this sort of thing. If we can't get things quite aligned, we can export and refine. Otherwise, I recommend we just continue to use this. Google has PDF and SVG export options.

twhiteaker commented 7 years ago

I added an image with catchments and streams, along with graphs of (real) streamflow and ET data from August. The highlighted stream and the graph lines use the same cornflower blue as in the simple geometry image to the left. The simple geom images use similar colors to the USGS and NOAA logos in the top left, so I used orange for the catchments to be similar to the UT logo in the top right. Orange is also a complementary color to blue so it makes the image pop. I suggest using a color other than black for the leader lines, graph outlines, and map outline, but that can come later.

How does that look for the middle box? What else do you want in there? Another example, some text, or make the existing example bigger?

What else do you need help with? I was thinking of filling in text for "What is a simple geometry" next.

dblodgett-usgs commented 7 years ago

I just attempted to work up the CDL example. Does this seem reasonable? Totally a draft, please help make it better.

screen shot 2016-11-25 at 7 15 54 am

dblodgett-usgs commented 7 years ago

The use of 'instance' should be explained somewhere else on the slide. I have a hard time describing 'instance' versus 'element'. Maybe... "each simple feature is an instance that we describe with element variables"

bekozi commented 7 years ago

How does that look for the middle box? What else do you want in there? Another example, some text, or make the existing example bigger?

@twhiteaker: Thanks for putting the image together. It looks good, and I agree the color combination is nice. I think we should add a point example (stream gauge, infrastructure). With that, this should be sufficient for a real-world example. Is it possible to put values on the streamflow and evapotranspiration plots? Makes it look more "realistic". Not necessary if difficult.

What else do you need help with? I was thinking of filling in text for "What is a simple geometry" next.

@twhiteaker: Yes, please tackle the simple geometry section. I will work on the text in the other sections.

I just attempted to work up the CDL example. Does this seem reasonable? Totally a draft, please help make it better.

@dblodgett-usgs: Graphical layout is excellent with the highlighting and arrows + boxes. I thought we were going to link the CDL example with hydrologic catchment data used in the data graphic? No big deal as this looks sufficient as an explanatory graphic. I think we should try and work in a multi-geometry example here as this is a confusing bit. Is that simple for you to do with your graphic? I noticed the geom_type was multipolygon in the CDL anyway.

We should stick the full Bull Creek CDL with data variables below the hydrologic catchment graphic in any case. I'll add that. It will also fill out the space I expect.

The use of 'instance' should be explained somewhere else on the slide. I have a hard time describing 'instance' versus 'element'. Maybe... "each simple feature is an instance that we describe with element variables"

I'm not sure we should adopt the DSG lingo if we are trying to move towards OGC. I think instance ~= feature and element ~= data. Is this true? But yeah, a paragraph on relationships to DSG makes sense.

dblodgett-usgs commented 7 years ago

Are we trying to move toward OGC? I would much rather focus on CF adoption. Yeah. Instances are the features, so any variable defined on the instance dimension only is feature attribute data. Elements are those variables defined on temporal or other element dimensions as well as the instance dimension.
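One way to make the instance/element distinction concrete is to classify variables by their dimensions. The Python sketch below is only an illustration, not part of the proposal: the dimension names `instance` and `char` follow the CDL drafts in this thread, while the `classify` helper and the data variable names are hypothetical.

```python
def classify(dims, instance_dim="instance", ignore=("char",)):
    """Classify a variable by its dimension names.

    Returns "instance" for feature attribute data (instance dimension only),
    "element" for data that also varies along another dimension such as time,
    and "other" for variables not defined on the instance dimension.
    String-length dimensions listed in `ignore` are skipped.
    """
    dims = tuple(d for d in dims if d not in ignore)
    if instance_dim not in dims:
        return "other"
    return "instance" if dims == (instance_dim,) else "element"


# Hypothetical variables, keyed by name, with their dimensions.
variables = {
    "instance_name": ("instance", "char"),  # identifier per feature
    "catchment_area": ("instance",),        # one value per feature
    "streamflow": ("time", "instance"),     # a time series per feature
    "x": ("coordinates",),                  # geometry nodes
}

kinds = {name: classify(dims) for name, dims in variables.items()}
print(kinds)
```

Under this reading, only `streamflow` is element data; the identifier and the area are feature attributes.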

dblodgett-usgs commented 7 years ago

I'll work up a multipolygon example if you guys think it's not going to be too confusing. I was sticking to a hole just to make it a simple demonstration of the contiguous_ragged_dimension details.

bekozi commented 7 years ago

Are we trying to move toward OGC? I would much rather focus on CF adoption.

I think we should provide crosswalks with OGC at all times at the very least. CF adoption is most important but that doesn't mean compromising unnecessarily - using instance/element is not that much of a compromise.

Yeah. Instances are the features, so any variable defined on the instance dimension only is feature attribute data. Elements are those variables defined on temporal or other element dimensions as well as the instance dimension.

I have a couple questions regarding the use of instance identifiers. We can talk about them at a later date, but I am trying to wrap my head around this approach.

I'll work up a multipolygon example if you guys think it's not going to be too confusing. I was sticking to a hole just to make it a simple demonstration of the contiguous_ragged_dimension details.

I'm looking more closely at your example. I just now noticed that it is for a holed polygon and not a polygon on green background. :no_mouth: With that in mind, I don't think we need a multi-polygon example since you are demonstrating the use of break values.

A couple other questions:

dblodgett-usgs commented 7 years ago

How does this work with multiple geometries in a single NetCDF group? Are there multiple variables with the instance_id attribute? It's fine when the geometry count is constant across coordinate index variables, but what happens when there are, say, four catchments with seven gauges? All geometry coordinate index variables will require a unique identifier variable?

By my understanding of the DSG spec, this would be handled by putting things in separate files. The reason for this is that the 'featureType' attribute is global. Having multiple featureTypes present in the same file would add quite a bit of complexity. For the purposes of our CF 1.* compatible proposal, I think we should probably stick to that model. That said, I do think putting the 'geometry_type' attribute on the coordinate_index variable rather than in the global attributes is a good idea, since it allows multiple geometry types in a single file (e.g. watershed polygons and their associated outlet locations).

Are instance identifier variables always strings? Can they be integer data types?

They don't HAVE to be strings. This is a common practice though and is a nice way to embed identifiers of any type generically rather than requiring coercion to int or some other data type. The code I've worked up uses strings but I'm hoping to loosen that up as I work forward.

The indexing is one-based. This should be indicated on the coordinate index variable with start_index=1. Python is zero-based. R is one-based.

Good call. I missed this in the spec draft. Will add it.

While I'm thinking about it, for the contiguous ragged array indexing, I found it helpful to make a distinction between things that index into the contiguous_ragged_dimension and things that index into the coordinate dimension. In my code I've named variables with 'ind' for things in the contiguous ragged dimension and 'coord' in the coordinate dimension. My naming was arbitrary. My point here is that the distinction between the two kinds of indexing is really critical and is pretty easy to stumble over unless you are really explicit about how you talk about them. Just something to think about in the poster.
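To illustrate the two kinds of indexing, here is a small Python sketch with made-up data. It assumes one-based indices and a break value of -1, as in the draft CDL; the `ind`/`coord` names follow the arbitrary naming described above, and the values are hypothetical.

```python
# Values stored along the contiguous ragged dimension: one-based positions
# into the coordinate dimension, plus a break value (-1) between parts.
coordinate_index = [1, 2, 3, -1, 4, 5]
x = [10.0, 20.0, 30.0, 40.0, 50.0]  # lives on the coordinate dimension

BREAK = -1
START_INDEX = 1

# `ind` counts positions in the ragged dimension; `coord` is the value found
# there, which in turn is a position in the coordinate dimension.
for ind, coord in enumerate(coordinate_index, start=START_INDEX):
    if coord == BREAK:
        print(f"ind={ind}: break value, no coordinate")
    else:
        print(f"ind={ind} -> coord={coord} -> x={x[coord - START_INDEX]}")

# After the break, the two kinds of index no longer agree.
ind_of_last = len(coordinate_index)   # position in the ragged dimension
coord_of_last = coordinate_index[-1]  # position in the coordinate dimension
```

Once a break value appears, every later `ind` is offset from its `coord`, which is why mixing the two is easy to get wrong.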

twhiteaker commented 7 years ago

@dblodgett-usgs what do you think of adding x- and y-axes to the CDL example geometry so that users can more easily find the coordinates described in the CDL?

dblodgett-usgs commented 7 years ago

Good idea. But not super easy to do. I just screenshot a GIS rendering of the shapefile. Could we hack it in by hand?

twhiteaker commented 7 years ago

If you send me the shapefile I could do this in ArcGIS.

dblodgett-usgs commented 7 years ago

Here it is.

sample.zip

twhiteaker commented 7 years ago

Added axis labels to the graphs. Is the font size (10) too small? I didn't think the labels were important enough to make them as big as other fonts on the poster.

bekozi commented 7 years ago

Thanks for the explanations, @dblodgett-usgs.

By my understanding of the DSG spec, this would be handled by putting things in separate files. The reason for this is that the 'featureType' attribute is global. Having multiple featureTypes present in the same file would add quite a bit of complexity. For the purposes of our CF 1.* compatible proposal, I think we should probably stick to that model. That said, I do think putting the 'geometry_type' attribute on the coordinate_index variable rather than in the global attributes is a good idea, since it allows multiple geometry types in a single file (e.g. watershed polygons and their associated outlet locations).

I would really, really, really like to make the spec compatible with different geometry counts per data/element variable. It's probably best to avoid the issue with the poster directly provided the point examples have the same count as the polygons and stream segments. If we do not want to propose this, we should at least make sure that it can be proposed during the next "version". It may be as easy as adding an instance_dimension/geom_dimension to the coordinate index variable.

And, yes, definitely keep the geometry type out of the global attributes.

While I'm thinking about it, for the contiguous ragged array indexing, I found it helpful to make a distinction between things that index into the contiguous_ragged_dimension and things that index into the coordinate dimension. In my code I've named variables with 'ind' for things in the contiguous ragged dimension and 'coord' in the coordinate dimension. My naming was arbitrary. My point here is that the distinction between the two kinds of indexing is really critical and is pretty easy to stumble over unless you are really explicit about how you talk about them. Just something to think about in the poster.

Interesting to think about. The Python code uses an object to translate in and out of CRAs and mostly relies on variable-length unless reading/writing.

twhiteaker commented 7 years ago

@dblodgett-usgs I added a coordinate grid to the CDL polygon screenshot. How does it look?

bekozi commented 7 years ago

Added axis labels to the graphs. Is the font size (10) too small? I didn't think the labels were important enough to make them as big as other fonts on the poster.

Small font is fine. There are ways to read it if someone is desperate. Looks like real data now. Thanks! :smile:

twhiteaker commented 7 years ago

FYI, Grid labels in CDL polygon screenshot are Arial 24, RGB (110, 110, 110)

twhiteaker commented 7 years ago

Added simple geometry text. I talk about "features" in there, so we may want to harmonize that with whatever you all decide to use for feature/instance/element.

There's a bullet about what multiparts are for. Not sure if this is important enough to get a bullet, but I think it's something the CF-metadata readers were a little confused about.

twhiteaker commented 7 years ago

Added some fake points to the catchment/river screenshot. Soil moisture is from NLDAS.

twhiteaker commented 7 years ago

Since the graphs look like real data now, I added the data source just under the graph title.

bekozi commented 7 years ago

:+1:

bekozi commented 7 years ago

Added Bull Creek CDL to the poster. It captures multiple geometries with time-varying data variables. It differs slightly from @dblodgett-usgs's CDL, which uses instance identifiers. I think it's okay to have the different approaches. We can use these examples when deciding on the draft spec. It's open for editing now, of course.

@dblodgett-usgs: Were you planning to add text for the CRA v. VLen? I think you moved your CDL graphic around a bit.

P.S. Does anyone know the CF standard names for streamflow, evapotranspiration, and soil moisture?

dblodgett-usgs commented 7 years ago

I think it would be helpful to keep the Bull Creek example 'conceptual' and give a more basic NetCDF-3 example on the poster. I've got a list of things to comment on about the Bull Creek CDL, but I'm not sure that's worth providing right now since this is a NetCDF-4 VLEN example.

dblodgett-usgs commented 7 years ago

I could draft the text for CRA/VLEN, but think one of you might be better. I'm not familiar with VLEN at all since I'm focused on a CF1.* spec addition.

twhiteaker commented 7 years ago

The Bull Creek CDL has upstream node of river segments instead of the three fake soil moisture stations. I would also remove the GNIS_Name and AreaSqKm variables to simplify things. Ah, and then there's Dave's comment above about using a more basic example.

I suggest:

  1. If you're going to include a Bull Creek CDL, make it just for streamflow for river lines. The idea is to keep the example simple enough that folks can grasp it quickly so they can discuss it with us (Dave) while looking at the poster. The Bull Creek example shows how simple geometry can represent features associated with the data variables, which the simple green polygon example on the right doesn't show.
  2. Can we add a part to the green polygon on the right, and add a purple polygon, so that we have multiparts and multiple geometries in the example? I think at least adding a second polygon feature would be useful since there seemed to be confusion on the use of the stop index on the CF list.

twhiteaker commented 7 years ago

I think adding a section in the top left briefly summarizing what we're doing would be a nice lead-in to the story that unfolds naturally from top left to bottom right. Otherwise, the poster doesn't seem to inform the user of what we're doing until bottom middle. This would require some shuffling of the sections around.

dblodgett-usgs commented 7 years ago

@twhiteaker I'd be happy to do multipolygon for the example. I'll make you the shapefile and switch the CDL in a bit. Should we not do hole then?

twhiteaker commented 7 years ago

Either a hole or a multipart is good for demonstrating break values. A second geometry is good for demonstrating the coordinate index stop. Let's start with a hole and a second geometry. If there's room on the poster and time, I might play with adding a second part to the first geometry. I don't think it would make the CDL too complex, but I don't think it's vital either.

dblodgett-usgs commented 7 years ago

OK if we leave the hole off? It doesn't really add anything. Here's what I have so far:

netcdf demoPoly {
dimensions:
    char = 1 ;
    instance = 2 ;
    coordinate_index = 10 ;
    coordinates = 10 ;
variables:
    char instance_name(instance, char) ;
        instance_name:units = "unknown" ;
        instance_name:standard_name = "instance_id" ;
    int coordinate_index(coordinate_index) ;
        coordinate_index:long_name = "ragged index for coordinates and geometry break values" ;
        coordinate_index:geom_coordinates = "x y" ;
        coordinate_index:multipart_break_value = -1 ;
        coordinate_index:start_index = 1 ;
        coordinate_index:hole_break_value = -2 ;
        coordinate_index:outer_ring_order = "anticlockwise" ;
        coordinate_index:closure_convention = "last_node_equals_first" ;
        coordinate_index:geom_type = "multipolygon" ;
    int coordinate_index_stop(instance) ;
        coordinate_index_stop:long_name = "index for last coordinate in each instance geometry" ;
        coordinate_index_stop:contiguous_ragged_dimension = "coordinate_index" ;
    double x(coordinates) ;
        x:units = "degrees_east" ;
        x:standard_name = "geometry x node" ;
    double y(coordinates) ;
        y:units = "degrees_north" ;
        y:standard_name = "geometry y node" ;

// global attributes:
        :Conventions = "CF-1.8" ;
data:

 instance_name =
  "1",
  "2" ;

 coordinate_index = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ;

 coordinate_index_stop = 5, 10 ;

 x = 35, 10, 15, 30, 35, 30, 10, 20, 30, 30 ;

 y = 25, 20, 25, 30, 25, 10, 15, 20, 20, 10 ;
}

screen shot 2016-11-29 at 11 50 18 am

demoShape.zip
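For readers who want to check the stop values against the coordinates, the following Python sketch unpacks the draft above into one node list per instance. It assumes the conventions in this draft (one-based indices, each stop value pointing at the last entry for its instance); the function name `split_instances` is hypothetical and this is not the reference implementation.

```python
def split_instances(coordinate_index, stops, x, y, start_index=1):
    """Return a list of (x, y) node lists, one per instance.

    `stops` holds, for each instance, the position of its last entry in
    `coordinate_index` (inclusive, counted from `start_index`).
    """
    instances = []
    begin = 0  # zero-based position in coordinate_index
    for stop in stops:
        end = stop - start_index + 1  # inclusive stop -> exclusive slice end
        nodes = [(x[c - start_index], y[c - start_index])
                 for c in coordinate_index[begin:end]]
        instances.append(nodes)
        begin = end
    return instances


# Data copied from the draft CDL above.
coordinate_index = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
coordinate_index_stop = [5, 10]
x = [35, 10, 15, 30, 35, 30, 10, 20, 30, 30]
y = [25, 20, 25, 30, 25, 10, 15, 20, 20, 10]

polygons = split_instances(coordinate_index, coordinate_index_stop, x, y)
for name, nodes in zip(["1", "2"], polygons):
    print(name, nodes)
```

Each resulting ring starts and ends on the same node, which matches the `last_node_equals_first` closure convention.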

dblodgett-usgs commented 7 years ago

I just drafted text for the CRA/VLEN section. PLEASE edit and cut it down... I think it's covering all the bases... maybe too many bases?

bekozi commented 7 years ago

The Bull Creek CDL has upstream node of river segments instead of the three fake soil moisture stations.

@twhiteaker: I'll modify the CDL as this will demonstrate varying geometry lengths.

bekozi commented 7 years ago

If you're going to include a Bull Creek CDL, make it just for streamflow for river lines. The idea is to keep the example simple enough that folks can grasp it quickly so they can discuss it with us (Dave) while looking at the poster. The Bull Creek example shows how simple geometry can represent features associated with the data variables, which the simple green polygon example on the right doesn't show.

I modified the CDL to link up with the graphic. I removed the identifiers but left the data variables in. I think the example is relatively straightforward now. Let me know if you think it is still cluttered.

I think adding a section in the top left briefly summarizing what we're doing would be a nice lead-in to the story that unfolds naturally from top left to bottom right. Otherwise, the poster doesn't seem to inform the user of what we're doing until bottom middle. This would require some shuffling of the sections around.

I agree the organization is a little wonky, and we need an introduction. Let's get the content set and then do a reorg.

bekozi commented 7 years ago

I just drafted text for the CRA/VLEN section. PLEASE edit and cut it down... I think it's covering all the bases... maybe too many bases?

Thanks, @dblodgett-usgs. I edited the text down a bit. Good overview.

twhiteaker commented 7 years ago

@bekozi you could simplify the Bull Creek CDL by removing the break value attributes since there aren't any for this case. The colors and white space to separate point/line/polygon are great! Can we drop the _NCProperties global attribute?

@dblodgett-usgs If we're going to mention break values in the text and have those attributes show up in CDL, I think we should illustrate their usage. If we don't illustrate their usage, I think we can leave them out of the text to simplify the story. I do think showing multiple geometries is more important than showing multiparts or holes.

Consider using a color other than pure black for the text highlighting. I guess it depends on your printer, but I've seen some printers lay down too much ink for big black areas so that the paper looks shiny, or it's too wet and crumples, or the black bleeds.

bekozi commented 7 years ago

@bekozi you could simplify the Bull Creek CDL by removing the break value attributes since there aren't any for this case. The colors and white space to separate point/line/polygon are great! Can we drop the _NCProperties global attribute?

Done. _NCProperties is now added by default when using netCDF4-python. Not sure if that's a Python or NetCDF thing. Either way, easy to drop.

twhiteaker commented 7 years ago

Here's the grid for Dave's latest geometries, though including holes or multis is still under discussion.

ex_two_poly

twhiteaker commented 7 years ago

@dblodgett-usgs In the contiguous section, I combined the first three bullets into a single bullet because I think they are making a single point which is "Different coordinate counts present a challenge for efficient storage in netCDF." I combined the last bullet and sub-bullet into a single bullet (I think a lone sub-bullet looks...lonely). I replaced continuous with contiguous.

twhiteaker commented 7 years ago

@bekozi Streamflow standard name is water_volume_transport_in_river_channel. The soil moisture data I used was for the 0-100cm layer, so the standard name is moisture_content_of_soil_layer. There is no standard name that I'm aware of for evapotranspiration.

bekozi commented 7 years ago

Thanks. I added standard names and units.

dblodgett-usgs commented 7 years ago

Oh man... I totally forgot that the standard names got in there... I had a hand in that too!

Didn't make the connection that we need multiple geometries AND a multipart. I'll add that now.

The pure black highlighting was for lack of any other ideas. Happy to change.

dblodgett-usgs commented 7 years ago

Better?

netcdf demoPoly {
dimensions:
    char = 1 ;
    instance = 2 ;
    coordinate_index = 16 ;
    coordinates = 15 ;
variables:
    char instance_name(instance, char) ;
        instance_name:units = "unknown" ;
        instance_name:standard_name = "instance_id" ;
    int coordinate_index(coordinate_index) ;
        coordinate_index:long_name = "ragged index for coordinates and geometry break values" ;
        coordinate_index:geom_coordinates = "x y" ;
        coordinate_index:multipart_break_value = -1 ;
        coordinate_index:start_index = 1 ;
        coordinate_index:hole_break_value = -2 ;
        coordinate_index:outer_ring_order = "anticlockwise" ;
        coordinate_index:closure_convention = "last_node_equals_first" ;
        coordinate_index:geom_type = "multipolygon" ;
    int coordinate_index_stop(instance) ;
        coordinate_index_stop:long_name = "index for last coordinate in each instance geometry" ;
        coordinate_index_stop:contiguous_ragged_dimension = "coordinate_index" ;
    double x(coordinates) ;
        x:units = "degrees_east" ;
        x:standard_name = "geometry x node" ;
    double y(coordinates) ;
        y:units = "degrees_north" ;
        y:standard_name = "geometry y node" ;

// global attributes:
        :Conventions = "CF-1.8" ;
data:

 instance_name =
  "1",
  "2" ;

 coordinate_index = 1, 2, 3, 4, 5, -2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 ;

 coordinate_index_stop = 11, 16 ;

 x = 35, 26, 25, 30, 35, 22, 22, 15, 10, 22, 30, 10, 20, 30, 30 ;

 y = 25, 23, 28, 30, 25, 22, 27, 25, 20, 22, 10, 15, 20, 20, 10 ;
}

screen shot 2016-11-30 at 9 05 39 pm

demoShape.zip
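A sketch of how a reader might interpret the break values in this draft, written in Python. The semantics are an assumption made for illustration: a `multipart_break_value` starts a new outer ring and a `hole_break_value` starts a hole ring. The `decode` helper is hypothetical and the data are copied from the CDL above.

```python
def decode(coordinate_index, stops, x, y,
           start_index=1, multipart_break=-1, hole_break=-2):
    """Return, per instance, a list of (role, nodes) rings.

    Assumed semantics: a multipart break starts a new outer ring and a
    hole break starts a hole ring.
    """
    instances = []
    begin = 0  # zero-based position in coordinate_index
    for stop in stops:
        end = stop - start_index + 1  # inclusive stop -> exclusive slice end
        rings = [("outer", [])]
        for value in coordinate_index[begin:end]:
            if value == multipart_break:
                rings.append(("outer", []))
            elif value == hole_break:
                rings.append(("hole", []))
            else:
                i = value - start_index
                rings[-1][1].append((x[i], y[i]))
        instances.append(rings)
        begin = end
    return instances


coordinate_index = [1, 2, 3, 4, 5, -2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
coordinate_index_stop = [11, 16]
x = [35, 26, 25, 30, 35, 22, 22, 15, 10, 22, 30, 10, 20, 30, 30]
y = [25, 23, 28, 30, 25, 22, 27, 25, 20, 22, 10, 15, 20, 20, 10]

result = decode(coordinate_index, coordinate_index_stop, x, y)
for name, rings in zip(["1", "2"], result):
    print(name, [(role, len(nodes)) for role, nodes in rings])
```

With these data, the first instance decodes to an outer ring followed by a ring flagged as a hole, and the second instance to a single outer ring.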

bekozi commented 7 years ago

Looking good. Shouldn't the spaces in the node standard names be replaced with underscores?

dblodgett-usgs commented 7 years ago

Ahhh yeah. I've got some updates to do in the R reference implementation that I haven't gotten to yet.

twhiteaker commented 7 years ago

I'd delete instance_name:units attribute.

twhiteaker commented 7 years ago

From our VLEN in NetCDF 3 wiki page, there's this quote about the stop index:

the stop index (1 past the last index) for each VLEN chunk.

This is different than Dave's example, which puts the stop value at the last index instead of 1 past the last index, given the coordinate_index:start_index value of 1. This was clearly intentional given this attribute: coordinate_index_stop:long_name = "index for last coordinate in each instance geometry"

The CRA examples follow Dave's convention (one based, stop on last index), whereas the readme seems to be zero based and stopping one past the last index.

I don't care which way we do it, but we had better be clear and consistent about it.

...Well, maybe I do care. Stopping one past the last index is more Python friendly, but I like stopping at the last index for human readability which I think is more in line with CF.
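The two conventions can be compared directly. The Python sketch below uses a hypothetical `chunk_slices` helper and made-up values to convert stop values into zero-based, half-open slices under either convention. One consequence of the arithmetic is worth noting: "one-based, stop on the last index" and "zero-based, stop one past the last index" store the same numbers, so the danger lies mainly in mixing the two halves.

```python
def chunk_slices(stops, start_index, stop_inclusive):
    """Convert stop values into zero-based, half-open Python slices."""
    slices = []
    begin = 0
    for stop in stops:
        end = stop - start_index + (1 if stop_inclusive else 0)
        slices.append(slice(begin, end))
        begin = end
    return slices


values = list("abcdefg")  # stand-in for a ragged array with chunks of 3 and 4

# One-based, stop on the last index (the CRA examples in this thread).
a = chunk_slices([3, 7], start_index=1, stop_inclusive=True)
# Zero-based, stop one past the last index (how the readme appears to read).
b = chunk_slices([3, 7], start_index=0, stop_inclusive=False)
# Mixed reading: one-based start, but stop treated as one past the last index.
c = chunk_slices([3, 7], start_index=1, stop_inclusive=False)

chunks_a = [values[s] for s in a]
chunks_b = [values[s] for s in b]
chunks_c = [values[s] for s in c]
print(chunks_a, chunks_b, chunks_c)
```

The first two readings recover the same chunks; the mixed reading drops the last element, which is the off-by-one that clear documentation would prevent.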

twhiteaker commented 7 years ago

Grid for two poly one multi. ex_two_poly_multi

twhiteaker commented 7 years ago

@dblodgett-usgs change -2 to -1 in your data.

twhiteaker commented 7 years ago

Jeez I'm lazy. Ok, here's what I'm suggesting. Also, I took out the hole break value since no holes. And I fixed the coordinates so that they were all anticlockwise...by hand, so hopefully it's right.

netcdf demoPoly {
dimensions:
    char = 1 ;
    instance = 2 ;
    coordinate_index = 16 ;
    coordinates = 15 ;
variables:
    char instance_name(instance, char) ;
        instance_name:standard_name = "instance_id" ;
    int coordinate_index(coordinate_index) ;
        coordinate_index:long_name = "ragged index for coordinates and geometry break values" ;
        coordinate_index:geom_coordinates = "x y" ;
        coordinate_index:multipart_break_value = -1 ;
        coordinate_index:start_index = 1 ;
        coordinate_index:outer_ring_order = "anticlockwise" ;
        coordinate_index:closure_convention = "last_node_equals_first" ;
        coordinate_index:geom_type = "multipolygon" ;
    int coordinate_index_stop(instance) ;
        coordinate_index_stop:long_name = "index for last coordinate in each instance geometry" ;
        coordinate_index_stop:contiguous_ragged_dimension = "coordinate_index" ;
    double x(coordinates) ;
        x:units = "degrees_east" ;
        x:standard_name = "geometry_x_node" ;
    double y(coordinates) ;
        y:units = "degrees_north" ;
        y:standard_name = "geometry_y_node" ;

// global attributes:
        :Conventions = "CF-1.8" ;
data:

 instance_name =
  "1",
  "2" ;

 coordinate_index = 1, 2, 3, 4, 5, -1, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 ;

 coordinate_index_stop = 11, 16 ;

 x = 35, 30, 25, 26, 35, 22, 22, 15, 10, 22, 30, 30, 20, 10, 30 ;

 y = 25, 30, 28, 23, 25, 22, 27, 25, 20, 22, 10, 20, 20, 15, 10 ;
}
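Since the ring order above was fixed by hand, it can be double-checked with the shoelace formula. This Python sketch is only a sanity check under stated assumptions: x increases eastward and y northward, rings are closed (last node equals first), and the ring boundaries (nodes 1-5, 6-10, 11-15) are read off the `coordinate_index` data above. The `signed_area` helper is hypothetical.

```python
def signed_area(ring):
    """Shoelace formula for a closed ring of (x, y) nodes.

    Positive for anticlockwise rings, negative for clockwise rings.
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        total += x0 * y1 - x1 * y0
    return total / 2.0


# Data copied from the CDL above.
x = [35, 30, 25, 26, 35, 22, 22, 15, 10, 22, 30, 30, 20, 10, 30]
y = [25, 30, 28, 23, 25, 22, 27, 25, 20, 22, 10, 20, 20, 15, 10]
nodes = list(zip(x, y))

# Zero-based slices for the three rings (two parts of instance 1, then instance 2).
rings = [nodes[0:5], nodes[5:10], nodes[10:15]]
areas = [signed_area(ring) for ring in rings]
print(areas)
print(all(area > 0 for area in areas))
```

A positive signed area for every ring would confirm the `outer_ring_order = "anticlockwise"` attribute.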

twhiteaker commented 7 years ago

After playing with several colors for highlighting, I didn't find any of them harmonious with the rest of the poster. Color might also be confusing since we use color to associate sections of CDL with features in the map in the other CDL example. In the end I just lightened the black a bit.

dblodgett-usgs commented 7 years ago

Oops... @twhiteaker - My code is fine, but the WKT I created that gets read in is encoded wrong. The second polygon is encoded as a hole that is outside the first polygon!