terraref / computing-pipeline

Pipeline to Extract Plant Phenotypes from Reference Data
BSD 3-Clause "New" or "Revised" License

Confirm that plot boundaries for sorghum crop 2 generated from field scanner coordinates are consistent with those based on RTK GPS #187

Closed: dlebauer closed this issue 7 years ago

dlebauer commented 7 years ago

Update: they don't match, so we will make new polygons. See https://github.com/terraref/computing-pipeline/issues/187#issuecomment-256760060 below.

For reference, we have two sets of polygons:

  1. Those made by @tingli3 based on gantry measurements (transformed from gantry xy). These are already in BETYdb and can be queried as JSON here: https://terraref.ncsa.illinois.edu/bety/api/beta/sites?sitename=~Season+2&limit=none&key=
  2. Those made by @anfrench, used by @rickw-ward, and exported to JSON by @Mamatemenrs.

(Current hypothesis) The reason these differ is that (1) were two-row plots and (2) were the same plots after they were (a) combined into four-row plots and (b) subset to just the inner two rows. (2) should be used for summarizing plot-level metrics since they exclude border rows.


FYI (@robkooper and @gsrohde) this is where we will need to begin combining sites using the geometry field (related to PecanProject/bety#444) ... so if someone queries for plot 1 they can find all contained or intersecting points and polygons. Developing these queries can wait until we have data and more specific use cases.


We should confirm that these are in alignment

anfrench commented 7 years ago

How do I read this XML file above?

dlebauer commented 7 years ago

... it's JSON, but that is all I know. I think the geometries are in WKT format. If you prefer PostGIS, you can import this dump of the database https://terraref.ncsa.illinois.edu/bety/dump/bety.tar.gz and look at the sites table, which should be very similar. (For the record, the database dump includes metadata but excludes all of the trait data until we make it public.)
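For anyone else new to WKT: a polygon is plain text of the form `POLYGON ((x1 y1, x2 y2, ...))`, with the first vertex repeated at the end to close the ring. Below is a minimal Python sketch of reading one, assuming a single-ring polygon with no holes. The function name and the sample coordinates are made up for illustration and are not taken from BETYdb; a GIS library is the better choice for real data.

```python
import re


def parse_wkt_polygon(wkt):
    """Parse a single-ring WKT POLYGON into a list of coordinate tuples.

    Assumption: no holes and no MULTIPOLYGON. An optional Z marker and
    Z values are tolerated (each vertex becomes a 3-tuple).
    """
    pattern = r"^\s*POLYGON\s*(?:Z\s*)?\(\(\s*(.*?)\s*\)\)\s*$"
    match = re.match(pattern, wkt, re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("not a single-ring WKT POLYGON: %r" % wkt)
    ring = []
    for vertex in match.group(1).split(","):
        ring.append(tuple(float(value) for value in vertex.split()))
    return ring


# Made-up lon/lat values, for illustration only.
sample = ("POLYGON ((-111.975 33.074, -111.974 33.074, "
          "-111.974 33.075, -111.975 33.075, -111.975 33.074))")
ring = parse_wkt_polygon(sample)
print(len(ring), "vertices; closed ring:", ring[0] == ring[-1])
```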

anfrench commented 7 years ago

OK, looks like I have some reading to do. WKT isn't something I know anything about. I tried the ArcGIS JSON conversion tool and it failed; maybe QGIS will handle it more easily. Since the issue was raised I created one-row polygons. The attached file will need to be checked and edited. It has one polygon per one-row plot and includes buffers between plots (but not buffers on the outside).

dlebauer commented 7 years ago

@anfrench I think Wasit or Matt can do this, as it will be a good opportunity for them to give me feedback on the data formats and how they can interoperate.

Mamatemenrs commented 7 years ago

We can work with the shapefile that Rick has been using after it is confirmed. I'll further check with Wasit though.

Thanks.

dlebauer commented 7 years ago

@Mamatemenrs have you had a chance to compare the data in BETYdb with the shapefile? It looks like we will need to insert new bounding boxes in the database that correspond to the interior two-row subset.

Mamatemenrs commented 7 years ago

Can anyone confirm the coordinate system and datum of the JSON file? I see geographic longitudes/latitudes, but which datum is it? Currently, it does not align with the file Rick shared.

dlebauer commented 7 years ago

@Mamatemenrs I think it is WGS84. Please confirm with @tingli3

dlebauer commented 7 years ago

@mamatemenrs @kylepeterson777 can you show a map overlaying the two?

Can one of you or @tingli3 convert the shapefile to a PostGIS insert statement similar to #174?

Mamatemenrs commented 7 years ago

I tried to open the JSON file in QGIS and it only opened part of the shape file, but I was still able to overlay them. Basically, they don't align.

(image: gantry_shp)

dlebauer commented 7 years ago

@Mamatemenrs to start, can you output a version of Rick's shape file that looks like the JSON file? (Only the sitename and geometry fields are necessary.)

@smarshall-bmr and @tingli3 can you please try to resolve these differences?

anfrench commented 7 years ago

The two sorghum trials have different plot layouts, so they will not align. I think there are 8 'ranges' in the first one and 7 in the second. What we are going for is season 2, as noted earlier by @dlebauer above. If I knew how BETYdb works I could see about converting the shape file to JSON format in scanner coordinates. I created the shp files in UTM zone 12 using WGS84 via the proj4 library in R.

Mamatemenrs commented 7 years ago

Here is the JSON format of Rick's file. The projection is UTM zone 12 using WGS84.

Rick_JSON.zip

rickw-ward commented 7 years ago

Shape files of current (10-26-16) sorghum layout at MAC are at: https://drive.google.com/open?id=0B5btztQSCEFxVUZ6SlRHVDBpdWM .

Original file of foundation xyz from USDA RTK is in this zip: foundation shape file.zip

orthomosaic as geotiff from four weeks ago (NDVI) from UAV flight is at: https://drive.google.com/open?id=0B5btztQSCEFxYjZDTzJwWkctVWM

When I put all of the above in a QGIS project, all features align. Note the plot polygons would not align with the first sorghum crop of this year (April to July).

dlebauer commented 7 years ago

@rickw-ward et al

rickw-ward commented 7 years ago

Is there significance to the fact that the season one plots are geo-located in only the south 1/5 of those for the current season?

dlebauer commented 7 years ago

@rickw-ward that is just the first 200 records returned by default since I forgot to add &limit=none to the API call.
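To avoid dropping the parameter by hand, the query URL can be assembled programmatically. This is a small Python sketch, assuming the same endpoint and the `sitename`, `limit`, and `key` parameters that appear in the URL at the top of this issue. The helper name `sites_query_url` and the placeholder key are hypothetical, and nothing here actually calls the API.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://terraref.ncsa.illinois.edu/bety/api/beta/sites"


def sites_query_url(sitename_pattern, api_key, limit="none"):
    """Build a sites query URL.

    limit defaults to "none" so that all matching records are requested
    rather than only the first 200 returned by default.
    """
    params = {"sitename": sitename_pattern, "limit": limit, "key": api_key}
    return BASE_URL + "?" + urlencode(params)


# "YOUR_API_KEY" is a placeholder, not a real key.
url = sites_query_url("~Season 2", "YOUR_API_KEY")
print(url)
```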

rickw-ward commented 7 years ago

It appears to be shifted south two ranges, which should not be the case. But perhaps an artifact.

dlebauer commented 7 years ago

@anfrench do you have the R code and original data that you used to generate these polygons?

anfrench commented 7 years ago

Sorghum2_DataFile_AugustPlanting.csv.txt makeshape_F13B2_Fall2016.r.txt

tingli3 commented 7 years ago

The polygons I generated are based on the plot plan and field book, which has 54 ranges and 16 columns (32 rows) in total. The coordinates of gantry field are converted into latitude and longitude (in WGS84) with coordinates summary.

What is the data source of the two files provided, Sorghum2_DataFile_AugustPlanting.csv.txt and makeshape_F13B2_Fall2016.r.txt? Is this the ground truth?

dlebauer commented 7 years ago

@tingli3 I think you can ignore those files since they were used to generate a shape file that @Mamatemenrs has exported as json. The next step is to create the BETYdb insert statements for these polygons - they will be listed as new 'sites'. But instead of Plot 123 the sitename field should have Row x Col y (where x and y are integers of the row and column fields).

tingli3 commented 7 years ago

@dlebauer Can you give me an example of the insert statements I should generate? Or what does the table I will be working on look like?

What do you mean by "they will be listed as new 'sites'. But instead of Plot 123 the sitename field should have Row x Col y (where x and y are integers of the row and column fields)" ?

rickw-ward commented 7 years ago

Please retain the data for range and pass, and not just row x col.

tingli3 commented 7 years ago

Also, the shape file is projected in UTM 12 N. Should the output coordinates XY be converted to WGS84 or still kept in UTM?

dlebauer commented 7 years ago

@tingli3 same table (BETYdb sites) and insert statements as #174. Row and col values come from the shape file and should be used in the plot name, like sitename = Field Scanner Season 2 Row 1 Col 1

@rickw-ward please specify a standard plot naming scheme if you prefer.
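As a rough illustration of the naming scheme and the kind of statement involved, here is a Python sketch. The actual statements from #174 are not reproduced in this thread, so the column list (`sitename`, `geometry`), the SRID of 4326, and the 2D polygon are assumptions to check against #174; the helper names and coordinates are made up. `ST_GeomFromText` is the standard PostGIS constructor for geometry from WKT.

```python
def site_name(row, col, prefix="Field Scanner Season 2"):
    """Build a sitename like 'Field Scanner Season 2 Row 1 Col 1'."""
    return "%s Row %d Col %d" % (prefix, int(row), int(col))


def site_insert_sql(row, col, ring, srid=4326):
    """Return one INSERT statement for a plot polygon.

    ring is a closed list of (lon, lat) tuples. The column names and the
    SRID are assumptions; check them against the statements in #174.
    """
    if len(ring) < 4 or ring[0] != ring[-1]:
        raise ValueError("ring must be closed and have at least 4 vertices")
    coords = ", ".join("%.8f %.8f" % (lon, lat) for lon, lat in ring)
    name = site_name(row, col).replace("'", "''")  # escape single quotes
    return (
        "INSERT INTO sites (sitename, geometry) VALUES "
        "('%s', ST_GeomFromText('POLYGON((%s))', %d));" % (name, coords, srid)
    )


# Made-up corner coordinates, for illustration only.
example_ring = [(-111.975, 33.074), (-111.974, 33.074),
                (-111.974, 33.075), (-111.975, 33.075), (-111.975, 33.074)]
sql = site_insert_sql(1, 1, example_ring)
print(sql)
```

String formatting is used here only because the output is a text file of statements; when executing against a live database, parameterized queries are the safer choice.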

dlebauer commented 7 years ago

@tingli3 same projection as #174 as well

rickw-ward commented 7 years ago

@anfrench @Andrade-Pedro see the questions above about UTM vs WGS84.

rickw-ward commented 7 years ago

For the RIL experiment, the experimental units have addresses composed of two variables: range and pass. Those, plus the entry number, replication, blocks within reps, etc. can be retrieved through a shared, unique plot#. Range-Pass enables instant recognition of where an experimental unit is, and is the grid system used for planning envelope filling, planting, and plot labeling. Not sure this is helping much though...

NewcombMaria commented 7 years ago

I wonder if it would be helpful in this plot-boundary issue/discussion to define some terminology, or come up with standard definitions in the context of a data or metadata dictionary. A 'Range' extends in the Y direction (East-West) and in sorghum crop1 and sorghum crop2 there have been 54 ranges. A 'Row' extends in the X direction (North-South) and in sorghum crop1 and sorghum crop2 there have been 32 rows. With sorghum crop1 it was useful to use the term 'Column' to represent 2-rows, since crop1 had 2-row plots and also 2-row border plantings on the West and East sides (16 'columns' was a useful concept). In the design for sorghum crop2, the term 'Pass' has been used. Sorghum crop2 has 2-row border plantings on the West and East sides, and 4-row plots (7 experimental plots per range) and it's more useful to refer to 32 rows rather than 16 columns for the sorghum crop2. I hope this might be helpful, to standardize terms and come up with common definitions.
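The row arithmetic in these definitions can be written down as a small Python sketch: 2 border rows + 7 plots x 4 rows + 2 border rows = 32 rows. It assumes rows are numbered 1 to 32 consecutively from one edge of the field (which edge is not stated here) and that crop 1 'columns' pair rows as (1, 2), (3, 4), and so on. The function names are illustrative only.

```python
N_ROWS = 32            # rows per range in both crops
BORDER_ROWS = 2        # border rows on each of the west and east sides
ROWS_PER_PLOT = 4      # sorghum crop 2 plots are four rows wide
PLOTS_PER_RANGE = 7    # experimental plots per range in crop 2

# The layout must account for every row exactly once.
assert 2 * BORDER_ROWS + PLOTS_PER_RANGE * ROWS_PER_PLOT == N_ROWS


def crop1_column(row):
    """Two-row 'column' (1..16) that contains a given row (1..32)."""
    if not 1 <= row <= N_ROWS:
        raise ValueError("row out of range: %r" % row)
    return (row - 1) // 2 + 1


def crop2_plot(row):
    """Return (plot, position) for crop 2, or None for a border row.

    plot is 1..7 within the range; position is 1..4 within the plot.
    """
    if not 1 <= row <= N_ROWS:
        raise ValueError("row out of range: %r" % row)
    offset = row - BORDER_ROWS - 1
    if offset < 0 or offset >= PLOTS_PER_RANGE * ROWS_PER_PLOT:
        return None
    return offset // ROWS_PER_PLOT + 1, offset % ROWS_PER_PLOT + 1


def is_inner_row(row):
    """True if the row is one of the two middle rows of a four-row plot."""
    plot = crop2_plot(row)
    return plot is not None and plot[1] in (2, 3)


inner_rows = [r for r in range(1, N_ROWS + 1) if is_inner_row(r)]
print(inner_rows)
```

Under these assumptions the two inner rows of the first crop 2 plot are rows 4 and 5, which fall in crop 1 columns 2 and 3, so a two-row 'column' never coincides with the inner two rows of a four-row plot.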

dlebauer commented 7 years ago

Let's move discussion of the plot definitions here: terraref/reference-data#60

anfrench commented 7 years ago

@tingli3 at the risk of saying things you already know: WGS84 is a reference ellipsoid. UTM zone 12 is a projection. Projections need a corresponding ellipsoid. In the NAD83 system that ellipsoid is GRS80, not WGS84, but they are very nearly the same. When you create a lat/lon to UTM transformation in proj4 (PROJCS) you specify the semi-major axis and flattening. GRS80 is 6378137 m with flattening 1/298.257222100882711... WGS84 at one time was the same as GRS80, but is now 6378137 m (same) with a slightly different flattening, 1/298.257223563. This is all to say that the shape I've provided for sorghum trial #2 is a projection to UTM zone 12 using the WGS84 ellipsoid. I think we should continue using the WGS84 ellipsoid.
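To give a sense of how small the GRS80/WGS84 difference is, the semi-minor axis b = a(1 - f) can be computed for both. This is a Python sketch using the values quoted above, with the GRS80 inverse flattening rounded to 298.257222101.

```python
A = 6378137.0                  # semi-major axis in metres (same for both)
INV_F_GRS80 = 298.257222101    # inverse flattening, GRS80 (rounded)
INV_F_WGS84 = 298.257223563    # inverse flattening, WGS84


def semi_minor_axis(a, inv_f):
    """Semi-minor axis b = a * (1 - f), where f = 1 / inv_f."""
    return a * (1.0 - 1.0 / inv_f)


b_grs80 = semi_minor_axis(A, INV_F_GRS80)
b_wgs84 = semi_minor_axis(A, INV_F_WGS84)
diff_mm = (b_wgs84 - b_grs80) * 1000.0
print("semi-minor axis difference: %.3f mm" % diff_mm)
```

The two semi-minor axes differ by roughly a tenth of a millimetre, far below the plot-scale offsets being discussed in this issue.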

remotesensinglab commented 7 years ago

Can tingli3 or dlebauer confirm that the JSON file Matt used for comparison was actually created using the WGS84 ellipsoid and UTM projection zone 12N?

Mamatemenrs commented 7 years ago

Season 2 shapefiles

Projected Coordinate System for these files: WGS_1984_UTM_Zone_12N

season2 plys

dlebauer commented 7 years ago

@remotesensinglab the plots from BETYdb json file were generated thus: https://github.com/terraref/computing-pipeline/blob/master/scripts/geospatial/field_scanner_plots.R; @tingli3 can comment on technical details.

It looks like Rick's shp file plots are a subset of Ting's JSON plots: close enough that they represent the same concept of a plot, but different enough that the difference represents measurement error. For aligning UAV and gantry data I suggest going with the plot boundaries derived from gantry measurements.

So that Matt can get started, I'd suggest extracting means for the superset of plots. We can group Rick's plots as appropriate for analysis and drop the others if they contain mixed genotypes.

Rick - am I correct that the ones that do not overlap your plots have two border rows of different genotypes?

rickw-ward commented 7 years ago

Something is amiss. The array of polygons from Andy's ('Rick's') shape file should be aligned east-west at the same center point as the yellow polygons, but they aren't. The yellow is shifted significantly east. Why? Can someone send me the two shape files and I'll check them in QGIS.
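One way to put a number on a shift like this, rather than judging it by eye, is to compare the mean centroids of the two layers. This is a minimal Python sketch, assuming both layers are in the same projected coordinate system (for example UTM zone 12N, in metres), cover the same set of plots, and store each polygon as a closed ring of (easting, northing) pairs. The function names and the rectangles are illustrative, not project data.

```python
def ring_centroid(ring):
    """Area-weighted centroid of a closed ring (shoelace formula).

    Coordinates are shifted to the first vertex before summing so that
    large UTM values do not lose precision through cancellation.
    """
    ox, oy = ring[0]
    pts = [(x - ox, y - oy) for x, y in ring]
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0.0:
        raise ValueError("degenerate ring")
    return ox + cx / (3.0 * area2), oy + cy / (3.0 * area2)


def mean_offset(layer_a, layer_b):
    """Mean centroid of layer_b minus mean centroid of layer_a."""
    def mean_centroid(layer):
        cents = [ring_centroid(ring) for ring in layer]
        n = float(len(cents))
        return sum(c[0] for c in cents) / n, sum(c[1] for c in cents) / n
    ax, ay = mean_centroid(layer_a)
    bx, by = mean_centroid(layer_b)
    return bx - ax, by - ay


def rect(x, y, width, height):
    """Closed rectangular ring with its lower-left corner at (x, y)."""
    return [(x, y), (x + width, y), (x + width, y + height),
            (x, y + height), (x, y)]


# Made-up layers: layer_b is layer_a moved 0.75 m east.
layer_a = [rect(409000.0, 3660000.0, 1.5, 4.0), rect(409002.0, 3660000.0, 1.5, 4.0)]
layer_b = [rect(409000.75, 3660000.0, 1.5, 4.0), rect(409002.75, 3660000.0, 1.5, 4.0)]
d_east, d_north = mean_offset(layer_a, layer_b)
print("offset east: %.2f m, north: %.2f m" % (d_east, d_north))
```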

rickw-ward commented 7 years ago

@anfrench and I just discussed the frameshift of yellow vs red polygons. This included a review of the R script (https://github.com/terraref/computing-pipeline/blob/master/scripts/geospatial/field_scanner_plots.R). Query for @smarshall-bmr and @dlebauer: what point in the field is zero,zero for the gantry? In the attached screen shot of QGIS with layers of the orthomosaic geotiff of the field/gantry, polygons of the middle two rows of plots, and foundation centers (visibly aligned with the geotiff), the text box is pointing at a red dot positioned 2 m east of the Y coordinate of what I think is the Lemnatec zero,zero point. If so, and that is used as a corner of plots, it would explain the eastward shift of the yellow polygons.

(image: screen shot 2016-10-31 at 11 17 43 am)

Mamatemenrs commented 7 years ago

Shapefiles for the second map on this page.

Rick_David_shps.zip

rickw-ward commented 7 years ago

Thanks @Mamatemenrs. I pulled your JSON file into my QGIS project, and also added a shape file from @rjstrand at Lemnatec with a single feature that is Lemnatec's (i.e. the gantry's) zero,zero point (red dot above the feature attributes box). The east edge of your east-most plot (green polygons) is closely aligned with the Y axis position of the gantry zero,zero. But that point is outside of the actual planted area. If your polygons are based on that as the east-most point of the Y axis, it would explain why they are shifted one row east of ours (pinkish polygons over the two middle rows of four-row plots; note the two rows of buffer to the east and to the west of the 7 four-row plots in a range).

(image: screen shot 2016-10-31 at 2 17 37 pm)

rickw-ward commented 7 years ago

@ZongyangLi and @rmgarnett . See the thread above. My perspective is strictly 2D at this stage.

rickw-ward commented 7 years ago

Any updates @dlebauer or @Mamatemenrs ?

Mamatemenrs commented 7 years ago

I do not have anything to update regarding these shapefiles.

tingli3 commented 7 years ago

new_sql.txt This is the SQL generated for the shapefile. I am not sure whether the shift from UA-MAC to USDA should be applied. Currently, the coordinates are just transformed from UTM12.

dlebauer commented 7 years ago

@tingli3 from the figure in https://github.com/terraref/computing-pipeline/issues/187#issuecomment-257424469 it looks like your R code from #174 may have two issues:

  1. the eastern side of the plot boundaries is aligned to y = 0, whereas the minimum value of y measured by @smarshall-bmr on the gantry was 0.632 (see spreadsheet).
  2. The plots are adjacent to one another. However the same measurements indicate that there is a gap of 20 cm between each plot on the x (NS) axis.

@smarshall-bmr Your measurements indicate a 20 cm gap but I believe that @NewcombMaria said these were intended to be ~50 cm, and this is closer to @rickw-ward's map and observations.

tingli3 commented 7 years ago

@dlebauer For the first issue, I have asked about the start of y more than once and got no answer, so I had to assume y starts at 0, and I stated this assumption when I uploaded the SQL for the first time. There are only 32 records of y and I need 33 records to get 32 rows, so I have to assume y starts at 0.

For the second issue, if you look at gantry_shp, there is a gap though the file is incomplete. I am not sure why the gap disappears in the pictures later.

smarshall-bmr commented 7 years ago

For what it's worth, I have updated the range centers document to include row borders. All I did was place midpoints between each row and use the average row width to place a start and end point.
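The same procedure can be sketched in Python: n measured centers give n - 1 interior midpoints, plus an outer start and end point, for n + 1 borders in total (so 32 row centers yield the 33 borders needed for 32 rows). This sketch assumes the outer borders sit half an average row width beyond the first and last centers, which is one reading of the description above; the function name and the sample centers are made up.

```python
def boundaries_from_centers(centers):
    """Return n + 1 border positions from n sorted center positions.

    Interior borders are midpoints between neighbouring centers. The
    outer borders are placed half of the average center-to-center
    spacing beyond the first and last centers (an assumption).
    """
    if len(centers) < 2:
        raise ValueError("need at least two centers")
    mids = [(a + b) / 2.0 for a, b in zip(centers, centers[1:])]
    avg_width = (centers[-1] - centers[0]) / (len(centers) - 1)
    start = centers[0] - avg_width / 2.0
    end = centers[-1] + avg_width / 2.0
    return [start] + mids + [end]


# Made-up center positions in metres, for illustration only.
example_centers = [1.0, 2.5, 4.0, 5.5]
borders = boundaries_from_centers(example_centers)
print(borders)
```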

dlebauer commented 7 years ago

@ZongyangLi when you used the plots as defined by Stewart, did you realize that his measurements were the plot centers rather than the plot boundaries?

ZongyangLi commented 7 years ago

@dlebauer To my understanding, the 54 ranges in the N/S direction are centers and the 16 rows in the W/E direction are boundaries. Is that right?

dlebauer commented 7 years ago

@zongyangli yes that sounds correct

rickw-ward commented 7 years ago

Does focusing on 16 units E/W preclude consideration of the middle two rows of a single four-row plot, since those two middle rows will be in separate units? Does it not make sense to make a boundary for each row separately instead of for two-row units?