dlebauer closed this issue 7 years ago.
How do I read this XML file above?
It's JSON, but that is all I know. I think the geometries are in WKT format. If you prefer PostGIS, you can import this dump of the database https://terraref.ncsa.illinois.edu/bety/dump/bety.tar.gz and look at the sites table; it should be very similar. (For the record, the database dump includes metadata but excludes all of the trait data until we make it public.)
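For anyone else new to WKT: a geometry such as `POLYGON((x1 y1, x2 y2, ...))` is just a text list of vertices. Below is a minimal standard-library sketch that pulls the outer ring out of a `POLYGON` or single-part `MULTIPOLYGON` string. The record layout (`sitename` and `geometry` keys) and the coordinates are hypothetical, for illustration only; in practice QGIS, GDAL, or a GIS library would do this more robustly.

```python
import json
import re


def wkt_outer_ring(wkt):
    """Return the first (outer) ring of a POLYGON or MULTIPOLYGON WKT string.

    Holes and additional parts are ignored; vertices may be 2D or 3D.
    """
    # The first parenthesised group containing no nested parentheses is the
    # outer ring of the first polygon.
    match = re.search(r"\(([^()]+)\)", wkt)
    if match is None:
        raise ValueError(f"no coordinate ring found in: {wkt!r}")
    return [tuple(float(v) for v in vertex.split())
            for vertex in match.group(1).split(",")]


# Hypothetical record shaped like a sites query result.
record = json.loads("""
{"sitename": "Example site",
 "geometry": "MULTIPOLYGON(((-111.9750 33.0750 353, -111.9749 33.0750 353, -111.9749 33.0751 353, -111.9750 33.0751 353, -111.9750 33.0750 353)))"}
""")
ring = wkt_outer_ring(record["geometry"])
print(record["sitename"], len(ring), ring[0])
```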
OK, it looks like I have some reading to do; WKT isn't something I know anything about. I tried the ArcGIS JSON conversion tool and it failed. Maybe QGIS will handle it more easily. Since the issue was raised, I created one-row polygons. The attached file will need to be checked and edited; it has one polygon per one-row plot and includes buffers between plots (but not buffers on the outside).
@anfrench I think Wasit or Matt can do this, as it will be a good opportunity for them to give me feedback on the data formats and how they can interoperate.
We can work with the shapefile that Rick has been using after it is confirmed. I'll further check with Wasit though.
Thanks.
@Mamatemenrs have you had a chance to compare the data in BETYdb with the shapefile? It looks like we will need to insert new bounding boxes into the database that correspond to the interior two-row subset.
Can anyone confirm the coordinate system and datum of the JSON file? I see geographic long/lats, but I'm wondering about the datum. Currently, it does not align with the file Rick shared.
@Mamatemenrs I think it is WGS84. Please confirm with @tingli3
@mamatemenrs @kylepeterson777 can you show a map overlaying the two?
Can one of you or @tingli3 convert the shapefile to a PostGIS insert statement similar to #174?
I tried to open the JSON file in QGIS; it only opened part of the shapefile, but I was still able to overlay them somehow. Basically, they don't align.
@Mamatemenrs to start, can you output a version of Rick's shapefile that looks like the JSON file? (Only the sitename and geometry fields are necessary.)
@smarshall-bmr and @tingli3 can you please try to resolve these differences?
The two sorghum trials have different plot layouts, so they will not align. I think there are 8 'ranges' in the first one and 7 in the second. What we are going for is season 2, as @dlebauer noted above. If I knew how BETYdb works, I could look into converting the shapefile to JSON format in scanner coordinates; I created the .shp files in UTM zone 12 using WGS84 via the proj4 library in R.
Here is Rick's file in JSON format. The projection is UTM zone 12 using WGS84.
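Because Rick's polygons are in projected metres (UTM zone 12N on the WGS84 ellipsoid, i.e. EPSG:32612) while the BETYdb JSON uses geographic lon/lat, one of the two has to be converted before they can be compared. The sketch below implements the inverse transverse Mercator series from Snyder's *Map Projections: A Working Manual* using only the standard library. It assumes the northern hemisphere and defaults to zone 12; in practice pyproj, GDAL/ogr2ogr, or QGIS would normally handle this.

```python
import math

A = 6378137.0                # WGS84 semi-major axis (m)
F = 1 / 298.257223563        # WGS84 flattening
K0 = 0.9996                  # UTM scale factor on the central meridian
FALSE_EASTING = 500000.0


def utm_to_lonlat(easting, northing, zone=12):
    """Inverse transverse Mercator (Snyder series), northern hemisphere only."""
    e2 = F * (2 - F)                                  # eccentricity squared
    ep2 = e2 / (1 - e2)                               # second eccentricity squared
    lon0 = math.radians((zone - 1) * 6 - 180 + 3)     # central meridian (-111 for zone 12)

    x = easting - FALSE_EASTING
    m = northing / K0                                 # meridional arc length
    mu = m / (A * (1 - e2 / 4 - 3 * e2**2 / 64 - 5 * e2**3 / 256))
    e1 = (1 - math.sqrt(1 - e2)) / (1 + math.sqrt(1 - e2))
    # Footpoint latitude: latitude on the central meridian with arc length m.
    phi1 = (mu
            + (3 * e1 / 2 - 27 * e1**3 / 32) * math.sin(2 * mu)
            + (21 * e1**2 / 16 - 55 * e1**4 / 32) * math.sin(4 * mu)
            + (151 * e1**3 / 96) * math.sin(6 * mu)
            + (1097 * e1**4 / 512) * math.sin(8 * mu))

    sin1, cos1, tan1 = math.sin(phi1), math.cos(phi1), math.tan(phi1)
    c1 = ep2 * cos1**2
    t1 = tan1**2
    n1 = A / math.sqrt(1 - e2 * sin1**2)              # prime-vertical radius
    r1 = A * (1 - e2) / (1 - e2 * sin1**2) ** 1.5     # meridional radius
    d = x / (n1 * K0)

    lat = phi1 - (n1 * tan1 / r1) * (
        d**2 / 2
        - (5 + 3 * t1 + 10 * c1 - 4 * c1**2 - 9 * ep2) * d**4 / 24
        + (61 + 90 * t1 + 298 * c1 + 45 * t1**2 - 252 * ep2 - 3 * c1**2) * d**6 / 720)
    lon = lon0 + (
        d
        - (1 + 2 * t1 + c1) * d**3 / 6
        + (5 - 2 * c1 + 28 * t1 - 3 * c1**2 + 8 * ep2 + 24 * t1**2) * d**5 / 120) / cos1
    return math.degrees(lon), math.degrees(lat)


print(utm_to_lonlat(500000.0, 3651297.0))
```

The series is accurate to well under a millimetre within a UTM zone, which is far finer than the plot-boundary differences discussed in this thread.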
Shapefiles of the current (10-26-16) sorghum layout at MAC are at: https://drive.google.com/open?id=0B5btztQSCEFxVUZ6SlRHVDBpdWM
Original file of foundation xyz from USDA RTK is in this zip: foundation shape file.zip
The NDVI orthomosaic (GeoTIFF) from a UAV flight four weeks ago is at: https://drive.google.com/open?id=0B5btztQSCEFxYjZDTzJwWkctVWM
When I put all of the above in a QGIS project, all features align. Note the plot polygons would not align with the first sorghum crop of this year (April to July).
@rickw-ward et al
Is there significance to the fact that the season one plots are geo-located in only the south 1/5 of those for the current season?
@rickw-ward that is just the first 200 records, returned by default, since I forgot to add &limit=none to the API call.
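A small guard against this happening again is to build query URLs through a helper that always adds `limit=none` unless told otherwise. This is only a sketch: the endpoint path and the `sitename` filter below are hypothetical placeholders, and only the `limit=none` parameter comes from this thread.

```python
from urllib.parse import urlencode


def api_url(base_url, **params):
    """Append query parameters, defaulting to limit=none so the API returns
    every matching record instead of only the first page."""
    params.setdefault("limit", "none")
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and filter, for illustration only.
url = api_url("https://terraref.ncsa.illinois.edu/bety/api/beta/sites",
              sitename="Field Scanner Season 2")
print(url)
```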
It appears to be shifted south by two ranges, which should not be the case, but perhaps it's an artifact.
@anfrench do you have the R code and original data that you used to generate these polygons?
The polygons I generated are based on the plot plan and field book, which has 54 ranges and 16 columns (32 rows) in total. The gantry field coordinates are converted into latitude and longitude (WGS84) using the coordinates summary.
What is the data source of the two files provided, Sorghum2_DataFile_AugustPlanting.csv.txt and makeshape_F13B2_Fall2016.r.txt? Are they the ground truth?
@tingli3 I think you can ignore those files, since they were used to generate a shapefile that @Mamatemenrs has exported as JSON. The next step is to create the BETYdb insert statements for these polygons; they will be listed as new 'sites'. But instead of Plot 123, the sitename field should have Row x Col y (where x and y are integers of the row and column fields).
@dlebauer Can you give me an example of the insert statements I should generate? Or what does the table I will be working on look like?
What do you mean by "they will be listed as new 'sites'. But instead of Plot 123 the sitename field should have Row x Col y (where x and y are integers of the row and column fields)" ?
Please retain the data for range and pass, not just row x col.
Also, the shapefile is projected in UTM 12N. Should the output XY coordinates be converted to WGS84 or kept in UTM?
@tingli3 same table (BETYdb sites) and insert statements as #174. The row and col values come from the shapefile and should be used in the plot name, e.g. sitename = Field Scanner Season 2 Row 1 Col 1
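The statements from #174 are not reproduced here, so the following is only a hedged sketch of generating one `INSERT` per polygon with the naming scheme above. It assumes the `sites` table has `sitename` and `geometry` columns, that geometries are stored as lon/lat with SRID 4326, and uses the PostGIS function `ST_GeomFromText`; whether the column expects 2D or 3D (elevation) coordinates should be checked against #174.

```python
def site_insert_sql(row, col, ring, season=2):
    """Return an INSERT statement for one plot polygon.

    `ring` is a list of (lon, lat) pairs; it is closed automatically if the
    last vertex does not repeat the first, as WKT polygons require.
    """
    if ring[0] != ring[-1]:
        ring = list(ring) + [ring[0]]
    sitename = f"Field Scanner Season {season} Row {row} Col {col}"
    coords = ", ".join(f"{lon:.8f} {lat:.8f}" for lon, lat in ring)
    sitename_sql = sitename.replace("'", "''")   # escape quotes for the SQL literal
    return ("INSERT INTO sites (sitename, geometry) VALUES "
            f"('{sitename_sql}', ST_GeomFromText('POLYGON(({coords}))', 4326));")


# Toy coordinates, for illustration only.
print(site_insert_sql(1, 1, [(0, 0), (1, 0), (1, 1)]))
```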
@rick-ward please specify a standard plot naming scheme if you prefer.
@tingli3 same projection as #174 as well
@anfrench @Andrade-Pedro see the questions above about UTM vs. WGS84.
For the RIL experiment, the experimental units have addresses composed of two variables: range and pass. Those, plus the entry number, replication, blocks within reps, etc., can be retrieved through a shared, unique plot#. Range-Pass enables instant recognition of where an experimental unit is, and is the grid system used for planning envelope filling, planting, and plot labeling. Not sure this is helping much, though...
I wonder if it would be helpful in this plot-boundary issue/discussion to define some terminology, or come up with standard definitions in the context of a data or metadata dictionary. A 'Range' extends in the Y direction (East-West) and in sorghum crop1 and sorghum crop2 there have been 54 ranges. A 'Row' extends in the X direction (North-South) and in sorghum crop1 and sorghum crop2 there have been 32 rows. With sorghum crop1 it was useful to use the term 'Column' to represent 2-rows, since crop1 had 2-row plots and also 2-row border plantings on the West and East sides (16 'columns' was a useful concept). In the design for sorghum crop2, the term 'Pass' has been used. Sorghum crop2 has 2-row border plantings on the West and East sides, and 4-row plots (7 experimental plots per range) and it's more useful to refer to 32 rows rather than 16 columns for the sorghum crop2. I hope this might be helpful, to standardize terms and come up with common definitions.
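To make these definitions concrete, here is a small sketch mapping a row number to its crop1 'column' and to its role in the crop2 layout (2 border rows, seven 4-row plots, 2 border rows, for 32 rows in total). It assumes rows are numbered 1 to 32 starting from the west side; the actual numbering direction isn't stated here, so treat the mapping as illustrative.

```python
BORDER_ROWS = 2          # border rows on each of the west and east sides
ROWS_PER_PLOT = 4        # crop2 experimental plots are four rows wide
PLOTS_PER_RANGE = 7
TOTAL_ROWS = 2 * BORDER_ROWS + ROWS_PER_PLOT * PLOTS_PER_RANGE  # 32


def crop1_column(row):
    """Crop1 'column': each pair of adjacent rows, giving 16 columns."""
    return (row + 1) // 2


def crop2_role(row):
    """Crop2 role of a row as (kind, plot_number, is_inner_row).

    kind is 'border' or 'plot'; plot_number runs 1..7 within a range;
    is_inner_row is True for the middle two rows of a four-row plot.
    """
    if not 1 <= row <= TOTAL_ROWS:
        raise ValueError(f"row must be between 1 and {TOTAL_ROWS}")
    offset = row - BORDER_ROWS - 1     # 0-based position past the first border
    if not 0 <= offset < ROWS_PER_PLOT * PLOTS_PER_RANGE:
        return ("border", None, False)
    plot, position = divmod(offset, ROWS_PER_PLOT)
    return ("plot", plot + 1, position in (1, 2))


for r in (1, 3, 4, 30, 32):
    print(r, crop1_column(r), crop2_role(r))
```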
Let's move discussion of the plot definitions here: terraref/reference-data#60
@tingli3 at the risk of saying things you already know: WGS84 is a reference ellipsoid. UTM zone 12 is a projection. Projections need a corresponding ellipsoid. In the NAD83 system that ellipsoid is GRS80, not WGS84, but they are very nearly the same. When you create a lat/lon to UTM transformation in proj4 (PROJCS), you specify the semi-major axis and the flattening. For GRS80 these are 6378137 m and 1/298.257222100882711... WGS84 was at one time the same as GRS80, but it now has the same semi-major axis (6378137 m) and a slightly different flattening, 1/298.257223563. This is all to say that the shapefile I've provided for sorghum trial #2 is a projection to UTM zone 12 using the WGS84 ellipsoid. I think we should continue using the WGS84 ellipsoid.
Can tingli3 or dlebauer confirm that the JSON file Matt used for the comparison was actually created using the WGS84 ellipsoid and the UTM zone 12N projection?
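A quick check of how small the GRS80/WGS84 difference is, using only the parameters quoted above. The semi-minor axis is b = a(1 - f), so with the same a the two ellipsoids differ by about a tenth of a millimetre at the poles.

```python
A = 6378137.0                       # semi-major axis shared by both ellipsoids (m)
INV_F_GRS80 = 298.257222100882711   # 1/f for GRS80
INV_F_WGS84 = 298.257223563         # 1/f for WGS84


def semi_minor_axis(a, inv_f):
    """b = a * (1 - f), with the flattening given as its reciprocal 1/f."""
    return a * (1 - 1 / inv_f)


b_grs80 = semi_minor_axis(A, INV_F_GRS80)
b_wgs84 = semi_minor_axis(A, INV_F_WGS84)
diff_mm = (b_wgs84 - b_grs80) * 1000
print(f"GRS80 b = {b_grs80:.4f} m, WGS84 b = {b_wgs84:.4f} m, diff = {diff_mm:.3f} mm")
```

So the choice between these two ellipsoids alone cannot explain a misalignment of a row width or more between polygon sets.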
Season 2 shapefiles
Projected Coordinate System for these files: WGS_1984_UTM_Zone_12N
@remotesensinglab the plots from BETYdb json file were generated thus: https://github.com/terraref/computing-pipeline/blob/master/scripts/geospatial/field_scanner_plots.R; @tingli3 can comment on technical details.
It looks like Rick's shapefile plots are a subset of Ting's JSON plots. They are close enough to represent the same concept of a plot, and the remaining differences look like measurement error. For aligning UAV and gantry data, I suggest going with the plot boundaries derived from gantry measurements.
So that Matt can get started, I'd suggest extracting means for the superset of plots. We can group Rick's plots as appropriate for analysis and drop the others if they contain mixed genotypes.
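A minimal sketch of what "extracting means" per plot could look like: average the raster pixels whose centres fall inside a plot polygon, using an even-odd (ray casting) point-in-polygon test. The tiny in-memory grid, its origin, and its pixel size are hypothetical; a real workflow would read the GeoTIFF with GDAL or rasterio, or use a zonal-statistics tool in QGIS.

```python
def point_in_polygon(x, y, ring):
    """Even-odd (ray casting) test; `ring` is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Count crossings of a ray from (x, y) towards +x with this edge.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def plot_mean(grid, origin_x, origin_y, pixel_size, ring):
    """Mean of the cells whose centres fall inside `ring`, or None if none do.

    grid[r][c] is assumed north-up: row 0 starts at the top edge origin_y.
    """
    values = []
    for r, row in enumerate(grid):
        cy = origin_y - (r + 0.5) * pixel_size
        for c, value in enumerate(row):
            cx = origin_x + (c + 0.5) * pixel_size
            if point_in_polygon(cx, cy, ring):
                values.append(value)
    return sum(values) / len(values) if values else None


# Hypothetical 4 x 4 grid with 1 m pixels and its top-left corner at (0, 4).
grid = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(plot_mean(grid, 0, 4, 1, [(0, 2), (2, 2), (2, 4), (0, 4)]))
```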
Rick - am I correct that the ones that do not overlap your plots have two border rows of different genotypes?
Something is amiss. The array of polygons from Andy's ('Rick's') shapefile should be aligned east-west at the same center point as the yellow polygons, but they aren't. The yellow is shifted significantly east. Why? Can someone send me the two shapefiles, and I'll check them in QGIS.
@anfrench and I just discussed the frameshift of the yellow vs. red polygons. This included a review of the R script field_scanner_plots.R (https://github.com/terraref/computing-pipeline/blob/master/scripts/geospatial/field_scanner_plots.R). Query for @smarshall-bmr and @dlebauer: what point in the field is (0, 0) for the gantry? In the attached screenshot of QGIS, with layers for the orthomosaic GeoTIFF of the field/gantry, the polygons of the middle two rows of plots, and the foundation centers (visibly aligned with the GeoTIFF), the text box points at a red dot positioned 2 m east of the Y coordinate of what I think is the LemnaTec (0, 0) point. If so, and that point is used as a corner of the plots, it would explain the eastward shift of the yellow polygons.
Shapefiles for the second map on this page.
Thanks @Mamatemenrs. I pulled your JSON file into my QGIS project and also added a shapefile from @rjstrand at LemnaTec with a single feature that is LemnaTec's (i.e., the gantry's) (0, 0) point (the red dot above the feature attributes box). The east edge of your east-most plot (green polygons) is closely aligned with the Y-axis position of the gantry's (0, 0). But that point is outside the actual planted area. If your polygons use it as the east-most point of the Y axis, that would explain why they are shifted one row east of ours (pinkish polygons over the two middle rows of the four-row plots; note the two buffer rows to the east and to the west of the seven 4-row plots in a range).
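One way to put a number on a shift like this, sketched under the assumption that both polygon sets are in the same projected CRS (e.g. UTM 12N metres) and can be matched by a shared plot key: compare plot centroids and take the median offset, which is robust to a few mismatched plots. Using the vertex average as the centroid is a simplification that is exact for rectangular plots; the keys and coordinates in the example are hypothetical.

```python
from statistics import median


def vertex_centroid(ring):
    """Average of the distinct vertices (exact for rectangular plots)."""
    pts = ring[:-1] if ring[0] == ring[-1] else ring
    xs, ys = zip(*pts)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def median_offset(polys_a, polys_b):
    """Median (dx, dy) of set B relative to set A over plots present in both."""
    shared = sorted(polys_a.keys() & polys_b.keys())
    if not shared:
        raise ValueError("no plots in common")
    dxs, dys = [], []
    for key in shared:
        ax, ay = vertex_centroid(polys_a[key])
        bx, by = vertex_centroid(polys_b[key])
        dxs.append(bx - ax)
        dys.append(by - ay)
    return median(dxs), median(dys)


# Hypothetical plots; set B is set A moved 1.5 m east.
a = {1: [(0, 0), (1, 0), (1, 4), (0, 4), (0, 0)],
     2: [(2, 0), (3, 0), (3, 4), (2, 4), (2, 0)]}
b = {k: [(x + 1.5, y) for x, y in ring] for k, ring in a.items()}
print(median_offset(a, b))
```

A positive dx close to one row spacing would be consistent with the one-row eastward shift described above.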
@ZongyangLi and @rmgarnett . See the thread above. My perspective is strictly 2D at this stage.
Any updates @dlebauer or @Mamatemenrs ?
I do not have anything to update regarding these shapefiles.
new_sql.txt: this is the SQL generated for the shapefile. I'm not sure whether the shift from UA-MAC to USDA should be applied. Currently, the coordinates are just transformed from UTM12.
@tingli3 from the figure in https://github.com/terraref/computing-pipeline/issues/187#issuecomment-257424469 it looks like your R code from #174 may have two issues:
@smarshall-bmr Your measurements indicate a 20 cm gap but I believe that @NewcombMaria said these were intended to be ~50 cm, and this is closer to @rickw-ward's map and observations.
@dlebauer For the first issue, I have asked about the starting value of y more than once and got no answer, so I had to assume y starts at 0; I stated this assumption when I first uploaded the SQL. There are only 32 y records, and I need 33 to get 32 rows, so I have to assume y starts at 0.
For the second issue, if you look at , there is a gap, though the file is incomplete. I am not sure why the gap disappears in the later pictures.
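This is the usual fencepost problem: n rows need n + 1 boundaries. A sketch of the workaround described, prepending an assumed first edge of 0 to 32 measured boundaries; the boundary values here are hypothetical.

```python
def row_intervals(boundaries, assumed_start=0.0):
    """Turn row boundaries into (start, end) intervals.

    With only n boundaries for n rows, one edge is missing; following the
    assumption above, `assumed_start` is prepended as the first edge.
    """
    edges = [assumed_start] + sorted(boundaries)
    return list(zip(edges[:-1], edges[1:]))


# Hypothetical, evenly spaced boundaries for illustration.
measured = [0.5 * (i + 1) for i in range(32)]
rows = row_intervals(measured)
print(len(rows), rows[0], rows[-1])
```

If the missing edge is actually at the other end, every interval ends up one row off, which is exactly the kind of one-row shift discussed above.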
For what it's worth, I have updated the range centers document to include row borders. All I did was place midpoints between each pair of adjacent rows and use the average row width to place a start and end point.
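The procedure just described can be sketched as follows. Borders go at the midpoints between adjacent row centres. Reading "use the average row width to place a start and end point" as "half the average spacing beyond the first and last centres" is an interpretation, and the centre values are hypothetical.

```python
def borders_from_centers(centers):
    """Row borders from row centres: midpoints between neighbours, plus outer
    edges half the average centre-to-centre spacing beyond the ends."""
    c = sorted(centers)
    if len(c) < 2:
        raise ValueError("need at least two row centres")
    avg_width = (c[-1] - c[0]) / (len(c) - 1)   # mean centre-to-centre spacing
    mids = [(lo + hi) / 2 for lo, hi in zip(c, c[1:])]
    return [c[0] - avg_width / 2] + mids + [c[-1] + avg_width / 2]


print(borders_from_centers([1.0, 2.0, 3.0]))
```

n centres yield n + 1 borders, so this also supplies the extra edge that a list of centres alone cannot provide.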
@ZongyangLi when you used the plots as defined by Stewart, did you realize that his measurements were the plot centers rather than the plot boundaries?
@dlebauer To my understanding, the 54 ranges in the N/S direction are given as centers and the 16 rows in the W/E direction as boundaries; is that right?
@zongyangli yes that sounds correct
Does focusing on 16 units E/W preclude consideration of the middle two rows of a single four-row plot, since those two middle rows will be in separate units? Wouldn't it make sense to make a boundary for each row separately instead of two-row units?
Update: they don't match, so we will make new polygons. See https://github.com/terraref/computing-pipeline/issues/187#issuecomment-256760060 below:
For reference, we have two sets of polygons
(Current hypothesis) The reason these differ is that (1) were two-row plots and (2) were the same plots after they were (a) combined into four-row plots and (b) subset to just the inner two rows. (2) should be used for summarizing plot-level metrics, since they exclude border rows.
Regarding naming sites "Row x Col y" instead of "Plot 123": FYI (@robkooper and @gsrohde), this is where we will need to begin combining sites using the geometry field (related to PecanProject/bety#444), so that if someone queries for plot 1 they can find all contained or intersecting points and polygons. Developing these queries can wait until we have data and more specific use cases.
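Until those queries are written, the idea can be illustrated with a bounding-box prefilter, roughly what a spatial index does before an exact test such as PostGIS `ST_Intersects`: list every stored site whose extent overlaps the query plot's extent. The site names and coordinates are hypothetical, and a box overlap is only a candidate match, not a confirmed intersection.

```python
def bbox(ring):
    """(min_x, min_y, max_x, max_y) of a ring of (x, y) vertices."""
    xs, ys = zip(*ring)
    return min(xs), min(ys), max(xs), max(ys)


def boxes_overlap(b1, b2):
    """True if two bounding boxes share any area or boundary."""
    return b1[0] <= b2[2] and b2[0] <= b1[2] and b1[1] <= b2[3] and b2[1] <= b1[3]


def candidate_sites(query_ring, sites):
    """Names of sites whose bounding boxes overlap the query polygon's box."""
    qb = bbox(query_ring)
    return sorted(name for name, ring in sites.items()
                  if boxes_overlap(qb, bbox(ring)))


# Hypothetical sites, for illustration only.
sites = {
    "Plot 1": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "Season 2 Row 1 Col 1": [(1, 1), (3, 1), (3, 3), (1, 3)],
    "Far away": [(10, 10), (11, 10), (11, 11), (10, 11)],
}
print(candidate_sites([(0, 0), (2, 0), (2, 2), (0, 2)], sites))
```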
We should confirm that these are in alignment