terraref / reference-data

Coordination of Data Products and Standards for TERRA reference data
https://terraref.org
BSD 3-Clause "New" or "Revised" License

Support for ArcGIS #81

Closed dlebauer closed 7 years ago

dlebauer commented 7 years ago

Many users (including key PIs, funders, and industry partners) have asked about using standard ArcGIS files. Much of the geospatial community uses either ArcGIS or ArcGIS file formats.

So far we have had requests for

The user story is: "I use ArcGIS or some other software that uses the ArcGIS shapefile format. How can I upload shapefiles to your database or download your data as shapefiles?"

Objectives

Completion Criteria

ArcGIS interface

ArcGIS data products

remotesensinglab commented 7 years ago

I suggest two methods for this.

Method 1 - Census data approach: Create a shapefile for each growing season with a polygon representing a plot/genotype. A column in the shapefile stores the unique Plot ID (plot names). All other plot-level statistics of NDVI, temperature, etc., for the entire growing cycle are stored in multiple .CSV files. The columns of a .CSV file (e.g., 2016_Sean1_NDVI.CSV) store NDVI values for different dates or times. Rows represent plots/genotypes, with a column dedicated to the Plot ID. Essentially, there will be one .CSV file for each variable, e.g., 2016_Sean1_NDVI.CSV, 2016_Sean1_FlirLST.CSV, 2016_Sean1_PRI.CSV, etc. One needs to download both the shapefile and the .CSV file, then join them in ArcGIS for their analysis (see the sketch below).

Method 2: One shapefile with all of the LST, NDVI, PRI data values stored in different columns for that season.

In either method, column headers can be named by season, variable, year, and date (or day of year). E.g., S1NDVI160315 represents season 1 NDVI data for March 15, 2016.

My preference is Method 1, but I can work with either one.
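
As a rough sketch, the Method 1 join is equivalent to the following SQL (all table and column names here are hypothetical, purely to show the structure; in ArcGIS the same thing is done with an attribute join):

-- plots_2016_s1: attribute table of the season shapefile (hypothetical name)
-- ndvi_2016_s1: 2016_Sean1_NDVI.CSV loaded as a table, one column per date (hypothetical name)
select p.plot_id, p.geom, n.ndvi_160315, n.ndvi_160322
from plots_2016_s1 p
join ndvi_2016_s1 n on n.plot_id = p.plot_id;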

dlebauer commented 7 years ago

@remotesensinglab both of the formats you describe combine a lot of information in the column headers that would have to be parsed to be useful. For example, a time series analysis would require converting S1NDVI160315 into a sensor name and a date. There are data structures that would serve the same purpose but keep those bits of information in separate fields, e.g., dates as a vector of date-time stamps. (For the record, early on we decided all dates would be written in the ISO 8601 standard YYYY-MM-DD HH:MM:SS.SSS, where the time part is optional.)

What I think would give the most flexibility is if you could connect ArcGIS directly to the databases using the PostgreSQL connection. I assume that from the ArcGIS interface our 'sites' table will look something like a shapefile, and our 'traits' table will look like your csv files. The primary difference is that the data is in 'long' format. Transposing this to the 'wide' format you propose would be straightforward (a sketch follows below); however, instead of encoding dates in column headers, they could come as a vector. Can you and your group use ArcGIS to generate the data structures that work best for you? @jterstriep can help with the connection and basic conversions as well as the raster files, @gsrohde and I can help you get what you need out of BETYdb, and @max-zilla can help with the Clowder databases.

I think that this would provide you with maximum flexibility to explore all of the data in the system rather than the subset that we could export as a shapefile. As for creating data products, once you have developed formats that are useful and standard, either our team can help automate their production or you can build pipelines in ArcGIS.
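
As a concrete (and hedged) sketch of that long-to-wide transposition, assuming traits_and_yields_view exposes sitename, trait, date, and mean columns (adjust to the actual schema), the packed column names from the earlier comment could be reproduced with filtered aggregates:

select sitename,
       max(mean) filter (where trait = 'NDVI' and date::date = '2016-03-15') as s1ndvi160315,
       max(mean) filter (where trait = 'NDVI' and date::date = '2016-03-22') as s1ndvi160322
from traits_and_yields_view
group by sitename;

In practice, keeping the long format and letting ArcGIS (or any other client) pivot on demand avoids baking dates into column names.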

rickw-ward commented 7 years ago

@dlebauer is this capability meant to span the full range of phenotyping platforms: manual, tractor, drone, and gantry?

dlebauer commented 7 years ago

@rickw-ward yes, the primary idea here is to allow users within the team to access data from within their favorite GIS tool, conduct analyses, and design and publish data products. ArcGIS is the specific use case, but similar tools like QGIS should work equally well.

These data products (like shapefiles) would be available to the public and could be hosted on ArcGIS Online or a similar service.

ghost commented 7 years ago

@gsrohde and @jterstriep - please work with Wasit (@remotesensinglab) as discussed so he can test this for analysis and give feedback for use by the larger community.

jterstriep commented 7 years ago

@dlebauer and I had a teleconference on Monday with @remotesensinglab. We're starting with ArcGIS access to BETYdb. Further discussion with @robkooper indicated some security concerns with exposing the PostGIS port to the internet. The current plan is to allow ssh access to the host and set up an ssh tunnel to reach the Postgres port. A read-only account will be created for Postgres access.
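
A minimal sketch of that read-only account, assuming the BETYdb tables live in the public schema of a database named bety (role name and password are placeholders):

create role viewer with login password 'change-me';
grant connect on database bety to viewer;
grant usage on schema public to viewer;
grant select on all tables in schema public to viewer;
-- make sure tables created later are also readable
alter default privileges in schema public grant select on tables to viewer;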

jterstriep commented 7 years ago

This is related to issue https://github.com/terraref/computing-pipeline/issues/262.

jterstriep commented 7 years ago

I hit what may be a significant problem with direct ArcMap access to BETYdb. In the bety.public.sites table, the 'id' field is of type bigserial, which is not recognized by ArcMap. This means that 'id' cannot be selected as the primary key, and most other fields contain nulls, so they can't be used either. I was able to make some progress using the 'sitename' field. Is that guaranteed to be unique?

Of course, the lack of a usable primary key applies to the other tables as well, so I suspect it may be difficult to load them at all. I couldn't find a suitable field in the yields table, for instance.

I'll continue researching the issue but I wanted to get this out there to see if anyone else had dealt with the problem.

dlebauer commented 7 years ago

Did you mean to close this?

Each table has a 'natural key' defined by a uniqueness constraint on one or more fields. In addition, it may be possible to change the type for id or to make views that do the same (you could start with the traits_and_yields_view and see how far you get joining sites on sitename). Or contact ESRI... On Thu, Mar 9, 2017 at 8:52 PM Jeff Terstriep notifications@github.com wrote:

Closed #81 https://github.com/terraref/reference-data/issues/81.

jterstriep commented 7 years ago

Sorry, I guess I hit the wrong button.

I'll investigate the idea of a 'natural key' some more, but ArcMap seems very particular about the primary key field(s). Other users have reported the error to ESRI, but ESRI appears to have been unresponsive on this issue for years.

dlebauer commented 7 years ago

I'd start by trying to join traits_and_yields_view to sites on sitename.

dlebauer commented 7 years ago

It's not clear whether bigserial is required (see https://github.com/PecanProject/bety/issues/218). @robkooper, is there another data type we could use for id in BETYdb?

robkooper commented 7 years ago

It should be a bigint, which is required since that gives us enough distinct ids to do syncs across instances.

dlebauer commented 7 years ago

Jeff, would bigint work?

https://www.postgresql.org/docs/9.5/static/datatype-numeric.html#DATATYPE-NUMERIC-DECIMAL

jterstriep commented 7 years ago

Looks like it won't. My understanding is that bigserial is simply bigint with the auto-increment machinery. The sites table's id shows as bigint and is not selectable when ArcMap tries to build a query table.
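
For reference, bigserial is not a distinct storage type; a bigserial column is roughly shorthand for a bigint column backed by a sequence (toy table and sequence names below):

-- "id bigserial" expands to approximately:
create sequence example_id_seq;
create table example (
    id bigint not null default nextval('example_id_seq')
);
alter sequence example_id_seq owned by example.id;

So ArcMap is presumably objecting to the underlying bigint type, not to the sequence machinery.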

jterstriep commented 7 years ago

I don't see any views (well, beyond the system and postgis ones). Are there non-public tables/views that the 'viewer' user can't see?

gsrohde commented 7 years ago

@jterstriep They are now materialized views, so I think they are stored in a different place from normal views.

Regarding sitename as key and uniqueness guarantees: It currently is unique in terraref's bety database—this query returns no results:

select sitename, count(*) from sites group by sitename having count(*) > 1;

But there is not (yet) any constraint guaranteeing this. (The same query on the EBI database returns 89 rows.)

This has been a long-standing issue (see https://github.com/PecanProject/bety/issues/201). We have implemented some constraints on the terraref database that we haven't added to the codebase generally, and we could possibly add this one if it would facilitate the work you are trying to do.
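
For concreteness, the constraint in question is a one-liner (the constraint name is arbitrary; adding it will fail if duplicate sitenames already exist, as they do on the EBI database):

alter table sites add constraint unique_sitename unique (sitename);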

dlebauer commented 7 years ago

select * from traits_and_yields_view limit 10;
select * from traits_and_yields_view join sites on traits_and_yields_view.sitename = sites.sitename where site_id = 6000000866;

dlebauer commented 7 years ago

@gsrohde yes, please add uniqueness constraints on sites in the terra-ref and terra-mepp databases! (Also on methods, variables, and others, but if that needs discussion please open a new issue.)

jterstriep commented 7 years ago

I'll have to experiment some more, but I'm wondering if we can solve this by creating some ArcMap-friendly views. @gsrohde, do you have a development database where I can make changes, or should I install my own Postgres system and load BETYdb? I assume there is a dump I can get somewhere.

dlebauer commented 7 years ago

The dump is here: https://terraref.ncsa.illinois.edu/bety/dump/bety.tar.gz

dlebauer commented 7 years ago

Apologies, that no longer looks like a SQL dump ... but you should be able to import the db with the load.bety.sh script, though I forget how to use it (@robkooper would know ...).

gsrohde commented 7 years ago

I added the “unique sitenames” constraint to the terra-ref database. I hope to add it to the terra-mepp database when I hear back from David Slater.

Unless there are other constraints that are urgently needed, I thought I’d hold off on them and make them part of a more comprehensive migration.

gsrohde commented 7 years ago

@jterstriep FYI I've added a uniqueness constraint for sites.sitename.

dlebauer commented 7 years ago

@gsrohde what is the best way for @jterstriep to get a dev version of bety? Copy it within bety6 to a bety-dev database? Or use the load.bety.sh script as described in the PEcAn installation docs?

gsrohde commented 7 years ago

Depends. By "bety", I assume you mean one of the BETYdb databases. Does he need it populated with data or just the schema? If the former, does he need the bety6 bety data and if so, should it be all data or just the public data (with users anonymized)?

Making a clone on bety6 is probably the easiest option if you aren't concerned about further taxing computational resources on that machine. load.bety.sh probably isn't much more difficult, but my knowledge of that script is mostly confined to the rather narrow context of getting data from ebi-forecast to populate a development version of the BETYdb Rails app.
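
If the clone route is chosen, a minimal sketch (database and owner names here are placeholders, and the template database must have no other active connections while the copy runs):

create database bety_dev with template bety owner bety;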

jterstriep commented 7 years ago

I'm looking at using load.bety.sh and auto-running it as part of a Docker container. This would allow users to easily create the BETYdb database on any machine, including their laptop. Part of this would fix up the data type issues that are causing ArcMap to choke. ArcMap would point to the local database rather than the production databases.

robkooper commented 7 years ago

I think it already does: https://github.com/PecanProject/bety/blob/master/docker/entrypoint.sh#L28

However, I think it would be better to have a new option called sync that would sync the database, and bety would just start it.

jterstriep commented 7 years ago

I've tried a couple of options with ArcMap and, basically, it's not going to support big integers. It looks like ArcGIS Pro does support big integers, but I don't think that's a near-term solution.

If BETYdb needs bigints for its tables' primary keys, could we make a set of materialized views with a cast from big integer to integer? This might be limiting for some users, but I think it would make life much easier for most users. Is there a limited set of tables/views to which users need access?
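
One caveat on a straight cast: ids in this database start at 6,000,000,000 (see the example site_id above), which overflows a 32-bit integer, so id::integer would fail. A hedged sketch that uses a row_number() surrogate instead (the geometry column name comes from the sites table; adjust as needed):

create materialized view sites_arcmap as
select cast(row_number() over (order by id) as integer) as fid,  -- small surrogate key for ArcMap
       sitename,
       geometry
from sites;

-- rebuild whenever the underlying table changes
refresh materialized view sites_arcmap;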

dlebauer commented 7 years ago

Jeff, have you tried joining traitsview to sites on site name?

robkooper commented 7 years ago

Maybe make a single view that joins all the different tables needed but drops the IDs? The problem is that the IDs are used as foreign keys to join the different tables together.

What is the exact data they are looking for?

dlebauer commented 7 years ago

Basically they need the traitsview with sites.geometry, and the site name should work as a foreign key.

robkooper commented 7 years ago

The other option is to make a traitsviewsites view that has all the information in a single view.
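
A hedged sketch of such a view, assuming traits_and_yields_view exposes sitename, trait, date, and mean (swap in whichever fields are actually needed):

create view traitsviewsites as
select t.sitename,
       t.trait,
       t.date,
       t.mean,
       s.geometry
from traits_and_yields_view t
join sites s on s.sitename = t.sitename;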

dlebauer commented 7 years ago

I think keeping traitsview and sites separate is more consistent with the shapefile schema, if that helps.

jterstriep commented 7 years ago

Jeff, have you tried joining traitsview to sites on site name?

I'll double-check this. I think it is possible, but you have to do odd things to get the layers (tables) loaded in the first place and ignore warnings when loading the traits and sites tables in order to do the join. A better solution is to create views with data types that ArcMap handles. The question is which tables need such a view and, if we are creating views, whether we should do the joins as well.

Actually, the first question is why bigint is being used at all. Is any table in betydb likely to need more than 4 billion records?

gsrohde commented 7 years ago

@dlebauer Re https://github.com/terraref/reference-data/issues/81#issuecomment-286555526:

I've added the "unique sitenames" constraint to terra-mepp database now.

dlebauer commented 7 years ago

@gsrohde please add constraints to terra-ref / bety6 too, if not already done.

@jterstriep note that ids in bety6 start from 6 billion. This is to enable syncing across instances of BETYdb. https://pecan.gitbooks.io/betydb-documentation/content/distributed_betydb.html
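
For context on the "4 billion records" question above: the limit that matters is the id values themselves, not the row count.

2,147,483,647 (largest 32-bit signed integer) < 6,000,000,000 <= 9,223,372,036,854,775,807 (largest bigint)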

dlebauer commented 7 years ago

@jterstriep if you are planning to create custom views, can you do so in a way that is OGC compliant? For example, following the GeoPackage schema: http://www.geopackage.org/

gsrohde commented 7 years ago

@dlebauer Yes, already done (bety6 constraints)—see https://github.com/terraref/reference-data/issues/81#issuecomment-286555526. I didn't bother with bety7 but could if needed.

jterstriep commented 7 years ago

This is a more general problem of doing joins across multiple tables; it's not specific to any one table.

Options include the following:

ghost commented 7 years ago

@jterstriep - should I set up a meeting to move this forward?

dlebauer commented 7 years ago

Jeff, what is the status of this?

jterstriep commented 7 years ago

Here is a draft write-up that explains the steps required to access BETYdb directly from ArcMap. I think it has more detail than is needed, but I can add pics, etc. Please review, and if it looks OK, I'll put it into the formal docs.

https://docs.google.com/document/d/1gnaD5f1YljO7mjBEHPuYNNG1pod43ud8nAl5ON2E0Io/edit?usp=sharing

dlebauer commented 7 years ago

Closed by https://github.com/terraref/documentation/pull/149