terraref / data-paper


Create SQLite databases for data pub #20

Closed dlebauer closed 4 years ago

dlebauer commented 4 years ago

If it would help you avoid moving data around, you can run your scripts against the filesystem at workbench.terraref.org.

TODO

Chris-Schnaufer commented 4 years ago

@dlebauer Some questions and comments:

dlebauer commented 4 years ago

Seasons

| Season | Planting Date | Harvest    |
|--------|---------------|------------|
| 4      | 2017-04-13    | 2017-09-21 |
| 6      | 2018-04-20    | 2018-08-02 |

These can be found at https://terraref.ncsa.illinois.edu/bety/api/v1/experiments?name=~Season+6 and https://terraref.ncsa.illinois.edu/bety/api/v1/experiments?name=~Season+4
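As a rough illustration of how these season dates could be carried into a SQLite database, here is a minimal sketch. The table name `seasons`, its column names, and the helper functions are assumptions made for this example, not the schema actually used for the data publication; only the dates come from the table above.

```python
import sqlite3

# Season dates copied from the table above.
# Table and column names are hypothetical, chosen only for this sketch.
SEASONS = [
    (4, "2017-04-13", "2017-09-21"),
    (6, "2018-04-20", "2018-08-02"),
]


def build_seasons_db(path=":memory:"):
    """Create a small SQLite database holding one row per season."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE seasons ("
        " season INTEGER PRIMARY KEY,"
        " planting_date TEXT NOT NULL,"  # ISO 8601 dates stored as text
        " harvest_date TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO seasons VALUES (?, ?, ?)", SEASONS)
    conn.commit()
    return conn


def season_lengths(conn):
    """Return {season: days between planting and harvest}."""
    rows = conn.execute(
        "SELECT season,"
        " CAST(ROUND(julianday(harvest_date) - julianday(planting_date)) AS INTEGER)"
        " FROM seasons ORDER BY season"
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    print(season_lengths(build_seasons_db()))
```

Storing the dates as ISO 8601 text keeps them sortable and lets SQLite's built-in date functions, such as `julianday`, work on them directly.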

re: splitting up the databases. I would split as little as possible.

Chris-Schnaufer commented 4 years ago

Indexing is very important for performance. The 'unified' table is a view, so it avoids data duplication and keeps the database small. Turning it into a materialized view or a single table would increase the size of the DB because the data would be duplicated.
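To make the trade-off concrete, here is a minimal sketch of a view over indexed base tables. The `plots` and `traits` tables, their columns, and the sample rows are invented for illustration and are not the schema from this issue; only the view name `unified` comes from the comment above. SQLite has no built-in materialized views, so the alternative would be a copied table.

```python
import sqlite3


def build_demo(conn):
    """Create two hypothetical base tables, an index, and a 'unified' view."""
    conn.executescript(
        """
        CREATE TABLE plots (
            plot_id   INTEGER PRIMARY KEY,
            plot_name TEXT NOT NULL
        );
        CREATE TABLE traits (
            trait_id INTEGER PRIMARY KEY,
            plot_id  INTEGER NOT NULL REFERENCES plots(plot_id),
            trait    TEXT NOT NULL,
            value    REAL
        );
        -- Index the join column so lookups by plot do not scan the whole table.
        CREATE INDEX idx_traits_plot ON traits(plot_id);
        -- The view stores only its query, not a second copy of the rows.
        CREATE VIEW unified AS
            SELECT p.plot_name, t.trait, t.value
            FROM traits AS t JOIN plots AS p USING (plot_id);
        """
    )
    conn.execute("INSERT INTO plots VALUES (1, 'plot A')")
    conn.executemany(
        "INSERT INTO traits (plot_id, trait, value) VALUES (?, ?, ?)",
        [(1, "canopy_height", 120.0), (1, "canopy_cover", 0.8)],
    )
    conn.commit()


def unified_rows(conn):
    """Read through the view; rows are computed from the base tables on demand."""
    return conn.execute(
        "SELECT plot_name, trait, value FROM unified ORDER BY trait"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_demo(connection)
    print(unified_rows(connection))
```

Because the view is evaluated at query time, rows added to the base tables later show up in `unified` without any refresh step, at the cost of running the join on each read.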

dlebauer commented 4 years ago

@Chris-Schnaufer That makes sense - no need to worry about creating a materialized view - that is a pretty low priority.

Chris-Schnaufer commented 4 years ago

@dlebauer I'm not finding any masked plot level images to work with. [Screenshot: Screen Shot 2020-04-30 at 11.54.55 AM.png]

dlebauer commented 4 years ago

You are correct that there are no plot level masks :-( I've updated the requirements.

Chris-Schnaufer commented 4 years ago

I am not able to run this using Globus. It always times out before the run is complete. I'm waiting for the problems with workbench to be resolved by Rob Kooper or someone he refers me to.

Chris-Schnaufer commented 4 years ago

Finishing this issue and writing a new ticket for remaining work that's dependent upon terraref workbench access

Chris-Schnaufer commented 4 years ago

https://app.zenhub.com/workspaces/ua-ag-data-science-5a57a3198339f11ba1c85775/issues/agpipeline/issues-and-projects/146

dlebauer commented 4 years ago

reopening - keeping as a placeholder for github.com/agpipeline/issues-and-projects/issues/146

dlebauer commented 4 years ago

@Chris-Schnaufer could you please check in the code you used to https://github.com/terraref/data-publication/tree/master/code? (It doesn't have to be pretty, but hopefully it is commented so it can be used and adapted in the future.)

Chris-Schnaufer commented 4 years ago

@dlebauer done: https://github.com/terraref/data-publication/tree/sqlite_jupyter_scripts

dlebauer commented 4 years ago

Remaining tasks are in https://github.com/az-digitalag/organization/issues/254