terraref / reference-data

Coordination of Data Products and Standards for TERRA reference data
https://terraref.org
BSD 3-Clause "New" or "Revised" License
9 stars 2 forks source link

Provide support for file finding queries #279

Closed dlebauer closed 4 years ago

dlebauer commented 4 years ago

ML group would like to be able to query a list of files by

In the end, this requires a table that can easily be queried to return a list of files associated with multiple parameters

This is an example of a json file that is currently used by the group to describe an associated file see also: https://docs.google.com/document/d/1iW7GhrzGOV0_ZRZOIUk-hbh7MsenO9I4kMbmqWvjN2c/edit

Proposed tables

files

experimental info

cultivars

weather (can be joined to files as needed given file start/end time and gantry x, y)

Views

  1. Support query by cultivar, plot, season, sensor, date/daterange
  2. comparative queries by weather (e.g. wind < 5m/s)
  3. query by cultivars subset by presence or absence of specific genes

Minimal viable product

First two tables: files and experiment info and a view that joins these two tables and can be queried

Questions

Chris-Schnaufer commented 4 years ago

From the cultivar_files view query select * from cultivar_files limit 5;:

plot_id plot_name season file_id folder filename format sensor start_time finish_time gantry_x gantry_y gantry_z cultivar_name
6000013647 MAC Field Scanner Season 6 Range 10 Column 1 E MAC Season 6: Sorghum BAP 1 /ua-mac/raw_data/stereoTop/2018-05-08/2018-05-08__10-45-41-991 ea29e6ba-6868-42ce-a0cb-be81e3125735_left.bin bin stereoTop 05/08/2018 10:45:41 05/08/2018 10:45:41 144.3005 4.991 0.87 SP1516
6000013647 MAC Field Scanner Season 6 Range 10 Column 1 E MAC Season 6: Sorghum BAP 2 /ua-mac/raw_data/stereoTop/2018-05-08/2018-05-08__10-45-41-991 ea29e6ba-6868-42ce-a0cb-be81e3125735_metadata.json json stereoTop 05/08/2018 10:45:41 05/08/2018 10:45:41 144.3005 4.991 0.87 SP1516
6000013647 MAC Field Scanner Season 6 Range 10 Column 1 E MAC Season 6: Sorghum BAP 3 /ua-mac/raw_data/stereoTop/2018-05-08/2018-05-08__10-45-41-991 ea29e6ba-6868-42ce-a0cb-be81e3125735_right.bin bin stereoTop 05/08/2018 10:45:41 05/08/2018 10:45:41 144.3005 4.991 0.87 SP1516
6000013647 MAC Field Scanner Season 6 Range 10 Column 1 E MAC Season 6: Sorghum BAP 4 /ua-mac/raw_data/stereoTop/2018-05-08/2018-05-08__10-33-42-852 ca73a152-b9c9-48be-b717-56c75b2ab1c7_left.bin bin stereoTop 05/08/2018 10:33:42 05/08/2018 10:33:42 153.3 21.509 0.869 SP1516
6000013647 MAC Field Scanner Season 6 Range 10 Column 1 E MAC Season 6: Sorghum BAP 5 /ua-mac/raw_data/stereoTop/2018-05-08/2018-05-08__10-33-42-852 ca73a152-b9c9-48be-b717-56c75b2ab1c7_metadata.json json stereoTop 05/08/2018 10:33:42 05/08/2018 10:33:42 153.3 21.509 0.869 SP1516

From the weather_files view query select * from weather_files where file_id is not null limit 5;:

timestamp temperature illuminance precipitation sun_direction wind_speed wind_direction relative_humidity file_id folder filename format sensor start_time finish_time gantry_x gantry_y gantry_z
2018.05.08-09:19:30 28.6727500229 200.0 0.0956064333 99.8907437361 1.9263283181 187.9158909879 17.3711355937 7240 /ua-mac/raw_data/stereoTop/2018-05-08/2018-05-08__09-19-34-515 2171e7e3-b42a-472b-81b8-a67312f37f2f_metadata.json json stereoTop 05/08/2018 09:19:34 05/08/2018 09:19:34 207.299 0.001 0.87
2018.05.08-09:19:30 28.6727500229 200.0 0.0956064333 99.8907437361 1.9263283181 187.9158909879 17.3711355937 7241 /ua-mac/raw_data/stereoTop/2018-05-08/2018-05-08__09-19-34-515 2171e7e3-b42a-472b-81b8-a67312f37f2f_right.bin bin stereoTop 05/08/2018 09:19:34 05/08/2018 09:19:34 207.299 0.001 0.87
2018.05.08-09:19:30 28.6727500229 200.0 0.0956064333 99.8907437361 1.9263283181 187.9158909879 17.3711355937 7242 /ua-mac/raw_data/stereoTop/2018-05-08/2018-05-08__09-19-34-515 2171e7e3-b42a-472b-81b8-a67312f37f2f_left.bin bin stereoTop 05/08/2018 09:19:34 05/08/2018 09:19:34 207.299 0.001 0.87
2018.05.08-09:19:35 28.6819055757 200.0 0.0956064333 100.0006103702 2.5012970367 200.2868739891 17.2795800653 7240 /ua-mac/raw_data/stereoTop/2018-05-08/2018-05-08__09-19-34-515 2171e7e3-b42a-472b-81b8-a67312f37f2f_metadata.json json stereoTop 05/08/2018 09:19:34 05/08/2018 09:19:34 207.299 0.001 0.87
2018.05.08-09:19:35 28.6819055757 200.0 0.0956064333 100.0006103702 2.5012970367 200.2868739891 17.2795800653 7241 /ua-mac/raw_data/stereoTop/2018-05-08/2018-05-08__09-19-34-515 2171e7e3-b42a-472b-81b8-a67312f37f2f_right.bin bin stereoTop 05/08/2018 09:19:34 05/08/2018 09:19:34 207.299 0.001 0.87
gsrohde commented 4 years ago

ML group would like to be able to query a list of files by

  • season (required unless date is provided)
  • date

When querying by date, keep in mind that data entered through the v1 API stores date-times at UTC time based on the local time at which the data was collected, the site at which it was collected, and the time zone stored for the site at which it was collected. For dateloc values other than 5 and timeloc values other than 1, there are certain conventions for normalizing the representation of the "approximate" date and time. This isn't much of an issue for the TERRA-REF database since all currently-stored traits have a dateloc value of 5 and a timeloc value of 1 (precise to the second) or 9 (no time-of-day data). Keep in mind that the date_year, date_month, date_day, time_hour, and time_minute columns are obsolescent and shouldn't be used. The v1 API doesn't use them but unfortunately the Web interface still does. So you won't see any date/time information on the Web page for any trait added via the v1 API. (There is precisely one trait row in the TERRA-REF database with non-null date_year, date_month, and date_day: see https://terraref.ncsa.illinois.edu/bety/traits/6001892300.)

This is one of those issues that never bubbled to the top of my priority queue and so never got fully resolved. It's a somewhat messy issue and was especially messy with the EBI database in which the date-time data stored was a muddled mess of different conventions. I can point you to the relevant issues which lay out the precise conventions for storing data/time data if you are interested.

dlebauer commented 4 years ago

@gsrohde thanks for this reminder! I think in this case we are not dealing with the traits table - only associating metadata with files. So hopefully we don't run into an error here!

@Chris-Schnaufer

Chris-Schnaufer commented 4 years ago

Desired changes:

gsrohde commented 4 years ago

@gsrohde thanks for this reminder! I think in this case we are not dealing with the traits table - only associating metadata with files. So hopefully we don't run into an error here!

Still, for any table that uses any of the date or time or timestamp types, you need to decide how the raw database value should be interpreted and make sure all software that sets or reads a column having one of these types conforms to whatever convention you have adopted. And you should document that convention somewhere. You also, of course, need to know what convention any input data used.

Chris-Schnaufer commented 4 years ago

The sample SQLite DB: https://drive.google.com/file/d/1VyPtQYcj5xaaB7OUB6UILNd9It3tfgeD/view?usp=sharing

The unified table in the SQLite database has all the fields making it easy to query for the files desired

file_id folder filename format sensor start_time finish_time gantry_x gantry_y gantry_z plot_id plot_name season plot_bb_min_lat plot_bb_min_lon plot_bb_max_lat plot_bb_max_lon cultivar_name Sobic_006G057866_1_40312463 Sobic_006G147400_1_50898459 Sobic_006G147400_1_50898536 Sobic_006G067700_1_42805319 Sobic_006G147400_1_50898315 Sobic_006G067700_1_42804037 Sobic_001G269200_1_51588525 Sobic_001G269200_1_51588838 Sobic_001G269200_1_51589143 Sobic_006G147400_1_50898231 Sobic_009G229800_1_57040680 Sobic_001G269200_1_51589435 Sobic_006G147400_1_50898523 Sobic_006G147400_1_50898525 Sobic_006G004400_2_2697734 weather_timestamp temperature illuminance precipitation sun_direction wind_speed wind_direction relative_humidity
1 /ua-mac/Level_1_Plots/rgb_geotiff/2018-05-08/MAC Field Scanner Season 6 Range 10 Column 1 rgb_geotiff_L1_ua-mac_2018-05-08__13-10-45-826_left.tif tif RGB 05/08/2018 13:10:45 05/08/2018 13:10:45 37.3 21.013 0.869 6000014550 MAC Field Scanner Season 6 Range 10 Column 1 MAC Season 6: Sorghum BAP 33.0748552864667 -111.975055859446 33.0748868595077 -111.975039475766 SP1516 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 2018.05.08-13:10:44 36.7357402264 200.0 0.1017100742 217.7117221595 3.5322122868 198.2433545946 8.966338084
2 /ua-mac/Level_1_Plots/rgb_geotiff/2018-05-08/MAC Field Scanner Season 6 Range 10 Column 1 rgb_geotiff_L1_ua-mac_2018-05-08__13-10-45-826_right.tif tif RGB 05/08/2018 13:10:45 05/08/2018 13:10:45 37.3 21.013 0.869 6000014550 MAC Field Scanner Season 6 Range 10 Column 1 MAC Season 6: Sorghum BAP 33.0748552864667 -111.975055859446 33.0748868595077 -111.975039475766 SP1516 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 2018.05.08-13:10:44 36.7357402264 200.0 0.1017100742 217.7117221595 3.5322122868 198.2433545946 8.966338084
3 /ua-mac/Level_1_Plots/rgb_geotiff/2018-05-08/MAC Field Scanner Season 6 Range 10 Column 1 rgb_geotiff_L1_ua-mac_2018-05-08__13-10-47-436_left.tif tif RGB 05/08/2018 13:10:45 05/08/2018 13:10:45 37.3 21.013 0.869 6000014550 MAC Field Scanner Season 6 Range 10 Column 1 MAC Season 6: Sorghum BAP 33.0748552864667 -111.975055859446 33.0748868595077 -111.975039475766 SP1516 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 2018.05.08-13:10:44 36.7357402264 200.0 0.1017100742 217.7117221595 3.5322122868 198.2433545946 8.966338084
4 /ua-mac/Level_1_Plots/rgb_geotiff/2018-05-08/MAC Field Scanner Season 6 Range 10 Column 1 rgb_geotiff_L1_ua-mac_2018-05-08__13-10-47-436_right.tif tif RGB 05/08/2018 13:10:45 05/08/2018 13:10:45 37.3 21.013 0.869 6000014550 MAC Field Scanner Season 6 Range 10 Column 1 MAC Season 6: Sorghum BAP 33.0748552864667 -111.975055859446 33.0748868595077 -111.975039475766 SP1516 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 2018.05.08-13:10:44 36.7357402264 200.0 0.1017100742 217.7117221595 3.5322122868 198.2433545946 8.966338084
5 /ua-mac/Level_1_Plots/rgb_geotiff/2018-05-08/MAC Field Scanner Season 6 Range 10 Column 1 rgb_geotiff_L1_ua-mac_2018-05-08__13-10-49-116_left.tif tif RGB 05/08/2018 13:10:45 05/08/2018 13:10:45 37.3 21.013 0.869 6000014550 MAC Field Scanner Season 6 Range 10 Column 1 MAC Season 6: Sorghum BAP 33.0748552864667 -111.975055859446 33.0748868595077 -111.975039475766 SP1516 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 2018.05.08-13:10:44 36.7357402264 200.0 0.1017100742 217.7117221595 3.5322122868 198.2433545946 8.966338084
dlebauer commented 4 years ago

implemented in https://github.com/AgPipeline/issues-and-projects/issues/146