terraref / computing-pipeline

Pipeline to Extract Plant Phenotypes from Reference Data
BSD 3-Clause "New" or "Revised" License
24 stars 13 forks source link

Add ability to filter sites by experiment in BETYdb #349

Closed max-zilla closed 7 years ago

max-zilla commented 7 years ago

Currently I need to get sites limited by experiment. in python my pseudocode is kind of ugly:

get_sites_by_experiment(date)
1. query experiments table with associations_mode='full_info'
2. for each experiment, does date fall within experiment date range?
3. if yes, loop over every site_id in 'experiments_sites' and query sites table to get full site info
4. generate list of site bounding boxes and sitenames

It would be nice if something like experiment_id or experiment is supported when querying sites table, but my attempts didn't seem to work.

Is this currently possible? If not, my code above works for now but could be done more efficiently on psql side.

dlebauer commented 7 years ago

To clarify, given a date and a city, you want to find all of the plots associated with a given experiment?

It doesn't solve the immediate problem but note that you can also query sites with full info. https://terraref.ncsa.illinois.edu/bety/api/beta/sites?city=Maricopa&key=9999999999999999999999999999999999999999&limit=none&associations_mode=full_info&limit=10

max-zilla commented 7 years ago

@dlebauer that could actually work in fewer steps, didn't realize sites supported associations also. This might be enough

gsrohde commented 7 years ago

@max-zilla "full_info" mode actually gives full site information when querying on experiments. For example, https://terraref.ncsa.illinois.edu/bety/api/beta/experiments?name=MAC+Season+1:+BAP&associations_mode=full_info will show not only the 384 experiments_sites rows associated with site "MAC Season 1: BAP", but also the 384 associated sites. In fact, in most cases, I turn off display of the join table information (experiments_sites in this case) because it doesn't provide any useful information that isn't already there without it, but I forgot to do that for experiments_sites.

So clearly you can get sites limited by experiment, but I sense that this isn't exactly what you want. For one thing, if you want to find all sites associated with some select group of experiments (rather than just a single experiment), they won't be returned as a single group but as a set of groups, one group for each experiment, possibly with repetitions. And it also seems that you want to be able to filter by date range, which isn't currently possible except in certain special cases. (For example, by using a fuzzy-match parameter (regular expression), you could get all experiments started in December of 2016 with https://terraref.ncsa.illinois.edu/bety/api/beta/experiments?start_date=~2016-12.)

By the way, if you know the id number of an experiment, you can find the associated site information with (for example) https://terraref.ncsa.illinois.edu/bety/api/beta/experiments/6000000000. "full_info" is the default mode for this type of single-entity query, where the id number is part of the path rather than the query string. (https://terraref.ncsa.illinois.edu/bety/api/beta/experiments?id=6000000000&associations_mode=full_info gives basically the same information in a slightly different form.)

Also, if you just want the site ids (as in your pseudocode outline), it would suffice to use associations_mode=ids. And it would be faster.

gsrohde commented 7 years ago

P.S.: If you use https://terraref.ncsa.illinois.edu/bety/api/beta/experiments.xml?name=MAC+Season+1:+BAP&associations_mode=full_info, you could get the results as an XML document instead of the (default) JSON, and there are quite sophisticated and succinct methods of extracting information from an XML document using XPATH or XML style sheets. I've had quite a bit of experience working with XML in Python if you wanted to go this route.

Also, for visualizing the result document (either JSON or XML), it's useful to try these queries in a browser. Chrome, at least, displays results nicely formatted and with interactive handles to collapse or expand portions of the document.

max-zilla commented 7 years ago

@gsrohde perfect! my python IDE was truncating the JSON result and I missed that sites object - looks like you've already given me this functionality so I'll close this issue.