Open fmichonneau opened 6 years ago
We've talked about this as a "unique values" API endpoint, i.e. "Show me the unique values of this field and their counts". Adding a query to filter the records as you describe above, like "phylum == X and country == Y", would be a good refinement.
The difficulty is that Elasticsearch is great at top-style queries that don't rely on collecting 100% of the results and terrible at distinct- and count-type things. We're evaluating how to provide this in a performant manner. @godfoder
If you have an immediate research need, these are really easy to do in Spark, and we can talk about how to get the numbers you need off our cluster:
https://github.com/bio-guoda/guoda-examples/blob/master/iDigBio%20Country%20Checklist.ipynb
(Rendering that seems busted at the moment, but it's typical filter, groupby, count stuff.)
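For anyone following along, here is a rough sketch of what "unique values of a field and their counts, with a filter" means. It is plain Python rather than Spark or the actual API, and the sample records and the `unique_values` helper are made up for illustration; only the field names (`phylum`, `country`, `scientificname`) come from this thread.

```python
from collections import Counter

# Hypothetical sample records; not real iDigBio data.
records = [
    {"phylum": "Mollusca", "country": "fiji", "scientificname": "conus textile"},
    {"phylum": "Mollusca", "country": "fiji", "scientificname": "conus textile"},
    {"phylum": "Mollusca", "country": "fiji", "scientificname": "conus geographus"},
    {"phylum": "Mollusca", "country": "tonga", "scientificname": "conus textile"},
    {"phylum": "Chordata", "country": "fiji", "scientificname": "chelonia mydas"},
]


def unique_values(records, field, **filters):
    """Count the values of `field` among records that match every filter.

    Hypothetical helper: filter first, then group by `field` and count.
    """
    matching = (r for r in records if all(r.get(k) == v for k, v in filters.items()))
    return Counter(r[field] for r in matching if field in r)


counts = unique_values(records, "scientificname", phylum="Mollusca", country="fiji")
print(counts.most_common())
# prints [('conus textile', 2), ('conus geographus', 1)]
```

Note that this touches every matching record, which is exactly the kind of full scan that is cheap on a small list and expensive on a search index.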
It would be nice to have a summary endpoint (similar to `summary/top/records/` and `summary/count/records/`) that would return the number of species (e.g. distinct `scientificname`) for a given query. That would make it possible to answer questions such as "how many species of phylum X are in country Y?"
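To make the request concrete, here is a minimal sketch of the result I have in mind, written in plain Python over a handful of made-up records. The `count_distinct` function and the sample data are hypothetical and only illustrate the semantics; this is not an existing endpoint or client call.

```python
# Hypothetical sample records; field names are the ones used in this issue.
sample = [
    {"phylum": "Mollusca", "country": "fiji", "scientificname": "conus textile"},
    {"phylum": "Mollusca", "country": "fiji", "scientificname": "conus textile"},
    {"phylum": "Mollusca", "country": "fiji", "scientificname": "conus geographus"},
    {"phylum": "Mollusca", "country": "tonga", "scientificname": "conus marmoreus"},
    {"phylum": "Mollusca", "country": "fiji"},  # record with no scientificname
]


def count_distinct(records, field, query):
    """Return how many distinct values `field` takes among records matching `query`.

    `query` is a dict of exact-match field/value pairs; records that lack
    `field` are skipped rather than counted as a value.
    """
    matching = (r for r in records if all(r.get(k) == v for k, v in query.items()))
    return len({r[field] for r in matching if r.get(field) is not None})


n_species = count_distinct(sample, "scientificname", {"phylum": "Mollusca", "country": "fiji"})
print(n_species)
# prints 2
```

So "how many species of phylum X are in country Y?" would come back as a single number rather than a list of records.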