Open AlanSimmons opened 1 year ago
search-api DSL to parameter abstraction.xlsx
In Google Drive: https://docs.google.com/spreadsheets/d/1EpQVREOr33-5mh3LJ8-ZFmmkJquDni9zShQB0r3vqr8/edit?usp=sharing
The attached document contains information about the attributes of the consortium-level HuBMAP ElasticSearch indexes:
The ElasticSearch index attributes correspond to values in the return from the search-api, flattened by dot notation. The response JSON nests up to 4 levels. For example, the attribute ancestors.metadata.cold_ischemia_time_unit corresponds to a path in the response similar to:
{
  ...
  {"ancestors": [
    {"metadata": {
      "cold_ischemia_time_unit": ...
    }}
  ]}
  ...
}
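To make the dot-notation mapping concrete, here is a minimal Python sketch of how a nested response document could be flattened into attribute paths. The function name `flatten`, the sample document, and the value `"minutes"` are illustrative assumptions, not part of search-api.

```python
from typing import Any, Dict


def flatten(doc: Any, prefix: str = "") -> Dict[str, list]:
    """Collect leaf values of a nested document under dot-notation keys.

    List indices are skipped, so every element of a list contributes
    to the same key.
    """
    flat: Dict[str, list] = {}
    if isinstance(doc, dict):
        for key, value in doc.items():
            path = f"{prefix}.{key}" if prefix else key
            for sub_path, values in flatten(value, path).items():
                flat.setdefault(sub_path, []).extend(values)
    elif isinstance(doc, list):
        for item in doc:
            for sub_path, values in flatten(item, prefix).items():
                flat.setdefault(sub_path, []).extend(values)
    else:
        # Leaf value: record it under the path accumulated so far.
        flat.setdefault(prefix, []).append(doc)
    return flat


# Hypothetical document; the value "minutes" is made up for illustration.
sample = {"ancestors": [{"metadata": {"cold_ischemia_time_unit": "minutes"}}]}
flattened = flatten(sample)
```

Under these assumptions, `flattened` has the single key `ancestors.metadata.cold_ischemia_time_unit`, which matches the attribute name used in the spreadsheet.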
If a document is associated with a member of an Entity Provenance Hierarchy (e.g., donor>sample>dataset), the index will include information that helps to locate the document in the hierarchy.
The Entity Provenance ontology organizes information in ways that include:
Entity Provenance elements are related to one another through ancestor and descendant relationships.
Elements can contain other elements of the same entity type hierarchically, to represent division or derivation--e.g.,
The introduction tab of the spreadsheet describes the contents of the rest of the document.
The spreadsheet identifies 8 use cases that could be satisfied with new endpoints in the search-api.
@AlanSimmons We need to define, if possible, a way to do paging as there are limits to how much data we can return at one time (both from Elasticsearch and RESTful responses)
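One possible approach to paging, sketched in Python: map hypothetical `page` and `page_size` parameters onto Elasticsearch's `from` and `size` options. The parameter names and the function `paging_clause` are assumptions for discussion. The 10,000 limit is the Elasticsearch default for `index.max_result_window`; the actual setting on the HuBMAP indexes may differ.

```python
from typing import Dict

# Elasticsearch default for index.max_result_window (assumed unchanged here).
MAX_RESULT_WINDOW = 10_000


def paging_clause(page: int = 1, page_size: int = 100) -> Dict[str, int]:
    """Translate 1-based page/page_size into Elasticsearch from/size."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    if start + page_size > MAX_RESULT_WINDOW:
        raise ValueError("requested page exceeds the from/size result window")
    return {"from": start, "size": page_size}
```

For example, `paging_clause(page=3, page_size=50)` yields `{"from": 100, "size": 50}`, which can be merged into the search body. Paging past the result window would need a different mechanism, such as Elasticsearch's `search_after`.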
Request
Provide a set of parameterized endpoints that simplify queries of HuBMAP/SenNet data. These endpoints would provide a layer of abstraction for "convenience searches" that would allow users to query data without having to construct ElasticSearch DSL queries.
Background
As the README for search-api states, search-api is a "thin wrapper of ElasticSearch." The current search (and _search_byindex) endpoints allow the execution of queries against HuBMAP ElasticSearch indexes.
Queries beyond simple searches on things like assay names require specifying search parameters using ElasticSearch DSL--e.g.,
We think that many consumers of the search-api might find the requirement to describe searches with DSL onerous. We also think that consumers will expect "more RESTful" endpoints that accept lists of search parameters.
Solution
Develop endpoints in the format https://search.api.hubmapconsortium.org/<search entity>?<list of index attributes>
For example, an endpoint that returned all CODEX datasets for heart samples might look like:
https://search.api.hubmapconsortium.org/dataset?organ=HT&data_type=codex
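As a rough sketch of the abstraction layer, the query string of such a URL could be translated into an ElasticSearch DSL `bool` query with one `term` filter per parameter. The function `build_query` is hypothetical, and the sketch assumes that each parameter name matches an index attribute mapped as a keyword field, which would need to be checked against the real index mappings.

```python
from typing import Any, Dict
from urllib.parse import parse_qsl, urlsplit


def build_query(url: str) -> Dict[str, Any]:
    """Turn attribute=value pairs in a URL into a bool/filter DSL query."""
    params = dict(parse_qsl(urlsplit(url).query))
    # Assumption: every parameter is an exact-match filter on a keyword field.
    filters = [{"term": {field: value}} for field, value in params.items()]
    return {"query": {"bool": {"filter": filters}}}


body = build_query(
    "https://search.api.hubmapconsortium.org/dataset?organ=HT&data_type=codex"
)
```

Here `body` contains two `term` filters, one for `organ` and one for `data_type`, and could be passed to the existing search endpoint as the request body.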
https://search.api.hubmapconsortium.org/dataset?organ=HT&data_type=codex&returned_attributes=group_name%2Cdonor%2Chubmap_id%2Cuuid%2Cimmediate_ancestors
The default return (if returned_attributes is not specified) would include the entire document.
https://search.api.hubmapconsortium.org/dataset?organ=HT&data_type=codex&returned_attribute_collection=collection1
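The `returned_attributes` and `returned_attribute_collection` parameters could both resolve to a list of fields for Elasticsearch's `_source` option. The following Python sketch is one way to do that. The function `source_fields`, the precedence between the two parameters, and the contents of `collection1` are assumptions made for illustration only.

```python
from typing import Dict, List, Optional

# Hypothetical named collections; the contents of "collection1" are made up.
ATTRIBUTE_COLLECTIONS: Dict[str, List[str]] = {
    "collection1": ["group_name", "donor", "hubmap_id", "uuid", "immediate_ancestors"],
}


def source_fields(
    returned_attributes: Optional[str] = None,
    returned_attribute_collection: Optional[str] = None,
) -> Optional[List[str]]:
    """Return the field list for _source, or None for the entire document."""
    if returned_attributes:
        # The value arrives URL-decoded, so %2C has already become a comma.
        return [name.strip() for name in returned_attributes.split(",") if name.strip()]
    if returned_attribute_collection:
        # Copy so callers cannot mutate the shared collection definition.
        return list(ATTRIBUTE_COLLECTIONS[returned_attribute_collection])
    return None
```

When the result is a list, it can be added to the search body as `{"_source": fields}`; when it is `None`, the `_source` option is omitted and the entire document is returned, matching the default described above.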
Known, high-level tasks
Notes