x-atlas-consortia / hs-ontology-api

Ontology API built on top of the UBKG API and shared by HuBMAP and SenNet projects

Enhancement: filter datasets and dataset types by active/inactive status #93

Open AlanSimmons opened 3 months ago

AlanSimmons commented 3 months ago

@shirey @HerrLufty @maxsibilla

Statement of problem

The UBKG contains information on dataset types and datasets (assay types) for which ingestion workflows are still in development. Data from the UBKG is used, for example, to populate drop-down lists in the UI.

We need to restrict work with dataset types and assay types that are still in development to the development environment.

Requirements

  1. It should be possible to indicate whether a dataset (assay type) (e.g., sciATACseq) or a dataset type (a category for assay types, e.g., ATACSeq) is "active" (available for display or membership in drop-down lists) or "inactive".
  2. It should be possible to control the activity state (active/inactive) of a given assay type or dataset type by environment. For example, to test ingestion of a new assay type X, we may want X to be active in DEV but inactive in PROD.
  3. Filtering should apply to both dataset types and assay types. For example, we may want to work with a new assay type Y whose dataset type is already associated with active assay types.
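Requirement 2 can be pictured as a status lookup keyed by both term and environment. The sketch below is only an illustration under stated assumptions: the table `STATUS_BY_ENV`, the function `is_active`, and the choice to treat a missing entry as inactive are all hypothetical and are not part of the UBKG design described in this issue.

```python
from typing import Dict, Tuple

# Hypothetical per-environment status table: (term, environment) -> "active" | "inactive".
# This structure is an assumption used only to illustrate requirement 2.
STATUS_BY_ENV: Dict[Tuple[str, str], str] = {
    ("X", "DEV"): "active",
    ("X", "PROD"): "inactive",
}


def is_active(term: str, environment: str, default: str = "inactive") -> bool:
    """Return True if `term` is active in `environment`.

    Terms with no entry fall back to `default`; treating them as inactive
    is an assumption, since the issue does not specify this case.
    """
    return STATUS_BY_ENV.get((term, environment), default) == "active"


print(is_active("X", "DEV"))   # True
print(is_active("X", "PROD"))  # False
```

Keying on the (term, environment) pair keeps a single source of truth while still letting DEV and PROD disagree about the same assay type.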

Solution

UBKG neo4j

  1. Create a new set of property nodes that indicate activity status.
  2. Link the activity status nodes to both assay type and dataset type nodes.

hs-ontology-api

  1. Enhance the datasets, assayname, and assaytype endpoints so that they can filter by activity status.
  2. Create a new endpoint named dataset_type that returns a list of dataset types, filtered by activity status. The new endpoint will replace the use of the generic valueset endpoint for this purpose.
  3. By default, all endpoints should return only active dataset types and assay types.
AlanSimmons commented 3 months ago

UBKG enhancements

I created property nodes for "Active" and "Inactive" and linked them to the nodes that correspond to assay classifications (originally "datasets") and dataset types.

Examples from the graph showing that:

  1. The Auto-fluorescence assay classification is active.
  2. The Xenium dataset type is inactive.
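The linkage in the examples above can be modeled, very loosely, as subject/relationship/object triples. This is a hypothetical in-memory stand-in for the neo4j graph: the relationship label `has_status` and the helper `status_of` are assumptions, not the UBKG's actual schema. The helper returns `None` for a node with no linked status, which mirrors what an OPTIONAL MATCH yields for such a node.

```python
from typing import List, Optional, Tuple

# (subject, relationship, object); "has_status" is a hypothetical relationship label.
Triple = Tuple[str, str, str]

EDGES: List[Triple] = [
    ("Auto-fluorescence", "has_status", "Active"),  # assay classification
    ("Xenium", "has_status", "Inactive"),           # dataset type
]


def status_of(node: str, edges: List[Triple]) -> Optional[str]:
    """Return the status node linked to `node`, or None if none is linked."""
    for subject, rel, obj in edges:
        if subject == node and rel == "has_status":
            return obj
    return None


print(status_of("Auto-fluorescence", EDGES))  # Active
print(status_of("Xenium", EDGES))             # Inactive
print(status_of("unlinked-node", EDGES))      # None
```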
AlanSimmons commented 3 months ago

hs-ontology-api endpoints

datasets, assayname, assaytype

The /datasets, /assayname, and /assaytype endpoints have been enhanced to account for the active/inactive status of assay classifications and dataset types.

  1. The endpoints feature two new filtering parameters:
    • dataset_active: filters on assay classifications
    • dataset_type_active: filters on dataset types
  Both parameters take either 'active' or 'inactive' as values.

By default, both dataset_active and dataset_type_active are set to 'active'.

  2. A new /datasettypes endpoint returns information on dataset types. This endpoint is intended to replace the call to valueset that returns dataset types. The /datasettypes endpoint has a dataset_active parameter that is 'active' by default.
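The two filtering parameters described above, with their 'active' defaults, could combine roughly as follows. This is a sketch under assumptions: the record layout (`dataset_status`, `dataset_type_status`), the sample assay and dataset type names, and the function `filter_records` are hypothetical and do not reflect the actual hs-ontology-api code.

```python
from typing import Dict, List

# Hypothetical records; field names and values are illustrative only.
RECORDS: List[Dict[str, str]] = [
    {"assay": "A1", "dataset_type": "T1", "dataset_status": "active", "dataset_type_status": "active"},
    {"assay": "A2", "dataset_type": "T1", "dataset_status": "inactive", "dataset_type_status": "active"},
    {"assay": "A3", "dataset_type": "T2", "dataset_status": "active", "dataset_type_status": "inactive"},
]

ALLOWED = {"active", "inactive"}


def filter_records(records: List[Dict[str, str]],
                   dataset_active: str = "active",
                   dataset_type_active: str = "active") -> List[Dict[str, str]]:
    """Keep records matching both filters; both default to 'active'."""
    for name, value in (("dataset_active", dataset_active),
                        ("dataset_type_active", dataset_type_active)):
        if value not in ALLOWED:
            raise ValueError(f"{name} must be 'active' or 'inactive', got {value!r}")
    return [
        r for r in records
        if r["dataset_status"] == dataset_active
        and r["dataset_type_status"] == dataset_type_active
    ]


print([r["assay"] for r in filter_records(RECORDS)])                           # ['A1']
print([r["assay"] for r in filter_records(RECORDS, dataset_active="inactive")])  # ['A2']
```

Assuming the parameters arrive as query-string values, a request such as /datasets?dataset_active=inactive would correspond to the second call.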
AlanSimmons commented 3 months ago

Current settings of active/inactive for assay classifications and dataset types

Attached: HUBMAP.json, SENNET.json

Note that:

  1. Some assay classifications are not yet mapped to dataset types.
  2. Some assay classifications or dataset types are not mapped to active/inactive states.
AlanSimmons commented 3 months ago

Blocking issue

It appears that "active" and "inactive" are specific to ingestion, not display.

Until we resolve all of the use cases, the changes to the API will be put on hold.

AlanSimmons commented 3 months ago

Current status

I will open a PR with the changes to the endpoints, with "active" as the default. The parameter will take an enum of ["active", "inactive", "all"]; "all" catches nodes with no assigned status (for which my OPTIONAL MATCH queries return null).
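The enum approach can be sketched as follows. The names `ActiveFilter` and `apply_filter` and the (name, status) row shape are hypothetical; `None` stands in for the null status that an OPTIONAL MATCH returns for an unassigned node. Only "all" returns those unassigned rows, because a null status never equals "active" or "inactive".

```python
from enum import Enum
from typing import List, Optional, Tuple


class ActiveFilter(str, Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"
    ALL = "all"


# (name, status); status is None when no activity node is linked.
Row = Tuple[str, Optional[str]]


def apply_filter(rows: List[Row], value: str = "active") -> List[Row]:
    """Filter rows by activity status; raises ValueError for values outside the enum."""
    f = ActiveFilter(value)
    if f is ActiveFilter.ALL:
        return list(rows)  # keeps unassigned (None) rows as well
    return [row for row in rows if row[1] == f.value]


rows = [("A", "active"), ("B", "inactive"), ("C", None)]
print([name for name, _ in apply_filter(rows)])         # ['A']
print([name for name, _ in apply_filter(rows, "all")])  # ['A', 'B', 'C']
```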

AlanSimmons commented 1 month ago

Moved to new endpoint

The datasets endpoint will be deprecated and replaced with the endpoints described in #102. This feature will be part of those endpoints.

AlanSimmons commented 1 month ago

The new hs-ontology-api endpoint assayclasses returns the active or inactive property of each assay classification.