opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Removing the hardcoded facets search category values #3325

Open prashantuniyal02 opened 1 month ago

prashantuniyal02 commented 1 month ago

Currently, the filter values for the facet search categories are hardcoded in the ETL. Ideally, we want to remove these hardcoded values and update the ETL accordingly.

Background

The category values are:

On the target facets:

category                        count
GO:CC                            1823
GO:MF                            4673
Subcellular Location              817
Tractability Antibody               9
GO:BP                           12232
Approved Name                   47927
Approved Symbol                 61619
ChEMBL Target Class               720
Reactome                         2140
Target ID                       63226
Tractability Small Molecule         8
Tractability PROTAC                 7
Tractability Other Modalities       3

On the disease facets:

category
Disease
Therapeutic Area

#3239 and #3268


buniello commented 4 weeks ago

Hi @mbdebian, is there any update on this issue? (as per yesterday’s discussion)

mbdebian commented 4 weeks ago

Hi @buniello, I have checked the ETL, and there is a step implemented for the Facet Search that consumes the ETL's own target, disease and go outputs, according to the code and the configuration file:

facet-search {
  inputs = {
    diseases = ${disease.outputs.diseases}
    targets {
      path = ${common.output}"/targets"
      format = ${common.output-format}
    }
    go {
      format = ${common.output-format}
      path = ${common.output}"/go"
    }
  }
  outputs {
    targets {
      format = ${common.output-format}
      path = ${common.output}"/facetSearchTarget"
    }
    diseases {
      format = ${common.output-format}
      path = ${common.output}"/facetSearchDisease"
    }
  }
}

This means that, as far as I know, it doesn't require additional data collection from PIS.

POS is up to date as well (see the indexes configuration file): it deposits this new ETL output in two different OpenSearch indexes, facetSearchTarget and facetSearchDisease.

Regarding the GraphQL API, code has been developed to work with those new OpenSearch indexes.

I would say the backend seems to have everything in place from the ETL up to the GraphQL schema level.

jdhayhurst commented 3 weeks ago

@prashantuniyal02, to follow up on this, I've had some chats with @mbdebian and we've got a bit more clarity on this issue.

I think the main issue here is not so much that we have hardcoded values (although this should be addressed) but that they are hardcoded in two different places: the ETL and the front end. This has the potential to become unmanageable, so I'd propose the following actions:

  1. Externalise the category values from the ETL code by moving them to a mapping in the ETL config
  2. Expose the unique category values via an API (do we need the counts?)
  3. Replace the hardcoded values in the front end with the values retrieved from the API in step 2.
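
To make steps 1 and 2 a bit more concrete, here is a rough Python sketch of the idea. It is not the ETL's actual code. The mapping name, the source field names and the record shape are all assumptions made up for illustration; only the category labels come from the list in this issue. The point is that the labels live in one mapping (which would sit in the ETL config) and that the unique categories, with counts if we want them, are derived from the facet output rather than typed in by hand.

```python
from collections import Counter

# Hypothetical mapping (step 1): source field -> facet category label.
# In practice this would be read from the ETL config rather than defined
# in code; the field names on the left are illustrative assumptions.
TARGET_FACET_CATEGORIES = {
    "approvedSymbol": "Approved Symbol",
    "approvedName": "Approved Name",
    "id": "Target ID",
}


def build_facets(records, category_mapping):
    """Emit one facet row per (record, mapped field) that has a value."""
    facets = []
    for record in records:
        for field, category in category_mapping.items():
            value = record.get(field)
            if value is not None:
                facets.append({"label": value, "category": category})
    return facets


def category_counts(facets):
    """Unique category values with their counts (step 2)."""
    return dict(Counter(facet["category"] for facet in facets))


if __name__ == "__main__":
    # Made-up placeholder records, not real target data.
    targets = [
        {"id": "T1", "approvedSymbol": "SYM1", "approvedName": "Name one"},
        {"id": "T2", "approvedSymbol": "SYM2"},
    ]
    facets = build_facets(targets, TARGET_FACET_CATEGORIES)
    print(category_counts(facets))
```

For step 3, the front end would then read the keys of whatever `category_counts` returns from the API instead of keeping its own list, so a new category would only need to be added in the config mapping.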