opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Changes to 'Require' option in order to remove classic-associations #3395

Open prashantuniyal02 opened 1 month ago

prashantuniyal02 commented 1 month ago

Describe the bug: Removing classic-associations causes issues with the 'Require' option in associations on the fly (AOTF), because that option relies on the classic-associations data-type filters.

Observed behaviour: The dev environment crashes when classic-associations is removed from the backend (BE): #3361

Expected behaviour: We need a way to enable the 'Require' option in AOTF:

  1. Link the facets to the 'Require' option. However, facets currently combine with 'AND', whereas the 'Require' option combines with 'OR'.
  2. Use or create another endpoint that returns only the data types.
jdhayhurst commented 1 month ago

A couple of options to explore:

  1. Extend the datasources parameter with a boolean that captures the "required" checkbox state and applies it
  2. Enable operator specification in the facet filters, then add "dataTypes" to the facets backend:
    # f1 AND f2 AND f3
    "facetFilters": {["f1", "f2", "f3"], "AND"}
    # f4 OR f5 OR f6
    "facetFilters": {["f4", "f5", "f6"], "OR"}
    # (f1 AND f2 AND f3) AND (f4 OR f5 OR f6)
    "facetFilters": {[{["f1", "f2", "f3"], "AND"}, {["f4", "f5", "f6"], "OR"}], "AND"} 
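To make option 2 concrete, here is a minimal Python sketch of how nested AND/OR facet filters could be evaluated against a single row. It assumes the proposed `{[...], "AND"}` groups are parsed into `(children, operator)` tuples and that each row exposes the set of facet ids it matches; `FacetNode` and `matches` are hypothetical names, not part of the existing backend.

```python
from typing import List, Set, Tuple, Union

# Hypothetical in-memory form of the proposed syntax: a node is either a
# facet id (str) or a (children, operator) pair, mirroring {[...], "AND"}.
FacetNode = Union[str, Tuple[List["FacetNode"], str]]


def matches(node: FacetNode, row_facets: Set[str]) -> bool:
    """Return True if a row carrying `row_facets` satisfies `node`."""
    if isinstance(node, str):
        return node in row_facets
    children, operator = node
    results = (matches(child, row_facets) for child in children)
    if operator == "AND":
        return all(results)
    if operator == "OR":
        return any(results)
    raise ValueError(f"unknown operator: {operator!r}")


# (f1 AND f2 AND f3) AND (f4 OR f5 OR f6)
combined: FacetNode = (
    [(["f1", "f2", "f3"], "AND"), (["f4", "f5", "f6"], "OR")],
    "AND",
)

print(matches(combined, {"f1", "f2", "f3", "f5"}))  # True
print(matches(combined, {"f1", "f2", "f5"}))        # False: f3 missing
print(matches(combined, {"f1", "f2", "f3"}))        # False: none of f4-f6
```

Representing each group as children plus one operator keeps arbitrary nesting possible without a separate grammar for mixed expressions.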
buniello commented 3 weeks ago

Discussed on 19/8/24: require functionality should be an 'AND'

prashantuniyal02 commented 3 weeks ago

Updated after discussion on 22/08: require functionality will be an 'OR' for the 24.09 release. With 'AND' filtering, selecting all the columns in a group by clicking its bar (e.g. Genetic association, Somatic mutations) would return no results most of the time. We will work on improving the UX at a later stage.
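A toy illustration of this trade-off, as a Python sketch with hypothetical rows and an assumed subset of the datasources selected by clicking one column bar: requiring all of them ('AND') tends to leave nothing, while 'OR' keeps any row with evidence in at least one of them. `filter_rows` is an illustrative helper, not backend code.

```python
from typing import Dict, List, Set

# Hypothetical rows: disease id -> datasources that have evidence for it.
ROWS: Dict[str, Set[str]] = {
    "disease_a": {"ot_genetics_portal", "eva"},
    "disease_b": {"gene_burden"},
    "disease_c": {"eva", "orphanet"},
    "disease_d": {"chembl"},
}

# Assumed subset of the datasources selected by clicking one column bar.
BAR_SELECTION = {"ot_genetics_portal", "gene_burden", "eva", "orphanet"}


def filter_rows(rows: Dict[str, Set[str]], required: Set[str], mode: str) -> List[str]:
    """Keep rows with evidence in all ("AND") or any ("OR") required datasource."""
    if not required:
        return sorted(rows)  # nothing required: keep every row
    if mode == "AND":
        return sorted(d for d, sources in rows.items() if required <= sources)
    if mode == "OR":
        return sorted(d for d, sources in rows.items() if required & sources)
    raise ValueError(f"unknown mode: {mode!r}")


print(filter_rows(ROWS, BAR_SELECTION, "AND"))  # []
print(filter_rows(ROWS, BAR_SELECTION, "OR"))   # ['disease_a', 'disease_b', 'disease_c']
```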

jdhayhurst commented 2 weeks ago

This is mostly a behind-the-scenes change: filtering is now done in ClickHouse instead of OpenSearch, which lets us remove a large index and a lot of code. The API change is in the query variables: DatasourceSettings has a new boolean field called required. If you want a given datasource to be returned, set this field to true. As discussed previously, if all are false, all datasources are returned. Here is an example API query that returns only the genomics_england and eva_somatic datasources:

query TargetAssociationsQuery(
  $id: String!
  $index: Int!
  $size: Int!
  $filter: String
  $sortBy: String!
  $enableIndirect: Boolean!
  $datasources: [DatasourceSettingsInput!]
  $rowsFilter: [String!]
  $facetFilters: [String!]
) {
  target(ensemblId: $id) {
    id
    approvedSymbol
    associatedDiseases(
      page: { index: $index, size: $size }
      orderByScore: $sortBy
      BFilter: $filter
      enableIndirect: $enableIndirect
      datasources: $datasources
      Bs: $rowsFilter
      facetFilters: $facetFilters
    ) {
      count
      rows {
        disease {
          id
          name
        }
        score
        datasourceScores {
          componentId: id
          score
        }
      }
    }
  }
}

query variables:

{
  "id": "ENSG00000157764",
  "index": 0,
  "size": 50,
  "filter": "",
  "sortBy": "score",
  "enableIndirect": false,
  "datasources": [
    {
      "id": "ot_genetics_portal",
      "weight": 1,
      "propagate": true
    },
    {
      "id": "gene_burden",
      "weight": 1,
      "propagate": true
    },
    {
      "id": "eva",
      "weight": 1,
      "propagate": true
    },
    {
      "id": "genomics_england",
      "weight": 1,
      "propagate": true,
      "required": true
    },
    {
      "id": "gene2phenotype",
      "weight": 1,
      "propagate": true
    },
    {
      "id": "uniprot_literature",
      "weight": 1,
      "propagate": true
    },
    {
      "id": "uniprot_variants",
      "weight": 1,
      "propagate": true
    },
    {
      "id": "orphanet",
      "weight": 1,
      "propagate": true
    },
    {
      "id": "clingen",
      "weight": 1,
      "propagate": true
    },
    {
      "id": "cancer_gene_census",
      "weight": 1,
      "propagate": true
    },
    {
      "id": "intogen",
      "weight": 1,
      "propagate": true
    },
    {
      "id": "eva_somatic",
      "weight": 1,
      "propagate": true,
      "required": true
    },
    {
      "id": "cancer_biomarkers",
      "weight": 1,
      "propagate": true
    },
    {
      "id": "chembl",
      "weight": 1,
      "propagate": true
    },
    {
      "id": "crispr_screen",
      "weight": 1,
      "propagate": true
    },
    {
      "id": "crispr",
      "weight": 1,
      "propagate": true
    },
    {
      "id": "slapenrich",
      "weight": 0.5,
      "propagate": true
    },
    {
      "id": "progeny",
      "weight": 0.5,
      "propagate": true
    },
    {
      "id": "reactome",
      "weight": 1,
      "propagate": true
    },
    {
      "id": "sysbio",
      "weight": 0.5,
      "propagate": true
    },
    {
      "id": "europepmc",
      "weight": 0.2,
      "propagate": true
    },
    {
      "id": "expression_atlas",
      "weight": 0.2,
      "propagate": true
    },
    {
      "id": "impc",
      "weight": 0.2,
      "propagate": true
    },
    {
      "id": "ot_crispr_validation",
      "weight": 0.5,
      "propagate": true
    },
    {
      "id": "ot_crispr",
      "weight": 0.5,
      "propagate": true
    },
    {
      "id": "encore",
      "weight": 0.5,
      "propagate": true
    }
  ],
  "entity": "target",
  "aggregationFilters": []
}
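When scripting against the API, the long `datasources` list can be generated rather than written by hand. The Python sketch below uses a hypothetical `build_datasources` helper and only four of the weights from the example above; it sets `"required": true` only on the requested ids and omits the field elsewhere, as the example does (assuming an omitted field is treated as false). The `entity` and `aggregationFilters` keys are left out because the query above does not declare them.

```python
import json
from typing import Dict, Iterable, List

# A few weights copied from the example variables above (the full list has more datasources).
DEFAULT_WEIGHTS: Dict[str, float] = {
    "ot_genetics_portal": 1,
    "genomics_england": 1,
    "eva_somatic": 1,
    "europepmc": 0.2,
}


def build_datasources(weights: Dict[str, float], required: Iterable[str]) -> List[dict]:
    """Hypothetical helper: one DatasourceSettingsInput-shaped dict per datasource."""
    required_ids = set(required)
    unknown = required_ids - set(weights)
    if unknown:
        raise ValueError(f"unknown datasource ids: {sorted(unknown)}")
    settings = []
    for ds_id, weight in weights.items():
        entry = {"id": ds_id, "weight": weight, "propagate": True}
        if ds_id in required_ids:
            entry["required"] = True  # left out otherwise, as in the example
        settings.append(entry)
    return settings


variables = {
    "id": "ENSG00000157764",
    "index": 0,
    "size": 50,
    "filter": "",
    "sortBy": "score",
    "enableIndirect": False,
    "datasources": build_datasources(DEFAULT_WEIGHTS, ["genomics_england", "eva_somatic"]),
}
print(json.dumps(variables, indent=2))
```

Rejecting unknown ids up front catches typos in datasource names before the request is sent.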
prashantuniyal02 commented 2 weeks ago

Tested in dev. The API response matches the 'Require' behaviour in Platform prod.

prashantuniyal02 commented 2 weeks ago

Found a problem with the 'Require' option:

When required is true for some columns, we still want the data in the other columns (for which required is false) to contribute to the association score.

When required is true, the backend should only drop rows that have no data in the required columns. For the rows it keeps, it should not drop the data in the columns for which required is not true.
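A minimal Python sketch of the intended behaviour, on hypothetical scores: rows with no evidence in any required datasource are dropped, but the rows that remain are scored with all of their datasources. `combine` is a plain weighted sum standing in for the Platform's real association score, and `associations_buggy` reproduces the problem described above by also discarding the non-required columns.

```python
from typing import Dict, Set

# Hypothetical per-datasource scores for each disease associated with one target.
ROWS: Dict[str, Dict[str, float]] = {
    "disease_a": {"genomics_england": 0.8, "europepmc": 0.5},
    "disease_b": {"europepmc": 0.9},
    "disease_c": {"eva_somatic": 0.6, "chembl": 1.0},
}

# Weights taken from the example query variables above.
WEIGHTS: Dict[str, float] = {
    "genomics_england": 1.0,
    "eva_somatic": 1.0,
    "chembl": 1.0,
    "europepmc": 0.2,
}


def combine(scores: Dict[str, float]) -> float:
    """Stand-in for the real association score: a plain weighted sum."""
    return round(sum(WEIGHTS[ds] * s for ds, s in scores.items()), 3)


def has_required(scores: Dict[str, float], required: Set[str]) -> bool:
    """True if nothing is required or the row has evidence in any required datasource (OR)."""
    return not required or any(ds in scores for ds in required)


def associations(rows: Dict[str, Dict[str, float]], required: Set[str]) -> Dict[str, float]:
    """Intended: filter rows on the required datasources, score with every column."""
    return {d: combine(s) for d, s in rows.items() if has_required(s, required)}


def associations_buggy(rows: Dict[str, Dict[str, float]], required: Set[str]) -> Dict[str, float]:
    """Reported problem: non-required columns are dropped before scoring."""
    out = {}
    for d, s in rows.items():
        if has_required(s, required):
            kept = s if not required else {ds: v for ds, v in s.items() if ds in required}
            out[d] = combine(kept)
    return out


required = {"genomics_england", "eva_somatic"}
print(associations(ROWS, required))        # {'disease_a': 0.9, 'disease_c': 1.6}
print(associations_buggy(ROWS, required))  # {'disease_a': 0.8, 'disease_c': 0.6}
```

With genomics_england and eva_somatic required, disease_b is dropped in both versions, but only the intended version keeps the europepmc and chembl evidence in the scores.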