opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Variant search - backend #3365

Open jdhayhurst opened 4 months ago

jdhayhurst commented 4 months ago

We need to enable variant search in the Open Targets Platform search.

Background

It should be possible to search on

Data should come from the variant output from gentropy: https://github.com/opentargets/issues/issues/3350

What happens when you search?

Should we return top genes? Yes, this can be pre-computed and weighted from transcriptConsequences.distance and transcriptConsequences.targetId.

Tasks

Acceptance tests

How do we know the task is complete?

  1. When I run the ETL, the variant search index is generated
  2. When I run POS, the variant search index is added to the opensearch backend
  3. When I query the API for the search endpoint, I can search for variants.

d0choa commented 4 months ago

Anything in dbXrefs can be consumed as a synonym.

One example (10_114045297_G_C) from the API response:

      "dbXrefs": [
        {
          "id": "rs1801253",
          "source": "ensemblVariation"
        },
        {
          "id": "109630#0001",
          "source": "omim"
        },
        {
          "id": "VCV000017746",
          "source": "clinVar"
        },
        {
          "id": "10_114045297_G_C",
          "source": "protVar"
        },
        {
          "id": "10-114045297-G-C",
          "source": "gnomad"
        }
      ],
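
As a rough illustration of consuming dbXrefs as synonyms, here is a minimal Python sketch. The function name and the choice to skip entries that merely repeat the variant's own ID are assumptions made for illustration, not something the ETL is known to do.

```python
def synonyms_from_dbxrefs(variant_id: str, db_xrefs: list[dict]) -> list[str]:
    """Collect dbXrefs ids as search synonyms, keeping first-seen order.

    Assumption: an xref whose id equals the variant's own id adds nothing
    as a synonym, so it is skipped. Duplicate ids are emitted once.
    """
    seen = {variant_id}
    synonyms = []
    for xref in db_xrefs:
        xref_id = xref.get("id")
        if xref_id and xref_id not in seen:
            seen.add(xref_id)
            synonyms.append(xref_id)
    return synonyms


# The dbXrefs from the API response above.
db_xrefs = [
    {"id": "rs1801253", "source": "ensemblVariation"},
    {"id": "109630#0001", "source": "omim"},
    {"id": "VCV000017746", "source": "clinVar"},
    {"id": "10_114045297_G_C", "source": "protVar"},
    {"id": "10-114045297-G-C", "source": "gnomad"},
]

print(synonyms_from_dbxrefs("10_114045297_G_C", db_xrefs))
# → ['rs1801253', '109630#0001', 'VCV000017746', '10-114045297-G-C']
```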

jdhayhurst commented 4 months ago

I have created the variant search index based on the data here: gs://genetics_etl_python_playground/ds_vep_0720/variant_index.json/

Variants are linked to targets by ranking their transcript consequences on transcriptConsequences.transcriptIndex and taking the top 3. We then take the top 3 diseases for those targets from the indirect associations and add those disease labels to the variant index (NOTE: we should probably change the way we link from V -> D). A relevance score is calculated as (transcriptConsequences.consequenceScore + 1) multiplied by the target-disease association score.
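
To make the ranking and scoring described above concrete, here is a minimal Python sketch. It assumes that a lower transcriptIndex ranks first and that each target is counted once; the helper names and the example target IDs are hypothetical and are not taken from the ETL code.

```python
def top_targets(transcript_consequences: list[dict], n: int = 3) -> list[str]:
    """Return up to n distinct targetIds, ranked by transcriptIndex.

    Assumptions: lower transcriptIndex means higher rank, and a target
    that appears in several consequences is only counted once.
    """
    ranked = sorted(transcript_consequences, key=lambda tc: tc["transcriptIndex"])
    targets = []
    for tc in ranked:
        target_id = tc["targetId"]
        if target_id not in targets:
            targets.append(target_id)
        if len(targets) == n:
            break
    return targets


def relevance_score(consequence_score: float, association_score: float) -> float:
    """(consequenceScore + 1) multiplied by the target-disease association score."""
    return (consequence_score + 1.0) * association_score


# Hypothetical transcript consequences for one variant.
consequences = [
    {"targetId": "ENSG_HYPOTHETICAL_B", "transcriptIndex": 2, "consequenceScore": 0.33},
    {"targetId": "ENSG_HYPOTHETICAL_A", "transcriptIndex": 1, "consequenceScore": 1.0},
    {"targetId": "ENSG_HYPOTHETICAL_A", "transcriptIndex": 3, "consequenceScore": 0.0},
    {"targetId": "ENSG_HYPOTHETICAL_C", "transcriptIndex": 4, "consequenceScore": 0.1},
    {"targetId": "ENSG_HYPOTHETICAL_D", "transcriptIndex": 5, "consequenceScore": 0.0},
]

print(top_targets(consequences))
print(relevance_score(consequence_score=1.0, association_score=0.5))
```

The "+ 1" keeps variants with a consequenceScore of 0 from collapsing to a relevance of 0, so the association score alone still ranks them.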

TODO: add the variants to the disease and target search indices so that when you search for a variant, you will also return targets and diseases.

To extend the current search request to the API to include the variant search index, it would look like the following:

query SearchQuery($queryString: String!) {
  topHit: search(
    queryString: $queryString
    entityNames: ["target", "disease", "drug", "variant"]
    page: {index: 0, size: 1}
  ) {
    hits {
      id
      entity
      score
      object {
        ... on VariantIndex {
          variantId
          rsIds
          __typename
        }
        ... on Target {
          id
          approvedSymbol
          approvedName
          functionDescriptions
          __typename
        }
        ... on Disease {
          id
          name
          description
          __typename
        }
        ... on Drug {
          id
          name
          description
          mechanismsOfAction {
            rows {
              mechanismOfAction
              __typename
            }
            __typename
          }
          __typename
        }
        __typename
      }
      __typename
    }
    __typename
  }
  variants: search(
    queryString: $queryString
    entityNames: ["variant"]
    page: {index: 0, size: 3}
  ) {
    hits {
      id
      entity
      score
      object {
        ... on VariantIndex {
          variantId
          __typename
        }
        __typename
      }
      __typename
    }
    __typename
  }
  targets: search(
    queryString: $queryString
    entityNames: ["target"]
    page: {index: 0, size: 3}
  ) {
    hits {
      id
      entity
      object {
        ... on Target {
          id
          approvedSymbol
          approvedName
          __typename
        }
        __typename
      }
      __typename
    }
    __typename
  }
  diseases: search(
    queryString: $queryString
    entityNames: ["disease"]
    page: {index: 0, size: 3}
  ) {
    hits {
      id
      entity
      object {
        ... on Disease {
          id
          name
          __typename
        }
        __typename
      }
      __typename
    }
    __typename
  }
  drugs: search(
    queryString: $queryString
    entityNames: ["drug"]
    page: {index: 0, size: 3}
  ) {
    hits {
      id
      entity
      object {
        ... on Drug {
          id
          name
          __typename
        }
        __typename
      }
      __typename
    }
    __typename
  }
}

This is using the VariantIndex type for variants, but we have discussed changing this to Variant so that's something to bear in mind.
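
For anyone who wants to try the variants part of this from a script, here is a minimal Python sketch that only builds the GraphQL request body. The function name and its parameters are hypothetical; the type name is a parameter because of the possible VariantIndex -> Variant rename, and the endpoint URL is deliberately left out since it depends on which API deployment has the variant index.

```python
import json


def build_variant_search_payload(
    query_string: str,
    variant_type: str = "VariantIndex",  # may become "Variant" if the type is renamed
    size: int = 3,
) -> dict:
    """Build a GraphQL request body for the `variants` part of the search query."""
    query = f"""
query SearchQuery($queryString: String!) {{
  variants: search(
    queryString: $queryString
    entityNames: ["variant"]
    page: {{index: 0, size: {size}}}
  ) {{
    hits {{
      id
      entity
      score
      object {{
        ... on {variant_type} {{
          variantId
        }}
      }}
    }}
  }}
}}
""".strip()
    return {"query": query, "variables": {"queryString": query_string}}


payload = build_variant_search_payload("rs1801253")
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be sent as the JSON body of a POST request to the GraphQL endpoint with any HTTP client.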

buniello commented 4 months ago

@carcruz and FE team feel free to start exploring this dataset from your side (as discussed in the team leads meeting)

jdhayhurst commented 4 months ago

The current plan after the variant page meeting is that the disease and target labels should not be added to the variant search index. However, the variants should be added to the target search index.

DSuveges commented 4 months ago

disease and target labels should not be added to the variant search index. However, the variants should be added to the target search index.

So this means that when searching for a variant we'll get targets and diseases suggested, but no variants are suggested when searching for diseases or targets? I very much agree with this call.

jdhayhurst commented 4 months ago

Yes (except that searching for variants only yields targets, not diseases; not yet, anyway).

Search for variant -> variant & target
Search for target | disease | drug -> target & disease & drug

buniello commented 3 months ago

As discussed this week, @gjmcn and @chinmehta will start building a FE feature on top of the current API while James is on holiday. Tagging @prashantuniyal02 for reference over the next two weeks.

emcdonagh-OT commented 3 months ago

This may be out of scope for the MVP, but it would be great to be able to search by protein coordinates as well as genomic ones, as part of the variant synonyms. For coding variants, this is how people will recognise/remember them and want to look them up. This may already be covered, but I couldn't see a reference to it above.

Some examples:
rs121913530: KRAS G12C, NP_203524.1:p.Gly12Cys
rs186045772: CFTR F1074L, NP_000483.3:p.Phe1074Leu
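
As a rough sketch of how such protein-level synonyms could be derived, the Python below turns a simple HGVS protein substitution into the short forms people tend to type. It assumes the gene symbol is supplied separately and only handles plain substitutions of the form p.Xxx123Yyy; the function name and the set of synonym forms returned are hypothetical choices, not an agreed design.

```python
import re

# Standard three-letter to one-letter amino acid codes ("Ter" is the stop codon).
AA3_TO_1 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}

# Optional accession prefix, then a simple substitution such as p.Gly12Cys.
HGVS_P = re.compile(
    r"^(?:[^:]+:)?p\.(?P<ref>[A-Z][a-z]{2})(?P<pos>\d+)(?P<alt>[A-Z][a-z]{2})$"
)


def protein_synonyms(hgvs_p: str, gene_symbol: str) -> list[str]:
    """Return short protein-change synonyms, or [] if the input is not a
    simple substitution this sketch understands."""
    match = HGVS_P.match(hgvs_p)
    if match is None:
        return []
    ref = AA3_TO_1.get(match["ref"])
    alt = AA3_TO_1.get(match["alt"])
    if ref is None or alt is None:
        return []
    short = f"{ref}{match['pos']}{alt}"
    return [
        short,
        f"{gene_symbol} {short}",
        f"{gene_symbol}{short}",
        f"p.{match['ref']}{match['pos']}{match['alt']}",
    ]


print(protein_synonyms("NP_203524.1:p.Gly12Cys", "KRAS"))
# → ['G12C', 'KRAS G12C', 'KRASG12C', 'p.Gly12Cys']
```

Frameshifts, deletions and other non-substitution changes return an empty list here rather than a guessed synonym.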

The ProtVar team (contact: James Stephenson) has different kinds of mappings for genome > protein which may be useful.

This will also be relevant for pharmacogenetic star allele variants, where we would want users to be able to search by the star allele.
Examples: CYP2D6*2, CYP2D6*1xN, CYP2D6*35
These are shown in the pharmacogenetic widget here: https://platform.opentargets.org/target/ENSG00000100197
Currently we link to mapping information on PharmGKB (e.g. https://www.pharmgkb.org/haplotype/PA165816577).

d0choa commented 3 months ago

As discussed on Aug 7th...

jdhayhurst commented 5 days ago

TODO: add two new variant identifiers. When the amino acid change is given: