opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Include cancer biomarkers as an evidence data source #1705

Closed ireneisdoomed closed 2 years ago

ireneisdoomed commented 3 years ago

This ticket tracks the whole discussion on how the data has been modelled and parsed to be part of our target-disease evidence data sources as of the 21.09 release.

I'm copying here all the comments from the current PR (#89).

This PR processes the Cancer Biomarkers database available at gs://otar000-evidence_input/CancerBiomarkers/data_files. The proposed schema is tracked in this spreadsheet: https://docs.google.com/spreadsheets/d/1Mowq7KsGTMtEg3wZpJBNK_UbawHKJeM9d0syT9F9AMc/edit#gid=613866016

datasourceId

datatypeId

diseaseFromSource

diseaseFromSourceMappedId

drugFromSource

drugId

drugResponse

confidence

literature

urls

Used to source the other 25% of the evidence, which has no PMID.

targetFromSourceId

When a biomarker consists of multiple variants that are not independent of each other, the source reports them in a single biomarker name joined with '+' and lists the corresponding genes under Gene, separated by ';'. Because an evidence string can only have a single target, each gene gets its own evidence string, and each of those strings keeps the full biomarker so that it still references all the genes involved. These cases account for 27 distinct biomarkers.
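To illustrate the splitting described above, here is a minimal sketch in plain Python. It is not the parser from the PR: the function name `split_by_target` and the shape of the input record are assumptions made for illustration. Only the `Gene` column, its ';' separator, and the `targetFromSourceId` field come from this thread.

```python
from copy import deepcopy


def split_by_target(record: dict) -> list[dict]:
    """Return one evidence dict per gene listed in the `Gene` column.

    Hypothetical helper: every output row keeps the full biomarker
    description and differs only in `targetFromSourceId`.
    """
    genes = [g.strip() for g in record["Gene"].split(";") if g.strip()]
    evidence = []
    for gene in genes:
        # Copy every field except the raw gene list.
        row = {k: deepcopy(v) for k, v in record.items() if k != "Gene"}
        row["targetFromSourceId"] = gene
        evidence.append(row)
    return evidence


# Assumed input shape, loosely following the fields discussed in this ticket.
record = {
    "biomarkerName": "ARID1A amplification + ANXA1 overexpression",
    "Gene": "ARID1A;ANXA1",
}
rows = split_by_target(record)
```

Each element of `rows` carries the same `biomarkerName` and a different `targetFromSourceId`, so the biomarker still references both genes.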

biomarkers

Array of structs that captures both dependent and independent variants, along with secondary fields that describe the mutation. The proposed fields are wrapped in a struct so that a conceptual distinction can be made when analysing the data: variantId refers to a disease-causing variant, whereas biomarkers.variantId adds the nuance that the biomarker has to be present for the association to occur.

biomarkers.name

biomarkers.individualMutation

biomarkers.variantFunctionalConsequenceId

biomarkers.variantId

biomarkers.variantRsId
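The nesting of these fields can be sketched with Python dataclasses. This is only an illustration of the proposal at this point in the thread: the field names are the ones listed above, while the class names, the types, and the choice to make everything except `name` optional are assumptions.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Biomarker:
    """One entry of the `biomarkers` array (types are assumed)."""
    name: str
    individualMutation: Optional[str] = None
    variantFunctionalConsequenceId: Optional[str] = None
    variantId: Optional[str] = None
    variantRsId: Optional[str] = None


@dataclass
class Evidence:
    """Hypothetical, heavily reduced evidence record."""
    targetFromSourceId: str
    biomarkers: list[Biomarker] = field(default_factory=list)


ev = Evidence(
    targetFromSourceId="ARID1A",
    biomarkers=[
        Biomarker(
            name="ARID1A amplification",
            variantFunctionalConsequenceId="SO_0001563",
        )
    ],
)
as_dict = asdict(ev)
```

Because the variant fields live inside `biomarkers`, an evidence record can describe a biomarker variant without setting a top-level `variantId`.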

ireneisdoomed commented 3 years ago

New iteration

New changes to the parser to adapt to the latest schema (v3):

TO-DOs

An example:

{
    "biomarkerName": "ARID1A amplification + ANXA1 overexpression",
    "biomarkers": {
        "geneExpression": [
            {
                "id": "GO_0010628",
                "name": "ANXA1:over"
            }
        ],
        "variant": [
            {
                "functionalConsequenceId": "SO_0001563",
                "name": "ARID1A:amp"
            }
        ]
    },
    "confidence": "Early trials",
    "datasourceId": "cancer_genome_interpreter",
    "datatypeId": "affected_pathway",
    "diseaseFromSource": "Breast adenocarcinoma",
    "diseaseFromSourceMappedId": "EFO_0000304",
    "drugFromSource": "Trastuzumab",
    "drugId": "CHEMBL1201585",
    "literature": [
        "27172896"
    ],
    "targetFromSourceId": "ARID1A"
},
{
    "biomarkerName": "ARID1A amplification + ANXA1 overexpression",
    "biomarkers": {
        "geneExpression": [
            {
                "id": "GO_0010628",
                "name": "ANXA1:over"
            }
        ],
        "variant": [
            {
                "functionalConsequenceId": "SO_0001563",
                "name": "ARID1A:amp"
            }
        ]
    },
    "confidence": "Early trials",
    "datasourceId": "cancer_genome_interpreter",
    "datatypeId": "affected_pathway",
    "diseaseFromSource": "Breast adenocarcinoma",
    "diseaseFromSourceMappedId": "EFO_0000304",
    "drugFromSource": "Trastuzumab",
    "drugId": "CHEMBL1201585",
    "literature": [
        "27172896"
    ],
    "targetFromSourceId": "ANXA1"
}
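As a rough sketch of how a biomarker name could be turned into the `biomarkers` struct of the example above: the lookup table below contains only the two alteration types that appear in that example, and the function `build_biomarkers` and the assumption that every component has the form "GENE alteration" are hypothetical. Real biomarker names in the database are more varied, so this is not the PR's parsing logic.

```python
# Alteration keyword -> (bucket, short label, id field, ontology id).
# Only the two cases visible in the example are covered.
ALTERATIONS = {
    "amplification": ("variant", "amp", "functionalConsequenceId", "SO_0001563"),
    "overexpression": ("geneExpression", "over", "id", "GO_0010628"),
}


def build_biomarkers(biomarker_name: str) -> dict:
    """Group the '+'-separated components of a biomarker name by type."""
    biomarkers: dict = {}
    for part in biomarker_name.split("+"):
        # Assumes exactly two whitespace-separated tokens per component.
        gene, alteration = part.split()
        bucket, short, id_key, id_value = ALTERATIONS[alteration]
        biomarkers.setdefault(bucket, []).append(
            {id_key: id_value, "name": f"{gene}:{short}"}
        )
    return biomarkers


result = build_biomarkers("ARID1A amplification + ANXA1 overexpression")
```

An unknown alteration keyword raises a `KeyError` here, which keeps unmapped cases visible instead of silently dropping them.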
ireneisdoomed commented 3 years ago

The evidence file can be found at gs://otar000-evidence_input/CancerBiomarkers/json

DSuveges commented 3 years ago

I think your example shows a proper representation of the biomarker.

ireneisdoomed commented 3 years ago

Mapping of drug responses to EFO is scoped for 21.11 (#1746).

ireneisdoomed commented 2 years ago

Work completed and included in 21.11.