opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Update Somatic Mutations data table to accommodate enhanced IntOGen data #840

Closed andrewhercules closed 4 years ago

andrewhercules commented 4 years ago

The most recent IntOGen data file has been integrated into our pipeline, resulting in changes to the API response and the availability of new data.

An example of the raw evidence string is below:

{
  "access_level": "public",
  "disease": {
    "id": "http://purl.obolibrary.org/obo/MONDO_0007576",
    "name": "esophageal cancer",
    "source_name": "Esophageal cancer"
  },
  "evidence": {
    "date_asserted": "2019-11-12T00:00:00",
    "evidence_codes": [
      "http://purl.obolibrary.org/obo/ECO_0000053"
    ],
    "is_associated": true,
    "known_mutations": [
      {
        "functional_consequence": "http://purl.obolibrary.org/obo/SO_0001564",
        "inheritance_pattern": "dominant",
        "number_mutated_samples": 14,
        "number_samples_tested": 150,
        "preferred_name": "gene_variant"
      }
    ],
    "provenance_type": {
      "database": {
        "dbxref": {
          "id": "IntOGen Cancer Drivers Database",
          "url": "https://www.intogen.org/search",
          "version": "2019.11"
        },
        "id": "IntOGen Cancer Drivers Database",
        "version": "2019.11"
      },
      "literature": {
        "references": [
          {
            "lit_id": "http://europepmc.org/abstract/MED/25759023"
          }
        ]
      }
    },
    "resource_score": {
      "method": {
        "description": "IntOGen Driver identification methods as described in Rubio-Perez, C., Tamborero, D., Schroeder, MP., Antolin, AA., Deu-Pons,J., Perez-Llamas, C., Mestres, J., Gonzalez-Perez, A., Lopez-Bigas, N. In silico prescription of anticancer drugs to cohorts of 28 tumor types reveals novel targeting opportunities. Cancer Cell 27 (2015), pp. 382-396",
        "reference": "http://europepmc.org/abstract/MED/25759023",
        "url": "https://www.intogen.org/about"
      },
      "type": "pvalue",
      "value": 0.0002010297541482517
    },
    "urls": [
      {
        "nice_name": "IntOGen -  ABCB1 gene cancer mutations in esophageal cancer (ESCA)",
        "url": "https://www.intogen.org/search?gene=ABCB1&cohort=ICGC_WGS_ESAD_UK"
      }
    ]
  },
  "sourceID": "intogen",
  "target": {
    "activity": "http://identifiers.org/cttv.activity/gain_of_function",
    "id": "http://identifiers.org/ensembl/ENSG00000085563",
    "target_name": "ABCB1",
    "target_type": "http://identifiers.org/cttv.target/gene_evidence"
  },
  "type": "somatic_mutation",
  "unique_association_fields": {
    "cohort_description": "Esophageal cancer data from ICGC Data Portal (ESAD-UK)",
    "cohort_id": "ICGC_WGS_ESAD_UK",
    "cohort_short_name": "ESAD_UK_ICGC",
    "disease_id": "http://purl.obolibrary.org/obo/MONDO_0007576",
    "methods": "dndscv,cbase",
    "target_id": "http://identifiers.org/ensembl/ENSG00000085563"
  },
  "validated_against_schema_version": "1.2.8"
}
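
As a rough sketch of where the table-relevant values sit in an evidence string of this shape, the helper below pulls out the sample counts, p-value, cohort, and methods. The function name `summariseIntogenEvidence` and the output keys are hypothetical and exist only for illustration; the input paths are taken from the example above, and the sketch assumes only the first entry of known_mutations is of interest.

```javascript
// Hypothetical helper (not part of the web app): flattens one IntOGen
// evidence string into the values a table row might need.
function summariseIntogenEvidence(ev) {
  const evidence = ev.evidence || {};
  const score = evidence.resource_score || {};
  const uaf = ev.unique_association_fields || {};
  // Assumption: only the first known_mutations entry is summarised.
  const mutation = (evidence.known_mutations || [])[0] || {};

  return {
    target: ev.target ? ev.target.target_name : undefined,
    disease: ev.disease ? ev.disease.name : undefined,
    // Only report the score as a p-value when it is typed as one.
    pValue: score.type === 'pvalue' ? score.value : undefined,
    cohort: uaf.cohort_short_name,
    // "methods" is a comma-separated string in the evidence, e.g. "dndscv,cbase".
    methods: uaf.methods ? uaf.methods.split(',') : [],
    mutatedSamples: mutation.number_mutated_samples,
    samplesTested: mutation.number_samples_tested
  };
}

// Usage with a trimmed copy of the example evidence string.
const example = {
  disease: { name: 'esophageal cancer' },
  evidence: {
    known_mutations: [
      { number_mutated_samples: 14, number_samples_tested: 150 }
    ],
    resource_score: { type: 'pvalue', value: 0.0002010297541482517 }
  },
  target: { target_name: 'ABCB1' },
  unique_association_fields: {
    cohort_short_name: 'ESAD_UK_ICGC',
    methods: 'dndscv,cbase'
  }
};

console.log(summariseIntogenEvidence(example));
```

Note that in this evidence string the cohort and methods values sit under unique_association_fields, while the p-value and sample counts sit under evidence.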

Due to the changes in the API response, we will need to update the Somatic Mutations data table (example).

@LucaFumis, I will update this ticket with a full design spec for an updated data table once the initial pipeline has been run and I can see exactly what fields are available and where in the response object.

andrewhercules commented 4 years ago

Important notes about new data:

d0choa commented 4 years ago

We might need to review the generated evidence. All information required for the front end (FE) should be outside the unique association fields. Background is summarised in Epic #827. I'll discuss with @AsierGonzalez and @DSuveges.

andrewhercules commented 4 years ago

@d0choa, @AsierGonzalez, and @LucaFumis:

I have completed a preliminary review of the somatic mutations data available from the API for PIK3CA and breast carcinoma on the QA branch - see below for a screenshot:

Screenshot 2020-02-18 at 09 49 22

Full API response - Somatic_Mutation_Data_PIK3CA_breast_carcinoma.json.zip

Based on the response, we can expose the sample data in known_mutations in the Samples column of the Somatic Mutations data table:

"known_mutations": [
  {
    "preferred_name": "gene_variant",
    "inheritance_pattern": "dominant",
    "number_mutated_samples": 11,
    "number_samples_tested": 33,
    "functional_consequence": "http://purl.obolibrary.org/obo/SO_0001564"
  }
]

Screenshot 2020-02-18 at 09 55 29

However, we will not be able to show p-value, cohort_name, or methods because this data is not available from the API. Perhaps we can include this in 20.04 instead?

d0choa commented 4 years ago

p-value, cohort_name, and methods can be included in the rewrite. We won't have them for other data sources such as COSMIC, so let's keep the Angular table simple.

AsierGonzalez commented 4 years ago

I will investigate why those values do not appear; I was expecting them to be there.

andrewhercules commented 4 years ago

Upon further review, the data is available but the front-end requests subsets of the data. I will update the spec to account for the new fields that are used.

andrewhercules commented 4 years ago

My apologies for my earlier comment about the data availability. After reviewing with @d0choa and @AsierGonzalez, I realised that the data is available but the current API call requests a subset of the evidence data.

To minimise the changes for the Angular app, we will only update the table to show the samples. Further changes will be made to the React version of this table - see #851 for more information.

@LucaFumis, for 20.02 can we please display the sample data from IntOGen? It is available in the known_mutations object, which we already retrieve from the API with the existing GET request.

Screenshot 2020-02-18 at 13 13 23

So for example, this row would show 234/708 in the Samples column in the Somatic Mutation data table.

The change would need to be made to the directive for the Somatic Mutation data table. On lines 118-122, we run a check based on number_samples_with_mutation_type.

However, for IntOGen, this data will be available in number_mutated_samples. And the data we want to show is number_mutated_samples / number_samples_tested. So perhaps we could use the following code instead?

// The fallback is parenthesised because `+` binds tighter than `||`;
// without the parentheses, otDictionary.NA and the closing </div> are never used.
if (m.number_samples_with_mutation_type) {
  samples += '<div>' + m.number_samples_with_mutation_type + '/' + (m.number_mutated_samples || otDictionary.NA) + '</div>';
} else if (m.number_mutated_samples) {
  samples += '<div>' + m.number_mutated_samples + '/' + (m.number_samples_tested || otDictionary.NA) + '</div>';
} else {
  samples = otDictionary.NA;
}
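
To try the check outside the directive, here is a minimal standalone sketch of the same logic. The function name `formatSamples` is hypothetical, and `NA` is a stand-in for otDictionary.NA whose value ('N/A') is assumed. It also differs slightly from the snippet above: a mutation with no counts gets its own NA cell instead of resetting the whole string.

```javascript
// Assumed placeholder; the real value lives in otDictionary.NA in the web app.
const NA = 'N/A';

// Hypothetical helper: builds the Samples cell HTML from a known_mutations array.
function formatSamples(knownMutations) {
  if (!Array.isArray(knownMutations) || knownMutations.length === 0) {
    return NA;
  }
  return knownMutations.map(function (m) {
    let numerator;
    let denominator;
    if (m.number_samples_with_mutation_type) {
      // Existing branch: mutation-type count over mutated samples.
      numerator = m.number_samples_with_mutation_type;
      denominator = m.number_mutated_samples;
    } else if (m.number_mutated_samples) {
      // IntOGen branch: mutated samples over samples tested.
      numerator = m.number_mutated_samples;
      denominator = m.number_samples_tested;
    } else {
      return '<div>' + NA + '</div>';
    }
    // Parentheses keep the fallback scoped to the denominator only.
    return '<div>' + numerator + '/' + (denominator || NA) + '</div>';
  }).join('');
}

// Usage with the counts from the PIK3CA / breast carcinoma example above.
console.log(formatSamples([
  { number_mutated_samples: 11, number_samples_tested: 33 }
]));
```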

LucaFumis commented 4 years ago

I've added the code as suggested by @andrewhercules - that is the only change for this issue, so marking as fixed.