griffithlab / civic-meeting

Repo for advertising and organizing CIViC unconference/meeting activities

CIVIC Molecular Profiles Integration with OpenCRAVAT #61

Open RachelKarchin opened 1 year ago

RachelKarchin commented 1 year ago

Submitter Name

Kyle Moad/Rachel Karchin

Submitter Affiliation

Johns Hopkins

Submitter Github Handle

kmoad/RachelKarchin

Additional Submitter Details

We are the PI and Lead Engineer of the OpenCRAVAT project. OpenCRAVAT is a variant annotation tool with a focus on cancer-related variants.

https://github.com/KarchinLab/open-cravat

Project Details

OpenCRAVAT currently annotates variants with information from CIViC, and CIViC links to the OpenCRAVAT Single Variant Page for each variant. OpenCRAVAT supports over 150 annotators.

This project will extend the integration of OpenCRAVAT and CIViC by creating a new annotator for CIViC Molecular Profiles.

Because variants do not occur in isolation, annotating the known effects of related variants is an important advance in interpreting a variant's clinical relevance. The new annotator will also allow VCF files to be annotated with Molecular Profiles and help disseminate this new CIViC feature.

Required Skills

Required: Python
Beneficial: SQL
Optional: JavaScript, HTML/CSS

kmoad commented 1 year ago

Morning session goals

Use civicpy (preferred) or the GraphQL API to get the following information for molecular profiles in CIViC:

- Queryable information
  - Single variants: chrom-pos-ref-alt
  - Complex molecular profiles: ???, probably genomic range
- Annotation information
  - Variant ID
  - Molecular Profile ID
  - Curation status (include only if REJECTED and DEPRECATED)
  - Description
  - Molecular Profile Score
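The two query modes listed above could be sketched as an in-memory index, assuming single-variant MPs are matched exactly on (chrom, pos, ref, alt) and complex MPs on a genomic range. The `MPIndex` class, its method names, and the complex MP ID 9999 with its range are hypothetical; only the single-variant row (7, 55259515, T, G, MP 33) comes from output later in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class MPIndex:
    """Hypothetical in-memory index for molecular profile (MP) lookups."""

    # (chrom, pos, ref, alt) -> list of single-variant MP IDs
    exact: dict = field(default_factory=dict)
    # (chrom, start, stop, mp_id) tuples for complex MPs; bounds are inclusive
    ranges: list = field(default_factory=list)

    def add_single(self, chrom, pos, ref, alt, mp_id):
        self.exact.setdefault((chrom, pos, ref, alt), []).append(mp_id)

    def add_complex(self, chrom, start, stop, mp_id):
        self.ranges.append((chrom, start, stop, mp_id))

    def query(self, chrom, pos, ref, alt):
        """Return (exact single-variant hits, complex MPs whose range covers pos)."""
        exact_hits = self.exact.get((chrom, pos, ref, alt), [])
        range_hits = [
            mp_id
            for r_chrom, start, stop, mp_id in self.ranges
            if r_chrom == chrom and start <= pos <= stop
        ]
        return exact_hits, range_hits


index = MPIndex()
index.add_single("7", 55259515, "T", "G", 33)
index.add_complex("7", 55240000, 55260000, 9999)  # hypothetical complex MP and range
print(index.query("7", 55259515, "T", "G"))  # ([33], [9999])
```

Returning the two hit lists separately keeps exact single-variant matches distinguishable from the looser range-based matches, which an annotator would probably want to report differently.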
korikuzma commented 1 year ago

Note: civicpy does not retrieve deprecated variants at all, such as this variant.

BnetButter commented 1 year ago

To get coordinates from molecular profiles:

from civicpy import civic

# Coordinates of the first variant in the first molecular profile
profiles = civic.get_all_molecular_profiles()
profiles[0].variants[0].coordinates
korikuzma commented 1 year ago

When pulling MPs/variants, we will include only those with at least one EID that is accepted OR submitted.
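The inclusion rule above amounts to a simple predicate over evidence statuses. A minimal sketch, assuming each MP's evidence items expose a lowercase `status` string (as the civicpy code in the next comment uses); the MP IDs and their status lists are made up for illustration.

```python
def has_usable_evidence(evidence_statuses):
    """True if at least one evidence item (EID) is accepted or submitted."""
    return any(status in {"accepted", "submitted"} for status in evidence_statuses)


# Hypothetical MP IDs mapped to the statuses of their evidence items
mps = {
    101: ["accepted", "rejected"],
    102: ["rejected"],
    103: ["submitted"],
    104: [],
}

kept = [mp_id for mp_id, statuses in mps.items() if has_usable_evidence(statuses)]
print(kept)  # [101, 103]
```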

korikuzma commented 1 year ago

Docs on molecular profile scores: https://civic.readthedocs.io/en/latest/model/molecular_profiles/evidence_score.html?highlight=molecule%20profile

korikuzma commented 1 year ago

@cmprocknow and I pulled all accepted and submitted MPs. We decided to record the number of accepted and the number of submitted EIDs associated with each MP.

from civicpy import civic

annotated_mps = []
molecular_profiles = civic.get_all_molecular_profiles(include_status=["accepted", "submitted"])
for mp in molecular_profiles:
    ev_counts = {
        "accepted": 0,
        "submitted": 0
    }
    for ev in mp.evidence_items:
        if ev.status in {"accepted", "submitted"}:
            ev_counts[ev.status] += 1

    annotated_mps.append({
        "mp_id": mp.id,
        "name": mp.name,
        "variant_ids": mp.variant_ids,
        "molecular_profile_score": mp.molecular_profile_score,
        "num_acc_eids": ev_counts["accepted"],
        "num_sub_eids": ev_counts["submitted"]
    })
annotated_mps[0]

Returns:

{
  "mp_id": 12,
  "variant_ids": [
    12
  ],
  "molecular_profile_score": 1353.5,
  "num_acc_eids": 93,
  "num_sub_eids": 91
}
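For the annotator, these MP records would likely need to be looked up from the variant side, since an input variant can belong to several MPs. One way to sketch that, assuming records shaped like the `annotated_mps` entries above (the second record, MP 9999 with variants 12 and 17, is hypothetical), is an inverted index from variant ID to MP IDs:

```python
from collections import defaultdict

# Records shaped like the annotated_mps entries above; MP 9999 is hypothetical
annotated_mps = [
    {"mp_id": 12, "variant_ids": [12], "molecular_profile_score": 1353.5,
     "num_acc_eids": 93, "num_sub_eids": 91},
    {"mp_id": 9999, "variant_ids": [12, 17], "molecular_profile_score": 10.0,
     "num_acc_eids": 1, "num_sub_eids": 0},
]

# variant ID -> IDs of every MP (single-variant or complex) that includes it
mps_by_variant = defaultdict(list)
for mp in annotated_mps:
    for variant_id in mp["variant_ids"]:
        mps_by_variant[variant_id].append(mp["mp_id"])

print(dict(mps_by_variant))  # {12: [12, 9999], 17: [9999]}
```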
BnetButter commented 1 year ago
import csv
import sys

from civicpy import civic

writer = csv.writer(sys.stdout)

profiles = civic.get_all_molecular_profiles(include_status=["accepted", "submitted"])
for profile in profiles:
    for variant in profile.variants:
        coordinates = variant.coordinates

        # The MP consisting of this variant alone (not necessarily `profile` itself)
        single_variant_mp = variant.single_variant_molecular_profile
        chrom, start, ref, var = coordinates.chromosome, coordinates.start, coordinates.reference_bases, coordinates.variant_bases
        # Skip variants lacking reference or variant bases (no chrom-pos-ref-alt key)
        if not (ref and var):
            continue

        writer.writerow([chrom, start, ref, var, single_variant_mp.id])

Output:

...
9,133750263,C,T,1515
9,133750266,C,G,1516
12,25398284,C,T,79
7,55241708,G,C,973
7,55259515,T,G,33
7,55249071,C,T,34
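To use this output in an annotator, the rows could be loaded into a dict keyed on chrom-pos-ref-alt. A minimal sketch using the standard library `csv` module, reading three of the rows shown above from a string (in practice this would be the file the script writes); the `load_lookup` name is hypothetical.

```python
import csv
import io

# Three rows copied from the output above: chrom, start, ref, alt, MP ID
CSV_TEXT = """9,133750263,C,T,1515
12,25398284,C,T,79
7,55259515,T,G,33
"""


def load_lookup(handle):
    """Map (chrom, pos, ref, alt) to a list of single-variant MP IDs."""
    lookup = {}
    for chrom, start, ref, alt, mp_id in csv.reader(handle):
        key = (chrom, int(start), ref, alt)
        lookup.setdefault(key, []).append(int(mp_id))
    return lookup


lookup = load_lookup(io.StringIO(CSV_TEXT))
print(lookup.get(("7", 55259515, "T", "G")))  # [33]
print(lookup.get(("7", 55259515, "T", "C")))  # None
```

Storing a list per key allows for the same chrom-pos-ref-alt appearing in more than one row.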
mcannon068nw commented 1 year ago

https://civicdb.org/molecular-profiles/4374/summary @korikuzma

kmoad commented 1 year ago

I made a repo for the code:

https://github.com/KarchinLab/civic-opencravat-2023

We'll need it this afternoon to keep track of the annotators.

For now we can use the fork/PR model to add things, though I can give people direct write access as needed.

korikuzma commented 1 year ago

To get the operator (e.g. AND, OR, NOT) joining the variants of a complex molecular profile, I think we'll need to open an issue in civicpy. In CIViC, it is exposed through the GraphQL API here:

query MolecularProfileSummary($mpId: Int!) {
  molecularProfile(id: $mpId) {
    ...MolecularProfileSummaryFields
  }
}

fragment MolecularProfileSummaryFields on MolecularProfile {
  parsedName {
    ...MolecularProfileParsedName
  }
}

fragment MolecularProfileParsedName on MolecularProfileSegment {
  __typename
  ... on MolecularProfileTextSegment {
    text
  }
  ... on Gene {
    id
    name
    link
  }
  ... on Variant {
    id
    name
    link
    deprecated
  }
}

In the Query Variables panel (collapsed by default):

{
  "mpId":4432
}
BnetButter commented 1 year ago

Since civicpy doesn't currently pull that information, here's a POST request to the CIViC GraphQL API that fetches it:

import requests

def fetch_molecular_profile(mpId):
    url = 'https://civicdb.org/api/graphql'

    query = """
    query MolecularProfileSummary($mpId: Int!) {
      molecularProfile(id: $mpId) {
        ...MolecularProfileSummaryFields
      }
    }

    fragment MolecularProfileSummaryFields on MolecularProfile {
      parsedName {
        ...MolecularProfileParsedName
      }
    }

    fragment MolecularProfileParsedName on MolecularProfileSegment {
      __typename
      ... on MolecularProfileTextSegment {
        text
      }
      ... on Gene {
        id
        name
        link
      }
      ... on Variant {
        id
        name
        link
        deprecated
      }
    }
    """

    variables = {"mpId": mpId}

    response = requests.post(url, json={'query': query, 'variables': variables}, timeout=30)
    # Raise on any HTTP error status instead of silently returning None
    response.raise_for_status()
    return response.json()

# Example usage
mpId = 4432
result = fetch_molecular_profile(mpId)
print(result)
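The `parsedName` segments returned by this query could then be flattened back into a readable expression, with the text segments carrying the operators. A minimal sketch, assuming the response follows the standard GraphQL `{"data": {...}}` envelope with the fields requested above; the sample response is hand-written for illustration (all IDs and links are hypothetical), not real output for MP 4432, and `summarize_parsed_name` is a hypothetical helper name.

```python
def summarize_parsed_name(response):
    """Flatten parsedName into (expression, text segments, variant IDs, deprecated variant IDs)."""
    segments = response["data"]["molecularProfile"]["parsedName"]
    tokens, text_segments, variant_ids, deprecated_ids = [], [], [], []
    for seg in segments:
        kind = seg["__typename"]
        if kind == "MolecularProfileTextSegment":
            # Text segments hold the connecting text, such as boolean operators
            text = seg["text"].strip()
            if text:
                tokens.append(text)
                text_segments.append(text)
        else:
            # Gene and Variant segments both carry a display name
            tokens.append(seg["name"])
            if kind == "Variant":
                variant_ids.append(seg["id"])
                if seg.get("deprecated"):
                    deprecated_ids.append(seg["id"])
    return " ".join(tokens), text_segments, variant_ids, deprecated_ids


# Hand-written sample in the shape the query requests; IDs and links are hypothetical
sample = {
    "data": {
        "molecularProfile": {
            "parsedName": [
                {"__typename": "Gene", "id": 1, "name": "BRAF", "link": "/genes/1"},
                {"__typename": "Variant", "id": 2, "name": "V600E", "link": "/variants/2", "deprecated": False},
                {"__typename": "MolecularProfileTextSegment", "text": "AND"},
                {"__typename": "Gene", "id": 3, "name": "KRAS", "link": "/genes/3"},
                {"__typename": "Variant", "id": 4, "name": "G12D", "link": "/variants/4", "deprecated": False},
            ]
        }
    }
}

print(summarize_parsed_name(sample))
# ('BRAF V600E AND KRAS G12D', ['AND'], [2, 4], [])
```

With a live response, `summarize_parsed_name(result)` would apply the same logic to the output of `fetch_molecular_profile`.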
korikuzma commented 1 year ago

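from civicpy import civic

# Rebuild the local civicpy cache from the live CIViC API instead of downloading the prebuilt remote cache
civic.update_cache(from_remote_cache=False)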