Open RachelKarchin opened 1 year ago
Morning session goals
Use civicpy (preferred) or the GraphQL API to get the following information for molecular profiles in civic.
- Queryable information based on chrom-pos-ref-alt for single variants
- ??? for complex molecular profiles, probably genomic range
- Annotation information
- Variant ID
- Molecular Profile ID
- Curation status (include only if REJECTED and DEPRECATED)
- Description
- Molecular Profile Score
Note: civicpy does not even get deprecated variants, such as this variant.
To get coordinates from molecular profiles:
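from civicpy import civic

# Load every molecular profile from the civicpy cache.
profiles = civic.get_all_molecular_profiles()
# Coordinates of the first variant in the first profile.
profiles[0].variants[0].coordinates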
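For the open question of how to make complex molecular profiles queryable, one option is to collapse the coordinates of the component variants into one range per chromosome. This is only a minimal sketch, assuming each variant's coordinates expose `chromosome`, `start`, and `stop`; the `Coord` tuple and the `profile_ranges` helper are hypothetical stand-ins so the example runs without the civicpy cache:

```python
from collections import namedtuple

# Hypothetical stand-in for a civicpy coordinates object.
Coord = namedtuple("Coord", ["chromosome", "start", "stop"])


def profile_ranges(coords):
    """Collapse variant coordinates into one (min start, max stop) range per chromosome."""
    ranges = {}
    for c in coords:
        # Skip variants that have no usable genomic coordinates.
        if c.chromosome is None or c.start is None or c.stop is None:
            continue
        lo, hi = ranges.get(c.chromosome, (c.start, c.stop))
        ranges[c.chromosome] = (min(lo, c.start), max(hi, c.stop))
    return ranges


# Made-up coordinates for illustration only, not real CIViC records.
example = [
    Coord("7", 100, 110),
    Coord("7", 150, 150),
    Coord("9", 20, 25),
    Coord(None, None, None),
]
print(profile_ranges(example))
```

A range per chromosome keeps profiles that span several chromosomes representable, at the cost of covering positions between the component variants.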
For pulling MPs/Variants, we will pull if there is at least 1 EID that is accepted OR submitted
Docs on molecular profile scores: https://civic.readthedocs.io/en/latest/model/molecular_profiles/evidence_score.html?highlight=molecule%20profile
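As a rough illustration of how such a score is built up, the sketch below sums per-evidence-item scores, each taken as evidence-level points times star rating. The level weights (A=10, B=5, C=3, D=1, E=0.25) are written from memory of the linked documentation and should be checked against it; `LEVEL_POINTS` and `profile_score` are hypothetical names, not part of civicpy:

```python
# Assumed evidence-level weights; verify against the CIViC docs linked above.
LEVEL_POINTS = {"A": 10, "B": 5, "C": 3, "D": 1, "E": 0.25}


def profile_score(evidence):
    """Sum level points * star rating over (level, rating) pairs."""
    return sum(LEVEL_POINTS[level] * rating for level, rating in evidence)


# Made-up evidence items: one level A with 5 stars, one level B with 3 stars.
print(profile_score([("A", 5), ("B", 3)]))  # 10*5 + 5*3 = 65
```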
@cmprocknow and I got all accepted and submitted MPs. We decided to list the number of accepted and the number of submitted EIDs associated with each MP:
from civicpy import civic

annotated_mps = []
molecular_profiles = civic.get_all_molecular_profiles(include_status=["accepted", "submitted"])
for mp in molecular_profiles:
    # Count the accepted and submitted evidence items (EIDs) for this MP.
    ev_counts = {
        "accepted": 0,
        "submitted": 0
    }
    for ev in mp.evidence_items:
        if ev.status in {"accepted", "submitted"}:
            ev_counts[ev.status] += 1
    annotated_mps.append({
        "mp_id": mp.id,
        "name": mp.name,
        "variant_ids": mp.variant_ids,
        "molecular_profile_score": mp.molecular_profile_score,
        "num_acc_eids": ev_counts["accepted"],
        "num_sub_eids": ev_counts["submitted"]
    })
annotated_mps[0]
Returns:
{
    "mp_id": 12,
    "variant_ids": [
        12
    ],
    "molecular_profile_score": 1353.5,
    "num_acc_eids": 93,
    "num_sub_eids": 91
}
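The rule stated earlier (pull an MP only if it has at least one accepted or submitted EID) can be applied to records of this shape. A minimal sketch on hand-written dictionaries; `has_usable_evidence` is a hypothetical helper and the second record is invented for illustration:

```python
def has_usable_evidence(record):
    """True if the MP has at least one accepted or submitted evidence item."""
    return record["num_acc_eids"] + record["num_sub_eids"] >= 1


records = [
    {"mp_id": 12, "num_acc_eids": 93, "num_sub_eids": 91},
    # Invented record with no accepted or submitted evidence.
    {"mp_id": -1, "num_acc_eids": 0, "num_sub_eids": 0},
]
kept = [r for r in records if has_usable_evidence(r)]
print([r["mp_id"] for r in kept])  # [12]
```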
import csv
import sys

from civicpy import civic

writer = csv.writer(sys.stdout)
profiles = civic.get_all_molecular_profiles(include_status=["accepted", "submitted"])
for p in profiles:
    for variant in p.variants:
        coordinates = variant.coordinates
        # The molecular profile that consists of this variant alone.
        single_variant_mp = variant.single_variant_molecular_profile
        chrom, start, ref, var = coordinates.chromosome, coordinates.start, coordinates.reference_bases, coordinates.variant_bases
        # Skip variants that lack reference or variant bases.
        if not (ref and var):
            continue
        writer.writerow([chrom, start, ref, var, single_variant_mp.id])
Returns:
...
9,133750263,C,T,1515
9,133750266,C,G,1516
12,25398284,C,T,79
7,55241708,G,C,973
7,55259515,T,G,33
7,55249071,C,T,34
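Rows like these can back the chrom-pos-ref-alt lookup listed in the goals. A minimal sketch that loads such rows into a dictionary keyed by `(chrom, pos, ref, alt)`; `build_index` is a hypothetical helper, and an in-memory string stands in for the CSV output:

```python
import csv
import io
from collections import defaultdict


def build_index(rows):
    """Map (chrom, pos, ref, alt) to the sorted molecular profile IDs seen for that key."""
    index = defaultdict(set)
    for chrom, pos, ref, alt, mp_id in rows:
        index[(chrom, int(pos), ref, alt)].add(int(mp_id))
    return {key: sorted(ids) for key, ids in index.items()}


# Three rows copied from the output above.
sample = "9,133750263,C,T,1515\n12,25398284,C,T,79\n7,55259515,T,G,33\n"
index = build_index(csv.reader(io.StringIO(sample)))
print(index[("12", 25398284, "C", "T")])  # [79]
```

Collecting IDs in a set means a variant that is emitted more than once still maps to each molecular profile ID only once.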
I made a repo for code
https://github.com/KarchinLab/civic-opencravat-2023
We'll need it this afternoon to keep track of the annotators.
For now we can use the fork/pr model to add things, though I can give people direct write access as needed.
To get the operator, I think we'll need to open an issue in civicpy. It is available in civic through the GraphQL API:
query MolecularProfileSummary($mpId: Int!) {
  molecularProfile(id: $mpId) {
    ...MolecularProfileSummaryFields
  }
}

fragment MolecularProfileSummaryFields on MolecularProfile {
  parsedName {
    ...MolecularProfileParsedName
  }
}

fragment MolecularProfileParsedName on MolecularProfileSegment {
  __typename
  ... on MolecularProfileTextSegment {
    text
  }
  ... on Gene {
    id
    name
    link
  }
  ... on Variant {
    id
    name
    link
    deprecated
  }
}
In Query Variables (this is hidden)
{
  "mpId": 4432
}
Since civicpy currently doesn't pull that information, here's a POST request to access it:
import requests


def fetch_molecular_profile(mpId):
    """POST the MolecularProfileSummary query to the CIViC GraphQL API and return the parsed JSON."""
    url = 'https://civicdb.org/api/graphql'
    query = """
    query MolecularProfileSummary($mpId: Int!) {
      molecularProfile(id: $mpId) {
        ...MolecularProfileSummaryFields
      }
    }

    fragment MolecularProfileSummaryFields on MolecularProfile {
      parsedName {
        ...MolecularProfileParsedName
      }
    }

    fragment MolecularProfileParsedName on MolecularProfileSegment {
      __typename
      ... on MolecularProfileTextSegment {
        text
      }
      ... on Gene {
        id
        name
        link
      }
      ... on Variant {
        id
        name
        link
        deprecated
      }
    }
    """
    variables = {"mpId": mpId}
    response = requests.post(url, json={'query': query, 'variables': variables})
    # Raise on HTTP errors instead of silently returning None.
    response.raise_for_status()
    return response.json()


# Example usage
mpId = 4432
result = fetch_molecular_profile(mpId)
print(result)
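Once a response is in hand, the operator can be read off the text segments of `parsedName`. A minimal sketch, assuming the segments are nested under `data.molecularProfile.parsedName` as the query structure suggests and that operators such as AND arrive as text segments; `summarize_parsed_name` is a hypothetical helper, and the sample payload is hand-written with placeholder names and IDs, not real CIViC output:

```python
def summarize_parsed_name(response):
    """Split parsedName segments into a display name, text segments, and deprecated variant IDs."""
    segments = response["data"]["molecularProfile"]["parsedName"]
    parts, text_segments, deprecated_ids = [], [], []
    for seg in segments:
        if seg["__typename"] == "MolecularProfileTextSegment":
            parts.append(seg["text"])
            text_segments.append(seg["text"])
        else:
            # Gene and Variant segments both carry a name.
            parts.append(seg["name"])
            if seg.get("deprecated"):
                deprecated_ids.append(seg["id"])
    return " ".join(parts), text_segments, deprecated_ids


# Hand-written payload for illustration only.
sample = {"data": {"molecularProfile": {"parsedName": [
    {"__typename": "Gene", "id": 1, "name": "GENE1", "link": "/genes/1"},
    {"__typename": "Variant", "id": 10, "name": "VAR1", "link": "/variants/10", "deprecated": False},
    {"__typename": "MolecularProfileTextSegment", "text": "AND"},
    {"__typename": "Gene", "id": 2, "name": "GENE2", "link": "/genes/2"},
    {"__typename": "Variant", "id": 20, "name": "VAR2", "link": "/variants/20", "deprecated": True},
]}}}
print(summarize_parsed_name(sample))
```

Because Gene segments are not requested with a `deprecated` field, `seg.get` is used so that only Variant segments can contribute to the deprecated list.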
civic.update_cache(from_remote_cache=False)
Submitter Name
Kyle Moad/Rachel Karchin
Submitter Affiliation
Johns Hopkins
Submitter Github Handle
kmoad/RachelKarchin
Additional Submitter Details
We are the PI and Lead Engineer of the OpenCRAVAT project. OpenCRAVAT is a variant annotation tool with a focus on cancer-related variants.
https://github.com/KarchinLab/open-cravat
Project Details
OpenCRAVAT currently annotates variants with information from CIVIC and CIVIC provides a link to the OpenCRAVAT Single Variant Page for each variant. Over 150 annotators are supported.
This project will extend integration of OpenCRAVAT and CIVIC by creating a new annotator for CIVIC Molecular Profiles.
Because variants do not occur in isolation, annotation of known effects of related variants is an important advance in interpreting the clinical relevance of variants. It will further allow annotation of VCF files with Molecular Profiles and improve dissemination of this new CIVIC feature.
Required Skills
Required: Python
Beneficial: SQL
Optional: JavaScript, HTML/CSS