griffithlab / civic-client

Web client for CIViC: Clinical Interpretations of Variants in Cancer
MIT License

Searching by coordinates or variant name rather than variant id with the REST API #351

Closed lloonngg closed 4 years ago

lloonngg commented 8 years ago

I work at the MGH Center for Integrated Diagnostics, and we are exploring CIViC for potential use and for contributions from our clinical NGS sequencing, covering both genotyping and fusions. I was wondering whether the REST API could be used to find variants by chromosome, start, stop, ref, and alt rather than by variant ID, or simply by variant name. This would be most helpful, since we would not know the variant ID unless we had a dictionary against which to match our variants.

Thanks,

Long

obigriffith commented 8 years ago

Thanks for checking out CIViC! Improved documentation and use cases for the API are at the top of our to-do list. For now, I don't think there is an API endpoint for directly querying CIViC by coordinate. We are working on building one that supports HGVS. Also, be aware that not all CIViC variants have coordinates yet (although we are close). Coordinates are very rarely provided in the literature, so adding them is a curation task.

You should also be aware that a variant-level query will always be somewhat imperfect. This is because many (most) variants at the amino acid level can actually result from multiple different (equally valid) genomic changes. The genomic coordinates in CIViC for each amino acid variant are therefore those of a "representative" genomic variant. For that reason, you should consider a somewhat fuzzy matching scheme in which you attempt to match your variants to CIViC using a combination of position (with some wiggle room) and string matching on the variant name.
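A minimal sketch of such a fuzzy match might look like the following. It is only an illustration: the flat record layout (`gene`, `name`, `chromosome`, `start`, `stop`), the function names, the default wiggle of 3 bases, and the 0.9 name-similarity threshold are all assumptions made for this example, not anything CIViC defines.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Case-insensitive similarity between two variant names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def match_variant(query, civic_variants, wiggle=3, min_name_score=0.9):
    """Return a list of (record, reason) pairs for records that match `query`.

    Hypothetical record layout (an assumption for this sketch):
    {"gene": str, "name": str, "chromosome": str or None,
     "start": int or None, "stop": int or None}

    A record matches by "position" when it lies on the same chromosome and its
    interval, widened by `wiggle` bases on each side, overlaps the query.
    Otherwise it matches by "name" when the gene is identical and the variant
    names are at least `min_name_score` similar.
    """
    matches = []
    query_has_coords = all(
        query.get(key) is not None for key in ("chromosome", "start", "stop")
    )
    for rec in civic_variants:
        rec_has_coords = all(
            rec.get(key) is not None for key in ("chromosome", "start", "stop")
        )
        if (
            query_has_coords
            and rec_has_coords
            and str(rec["chromosome"]) == str(query["chromosome"])
            and rec["start"] - wiggle <= query["stop"]
            and query["start"] <= rec["stop"] + wiggle
        ):
            matches.append((rec, "position"))
            continue
        # Fall back to string matching, e.g. for records without coordinates.
        if (
            query.get("name")
            and rec.get("name")
            and rec.get("gene") == query.get("gene")
            and name_similarity(rec["name"], query["name"]) >= min_name_score
        ):
            matches.append((rec, "name"))
    return matches
```

Position matches and name matches are reported separately so that the weaker, name-only hits can be reviewed by hand. Both thresholds would need tuning against real data.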

What I suggest you do is simply query CIViC for all genes, then for each gene query for all variants. CIViC is not huge, since it focuses on manually curated and interpreted variants, with a lot of work going into each one. Getting the complete set back is therefore manageable. For each variant you will then have the variant name and coordinates (where available) for the matching exercise I describe above. This is something we are also working on. We will post sample code to the GitHub repo as it becomes available.
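The genes-then-variants loop described above could be sketched roughly as follows. The two fetch callables are hypothetical placeholders for whatever HTTP requests you make against the API, and the field names (`id`, `name`, and a nested `coordinates` object with `chromosome`, `start`, `stop`) are assumptions to check against the actual responses.

```python
def collect_all_variants(fetch_genes, fetch_variants_for_gene):
    """Build a flat lookup table of variants from a genes-then-variants crawl.

    Both arguments are hypothetical callables supplied by the caller:
      fetch_genes()                    -> iterable of gene dicts with "id" and "name"
      fetch_variants_for_gene(gene_id) -> iterable of variant dicts for that gene

    The assumed variant layout is {"id", "name", "coordinates": {...}}; variants
    with no curated coordinates are kept, with None in the coordinate fields.
    """
    table = []
    for gene in fetch_genes():
        for variant in fetch_variants_for_gene(gene["id"]):
            coords = variant.get("coordinates") or {}
            table.append(
                {
                    "gene": gene["name"],
                    "variant_id": variant.get("id"),
                    "name": variant.get("name"),
                    "chromosome": coords.get("chromosome"),
                    "start": coords.get("start"),
                    "stop": coords.get("stop"),
                }
            )
    return table
```

Passing the fetchers in as arguments keeps the crawl logic independent of any particular endpoint or HTTP library, and makes it easy to cache the resulting table locally instead of hitting the API for every lookup.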

Please let us know if we can help further. We would be very happy to have MGH as users and especially contributors to CIViC!

malachig commented 8 years ago

@lloonngg for some specific examples of how to interact with the CIViC API using Python, you can check out this module:

https://github.com/griffithlab/civic-api-client

One of the things that this code does is query CIViC for variants with coordinates and examines those coordinates. So some of the building blocks for what you need to do are there.

susannasiebert commented 4 years ago

I know that this issue has been open a while but there has been some more recent development to facilitate this. You can now use Allele Registry CA IDs to search for variants. In addition, we developed the CIViCpy SDK, which supports coordinate queries (http://docs.civicpy.org/en/1.0/civic.html#by-coordinate). I'm closing this issue.