googlegenomics / bigquery-examples

Advanced BigQuery examples on genomic data.
Apache License 2.0

Consider using the full version of ClinVar #6

Closed: jaredroach closed this issue 9 years ago

jaredroach commented 10 years ago

The current full version appears to be here: ftp://ftp.ncbi.nlm.nih.gov//pub/clinvar/xml/ClinVarFullRelease_2014-05.xml.gz

This version seems to contain the RCV records and their fields. These RCV records make it easier to link ClinVar variants to search terms such as "Bipolar". The tab-delimited downloads (e.g., ftp://ftp.ncbi.nlm.nih.gov//pub/clinvar/tab_delimited/variant_summary.txt.gz) don't seem to have all the fields needed for that linking.
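A minimal Python sketch of the kind of linking the full XML release could allow: stream the file and collect the RCV accessions whose trait names contain a term such as "Bipolar". The element path it walks (ClinVarSet, ReferenceClinVarAssertion, ClinVarAccession with an Acc attribute, TraitSet/Trait/Name/ElementValue) is an assumption about the ClinVar XML layout and should be checked against the schema published with the release. The function name rcv_records_matching, the SAMPLE excerpt and its accessions are hypothetical, for illustration only.

```python
import gzip
import io
import sys
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Tuple

# Hypothetical two-record excerpt in the ASSUMED ClinVar XML layout (not real data).
SAMPLE = b"""<ReleaseSet>
  <ClinVarSet ID="1">
    <ReferenceClinVarAssertion>
      <ClinVarAccession Acc="RCV000000001" Type="RCV"/>
      <TraitSet Type="Disease">
        <Trait Type="Disease">
          <Name><ElementValue Type="Preferred">Bipolar disorder</ElementValue></Name>
        </Trait>
      </TraitSet>
    </ReferenceClinVarAssertion>
  </ClinVarSet>
  <ClinVarSet ID="2">
    <ReferenceClinVarAssertion>
      <ClinVarAccession Acc="RCV000000002" Type="RCV"/>
      <TraitSet Type="Disease">
        <Trait Type="Disease">
          <Name><ElementValue Type="Preferred">Cystic fibrosis</ElementValue></Name>
        </Trait>
      </TraitSet>
    </ReferenceClinVarAssertion>
  </ClinVarSet>
</ReleaseSet>"""


def rcv_records_matching(stream: IO[bytes], term: str) -> Iterator[Tuple[str, str]]:
    """Yield (RCV accession, trait name) for records whose trait name contains term.

    ASSUMPTION: element names and nesting follow the layout in SAMPLE; verify
    them against the release's schema before relying on this.
    """
    needle = term.lower()
    # iterparse streams the document, so the whole file never has to fit in memory.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "ClinVarSet":
            continue
        rca = elem.find("ReferenceClinVarAssertion")
        acc = rca.find("ClinVarAccession") if rca is not None else None
        rcv = acc.get("Acc") if acc is not None else None
        if rcv is not None:
            for value in rca.iterfind("TraitSet/Trait/Name/ElementValue"):
                name = value.text or ""
                if needle in name.lower():
                    yield rcv, name
                    break  # report each RCV record at most once
        elem.clear()  # drop the finished record's children to limit memory use


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Usage: python this_script.py ClinVarFullRelease_2014-05.xml.gz bipolar
        search_term = sys.argv[2] if len(sys.argv) > 2 else "bipolar"
        with gzip.open(sys.argv[1], "rb") as fh:
            for rcv, name in rcv_records_matching(fh, search_term):
                print(rcv, name)
    else:
        # No file given: run against the hypothetical in-memory SAMPLE.
        for rcv, name in rcv_records_matching(io.BytesIO(SAMPLE), "bipolar"):
            print(rcv, name)
```

Streaming with iterparse and clearing each finished ClinVarSet is a design choice aimed at the size of the full release; a plain ET.parse would also work on small excerpts but loads the entire tree at once.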

jaredroach commented 10 years ago

Here are some relevant searches:

SELECT chromosome, start, clinicalsignificance, phenotypeids, alleleid, type, name,
  geneid, genesymbol, rs_dbsnp, nsv_dbvar, rcvaccession, testedingtr, origin, otherids
FROM [google.com:biggene:1000genomes.clinvar]
WHERE rcvaccession = "RCV000003314"

to get the linker entry for one of the bipolar variants (RCV000003314).

SELECT chromosome, start, clinicalsignificance, phenotypeids, alleleid, type, name,
  geneid, genesymbol, rs_dbsnp, nsv_dbvar, rcvaccession, testedingtr, origin, otherids
FROM [google.com:biggene:1000genomes.clinvar]
WHERE start = 72366306

SELECT DiseaseName, SourceName, ConceptID, SourceID, DiseaseMIM, COUNT(1) AS cnt
FROM [google.com:biggene:1000genomes.clinvar_disease_names]
WHERE DiseaseName CONTAINS 'Bip'
GROUP BY DiseaseName, SourceName, ConceptID, SourceID, DiseaseMIM

and then the web query, which finds three bipolar-related entries: http://www.ncbi.nlm.nih.gov/clinvar?term=bipolar

deflaux commented 9 years ago

Tute Genomics has provided a nice import of ClinVar in this commit: https://github.com/googlegenomics/bigquery-examples/commit/56e04b4c67e65a265d945ffe5eeca0bbe1bb053e