Open Phillip-a-richmond opened 6 years ago
I have added the sample_genotype_counts table. I am not sure what you mean by the depths table. I don't intend to add the gene tables to vcf2db, but I could be convinced to change my mind given a reasonable use-case.
Thanks Brent, I'll pull this version and test the GEMINI built-in functions.
Essentially, I am striving to produce a fully functional single-variant database for identifying pathogenic variants underlying rare Mendelian genetic diseases (and the Quinlan lab tools are excellent for this). The problem with using only GEMINI load is the lack of flexibility in the annotations available (e.g. TrAP, FATHMM-XF, in-house variant databases, monthly updated ClinVar VCFs). And annotating after the fact with GEMINI annotate is prohibitively slow for large annotation databases (especially genome-wide databases) across WGS variant datasets.
VCFAnno+VCF2DB provides flexibility in this respect, and it's fast. However, the resulting database lacks some of the tables that GEMINI loads by default, like the one that you fixed above, and this causes the GEMINI built-in functions to fail. Ideally, a workflow would go from VCF-->DB and then use GEMINI to query the DB for inheritance patterns, runs of homozygosity, variants within specific gene sets, and harmony between noncoding and coding variants.
Thanks for your help on this, Phil
Could you enumerate what is missing for you so I can prioritize?
Hi Brent!
The differences between gemini load and vcf2db (loaded in bcbio 1.0.7 with vcfanno: [gemini] and by default), written as gemini load name = vcf2db name:
variants table:
pfam_domain = domains?
aaf_gnomad_all = gnomad_af
gnomad_num_het = absent, possible to add?
gnomad_num_hom = absent, possible to add?
cadd_scaled = absent, possible to add?
vep_hgvsc = hgvsc
vep_hgvsp = hgvsp
aaf_esp_aa = af_esp_aa
aaf_esp_ea = af_esp_ea
aaf_esp_all = af_esp_all
is_conserved = absent, possible to add?
variant_impacts table:
vep_canonical = canonical
vep_ccds = ccds
vep_hgvsc = hgvsc
vep_hgvsp = hgvsp
vep_maxentscan_diff = maxentscan_diff
vep_maxentscan_alt = maxentscan_alt
vep_maxentscan_ref = maxentscan_ref
vep_spliceregion = spliceregion
Is there a way for downstream scripts to get the creator of gemini.db (gemini load or vcf2db) to apply different processing logic?
Is it possible to add gnomad_num_hemi? https://groups.google.com/forum/#!topic/gemini-variation/knRmriYXDW4
Thanks! Sergey
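As far as this thread shows, there is no recorded "creator" field to query, so one hedged workaround for a downstream script is to look at which column names are actually present and fall back to the alternative name. The sketch below is only an illustration: the alias dictionary is copied from the variants-table list above (the uncertain pfam_domain = domains? pair is left out), and resolve_column and table_columns are hypothetical helper names, not part of GEMINI or vcf2db.

```python
import sqlite3

# gemini load column name -> name reported for the vcf2db/bcbio database.
# Copied from the variants-table list in this thread; not an official mapping.
GEMINI_TO_VCF2DB = {
    "aaf_gnomad_all": "gnomad_af",
    "vep_hgvsc": "hgvsc",
    "vep_hgvsp": "hgvsp",
    "aaf_esp_aa": "af_esp_aa",
    "aaf_esp_ea": "af_esp_ea",
    "aaf_esp_all": "af_esp_all",
}


def table_columns(conn, table):
    """Return the set of column names of `table` (empty set if it is absent)."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    # A table name cannot be bound as a parameter, so pass trusted names only.
    rows = conn.execute("PRAGMA table_info(%s)" % table).fetchall()
    return {row[1] for row in rows}


def resolve_column(conn, gemini_name, table="variants"):
    """Return whichever spelling of `gemini_name` exists in `table`, or None."""
    cols = table_columns(conn, table)
    if gemini_name in cols:
        return gemini_name
    alias = GEMINI_TO_VCF2DB.get(gemini_name)
    if alias in cols:
        return alias
    return None
```

A script could call resolve_column(conn, "aaf_gnomad_all") once per column it needs and build its query from the returned names, treating None as "annotation absent in this database".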
But these are things that you have control over, correct? In most cases, vcf2db.py just pulls what's present in the INFO field. You can change the vcfanno conf if you want different names. Am I missing something?
Thanks Brent! Yes, you are right, it is not an issue with vcfanno/vcf2db; it comes from the way the annotation is wrapped in bcbio. SN
Pulled on January 22nd 2018.
Details: "Depths" table, as referenced from gemini roh.
Example:
$ gemini roh T008.db
LOG: Querying and ordering variants by chromosomal position.
SQL error: (sqlite3.OperationalError) no such column: depth [SQL: u"select chrom, start, end,gts,gt_types,gt_phases,gt_depths,gt_ref_depths,gt_alt_depths,gt_quals,gt_alt_freqs FROM variants WHERE type = 'snp' AND filter is NULL AND depth >= 20 ORDER BY chrom, end"]
SQL error: (sqlite3.OperationalError) no such column: depth [SQL: u"select chrom, start, end,gts,gt_types,gt_phases,gt_depths,gt_ref_depths,gt_alt_depths,gt_quals,gt_alt_freqs FROM variants WHERE type = 'snp' AND filter is NULL AND depth >= 20 ORDER BY chrom, end"]
Traceback (most recent call last):
File "/opt/tools/gemini/bin/gemini", line 7, in
$ gemini pathways --lof -v 71 T008.db
chrom start end ref alt impact sample genotype gene transcript pathway
Traceback (most recent call last):
File "/opt/tools/gemini/bin/gemini", line 7, in
Priority for our application purposes would include fixing gemini ROH. The pathways-based analysis is a very low priority for us at this time.
Thanks, Phil
Hi,
Following up on @Phillip-a-richmond's last comment, I tried running gemini roh on a gemini database produced with vcf2db. I am getting the same errors, indicating that the depth column is missing:
$ gemini roh gemini_db_produced_by_vcf2db.db
LOG: Querying and ordering variants by chromosomal position.
SQL error: (sqlite3.OperationalError) no such column: depth [SQL: u"select chrom, start, end,gts,gt_types,gt_phases,gt_depths,gt_ref_depths,gt_alt_depths,gt_quals,gt_alt_freqs FROM variants WHERE type = 'snp' AND filter is NULL AND depth >= 20 ORDER BY chrom, end"]
SQL error: (sqlite3.OperationalError) no such column: depth [SQL: u"select chrom, start, end,gts,gt_types,gt_phases,gt_depths,gt_ref_depths,gt_alt_depths,gt_quals,gt_alt_freqs FROM variants WHERE type = 'snp' AND filter is NULL AND depth >= 20 ORDER BY chrom, end"]
Traceback (most recent call last):
File "/opt/tools/gemini/bin/gemini", line 7, in <module>
gemini_main.main()
File "/opt/tools/gemini/thirdparty/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 1244, in main
args.func(parser, args)
File "/opt/tools/gemini/thirdparty/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 1136, in homozygosity_runs_fn
run(parser, args)
File "/opt/tools/gemini/thirdparty/anaconda/lib/python2.7/site-packages/gemini/tool_homozygosity_runs.py", line 215, in run
get_homozygosity_runs(args)
File "/opt/tools/gemini/thirdparty/anaconda/lib/python2.7/site-packages/gemini/tool_homozygosity_runs.py", line 162, in get_homozygosity_runs
gq.run(query, needs_genotypes=True)
File "/opt/tools/gemini/thirdparty/anaconda/lib/python2.7/site-packages/gemini/GeminiQuery.py", line 653, in run
self.result_proxy = res = iter(self._apply_query())
File "/opt/tools/gemini/thirdparty/anaconda/lib/python2.7/site-packages/gemini/GeminiQuery.py", line 906, in _apply_query
res = self._execute_query()
File "/opt/tools/gemini/thirdparty/anaconda/lib/python2.7/site-packages/gemini/GeminiQuery.py", line 883, in _execute_query
raise ValueError("The query issued (%s) has a syntax error." % self.query)
ValueError: The query issued (select chrom, start, end,gts,gt_types,gt_phases,gt_depths,gt_ref_depths,gt_alt_depths,gt_quals,gt_alt_freqs FROM variants WHERE type = 'snp' AND filter is NULL AND depth >= 20 ORDER BY chrom, end) has a syntax error.
I think perhaps some of the previous confusion stemmed from calling depth a table, whereas it seems to be a column.
Would it be possible to add the depth column to the list of annotations that vcf2db builds into the gemini db?
Thanks for all the hard work on these tools! Robin
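A quick way to confirm this before running gemini roh is to ask SQLite which columns the variants table has. The following is a minimal sketch using only Python's standard sqlite3 module; missing_columns is a hypothetical helper name, and the list of required columns is taken from the WHERE clause of the query in the traceback above rather than from any GEMINI documentation.

```python
import sqlite3


def missing_columns(db_path, table, required):
    """Return the names in `required` that `table` in `db_path` lacks, sorted."""
    # Note: sqlite3.connect creates an empty file if db_path does not exist,
    # in which case every required column is reported as missing.
    conn = sqlite3.connect(db_path)
    try:
        # One row per column; index 1 holds the column name.
        rows = conn.execute("PRAGMA table_info(%s)" % table).fetchall()
    finally:
        conn.close()
    present = {row[1] for row in rows}
    return sorted(set(required) - present)


# Columns used in the WHERE clause of the roh query quoted in the traceback.
ROH_FILTER_COLUMNS = ["type", "filter", "depth"]
```

For example, missing_columns("gemini_db_produced_by_vcf2db.db", "variants", ROH_FILTER_COLUMNS) would be expected to list depth for a database that triggers the error shown above.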
Hello, I would like to request that the tables used by GEMINI's built-in analysis tools be added into VCF2DB.
Ideally, all tables that are default loaded with the command:
$ gemini load
that are not inherently third party annotations, would be added into the resulting database.
Examples I have run into so far:
depths
sample_genotype_counts
More complex features that would be ideal for our pipeline, but for which we may need to fall back on standard GEMINI load:
gene_summary
pathways and gene detailed analyses
Thanks, Phil