quinlan-lab / vcf2db

create a gemini-compatible database from a VCF
MIT License

Codons are blank? #21

Closed davemcg closed 7 years ago

davemcg commented 7 years ago

Using the latest vcf2db, the codon field (codon_change) is blank. I'm using VEP to annotate consequences.

$ gemini query -q 'select chrom, start, end, codon_change, aa_change from variants WHERE is_exonic=1' CCGO.2016-12-12.db --header | head -n 10 | csvlook -t          
|--------+--------+--------+--------------+------------|
|  chrom | start  | end    | codon_change | aa_change  |
|--------+--------+--------+--------------+------------|
|  1     | 69269  | 69270  |              | S          |
|  1     | 69427  | 69428  |              | F/C        |
|  1     | 69510  | 69511  |              | T/A        |
|  1     | 69760  | 69761  |              | D/V        |
|  1     | 69848  | 69849  |              | W/*        |
|  1     | 69896  | 69897  |              | S          |
|  1     | 865693 | 865694 |              | H/Y        |
|  1     | 865699 | 865700 |              | R/C        |
|  1     | 865704 | 865705 |              | M/I        |
|--------+--------+--------+--------------+------------|

The CSQ header line that VEP wrote into my VCF includes a Codons field:

##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Consequence|Codons|Amino_acids|Gene|SYMBOL|Feature|EXON|PolyPhen|SIFT|Protein_position|BIOTYPE|CANONICAL|DOMAINS|CLIN_SIG....

The 1:69269-69270 record does have codon info in the VCF (Codons = tcA/tcG):

1       69270   .       A       G       24603.3 FAIL_McGaughey_SNP_filter_v01   AC=53;AF=0.779;AN=68;BaseQRankSum=-1.733;ClippingRankSum=0.193;DP=1117;ExcessHet=0.0012;FS=0;InbreedingCoeff=0.747;MLEAC=56;MLEAF=0.824;MQ=28.83;MQRankSum=0.512;QD=23.77;ReadPosRankSum=0.483;SOR=5.447;set=FilteredInAll;;CSQ=synonymous_variant|tcA/tcG|S|ENSG00000186092|OR4F5|ENST00000335137|1/1|||60/305|protein_coding||Low_complexity_(Seg):seg&Transmembrane_helices:TMhelix&Prints_domain:PR00237&Superfamily_domains:SSF81321&Gene3D:1.20.1070.10&hmmpanther:PTHR26451&hmmpanther:PTHR26451:SF72&PROSITE_profiles:PS50262||||ENST00000335137.3:c.180A>G|ENST00000335137.3:c.180A>G(p.%3D)|||-0.817044|0.039;n_syn=5;adj_exp_syn=27.6132199337414;syn_z=2.72474672669623;n_mis=7;adj_exp_mis=63.7647441819615;mis_z=3.54158912707877;n_lof=0;adj_exp_lof=2.03876604453072;lof_z=1.40852623525699;pLI=0.550302420215064;pRecessive=0.39663970580189;pNull=0.0530578739830465;pfam_domain=7tm_1;in_exac;ac_exac_all=1019;exac_num_het=241;exac_num_hom_alt=389;an_exac_all=1584;ac_exac_afr=166;an_exac_afr=568;ac_exac_amr=48;an_exac_amr=76;ac_exac_eas=114;an_exac_eas=116;ac_exac_fin=11;an_exac_fin=14;ac_exac_nfe=467;an_exac_nfe=560;ac_exac_oth=12;an_exac_oth=18;ac_exac_sas=201;an_exac_sas=232;rs_ids=rs201219564;fitcons_float=0.4871;encode_consensus_gm12878=R;encode_consensus_h1hesc=R;encode_consensus_helas3=R;encode_consensus_hepg2=R;encode_consensus_huvec=R;encode_consensus_k562=unknown;gerp_elements=0;dgv=CopyNumber;hapmap1=2.0824;hapmap2=0.0806;af_exac_all=0.6433;af_exac_afr=0.2923;af_exac_amr=0.6316;af_exac_eas=0.9828;af_exac_nfe=0.8339;af_exac_oth=0.6667;af_exac_sas=0.8664;max_aaf_all=0.9828  GT:AD:DP:GQ:PL..........
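For reference, here is where codon_change would have to come from. VEP packs each per-transcript annotation into the CSQ INFO value as pipe-separated sub-fields, in the order given by the "Format:" list in the ##INFO=<ID=CSQ,...> header. For this record the second sub-field (Codons) is tcA/tcG and the third (Amino_acids) is S. The Python below is a minimal sketch of that lookup and is not vcf2db's or geneimpacts' actual code. The header and INFO strings are shortened copies of the lines above, keeping only the first 11 CSQ sub-fields. The helper names csq_fields and parse_csq are hypothetical.

```python
import re

# Shortened copies of the CSQ header and INFO column shown above
# (only the first 11 CSQ sub-fields are kept for brevity).
CSQ_HEADER = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
              'from Ensembl VEP. Format: Consequence|Codons|Amino_acids|Gene|SYMBOL|'
              'Feature|EXON|PolyPhen|SIFT|Protein_position|BIOTYPE">')
INFO = ("AC=53;AF=0.779;set=FilteredInAll;;CSQ=synonymous_variant|tcA/tcG|S|"
        "ENSG00000186092|OR4F5|ENST00000335137|1/1|||60/305|protein_coding;n_syn=5")


def csq_fields(header_line):
    """Return the ordered CSQ sub-field names from the VEP header's 'Format:' list."""
    m = re.search(r'Format: ([^"]+)"', header_line)
    if m is None:
        raise ValueError("no 'Format:' list in CSQ header")
    return m.group(1).split("|")


def parse_csq(info, names):
    """Yield one {field: value} dict per comma-separated transcript annotation in CSQ."""
    for entry in info.split(";"):  # the empty item produced by ';;' is simply skipped
        if entry.startswith("CSQ="):
            for ann in entry[len("CSQ="):].split(","):
                # pair each pipe-separated value with its header field name
                yield dict(zip(names, ann.split("|")))


anns = list(parse_csq(INFO, csq_fields(CSQ_HEADER)))
print(anns[0]["Codons"], anns[0]["Amino_acids"])  # tcA/tcG S
```

Empty sub-fields (PolyPhen and SIFT here) come back as empty strings, so a blank codon_change alongside a populated aa_change suggests the Codons value was present in the VCF but was not carried into the database.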
arq5x commented 7 years ago

Is there a column called "Codons"?

davemcg commented 7 years ago

No. Querying it fails:

gemini query -q 'select chrom, start, end, codon_change, Codons, aa_change from variants WHERE is_exonic=1' CCGO.2016-12-12.db --header | head -n 10 | csvlook -t     
SQL error: (sqlite3.OperationalError) no such column: Codons [SQL: u'select chrom,  start,  end,  codon_change,  Codons,  aa_change from variants WHERE is_exonic=1']
Traceback (most recent call last):
  File "/usr/local/bin/gemini", line 6, in <module>
    gemini.gemini_main.main()
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 1227, in main
    args.func(parser, args)
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/gemini/gemini_main.py", line 422, in query_fn
    gemini_query.query(parser, args)
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/gemini/gemini_query.py", line 167, in query
    run_query(args)
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/gemini/gemini_query.py", line 133, in run_query
    gene_needed, args.show_families, subjects=subjects)
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/gemini/GeminiQuery.py", line 638, in run
    self.result_proxy = res = iter(self._apply_query())
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/gemini/GeminiQuery.py", line 908, in _apply_query
    res = self._execute_query()
  File "/usr/local/share/gemini/anaconda/lib/python2.7/site-packages/gemini/GeminiQuery.py", line 867, in _execute_query
    raise ValueError("The query issued (%s) has a syntax error." % self.query)
ValueError: The query issued (select chrom,  start,  end,  codon_change,  Codons,  aa_change from variants WHERE is_exonic=1) has a syntax error.
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|  SQL error: (sqlite3.OperationalError) no such column: Codons [SQL: u'select chrom,  start,  end,  codon_change,  Codons,  aa_change from variants WHERE is_exonic=1']  |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|  |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
davemcg commented 7 years ago

$ gemini db_info CCGO.2016-12-12.db 
table_name          column_name                   type      
variants            variant_id                    INTEGER   
variants            chrom                         VARCHAR(10)
variants            start                         INTEGER   
variants            end                           INTEGER   
variants            vcf_id                        VARCHAR(12)
variants            ref                           TEXT      
variants            alt                           TEXT      
variants            qual                          FLOAT     
variants            filter                        VARCHAR(35)
variants            type                          VARCHAR(8)
variants            sub_type                      VARCHAR(20)
variants            call_rate                     FLOAT     
variants            num_hom_ref                   INTEGER   
variants            num_het                       INTEGER   
variants            num_hom_alt                   INTEGER   
variants            aaf                           FLOAT     
variants            hwe                           FLOAT     
variants            inbreeding_coef               FLOAT     
variants            pi                            FLOAT     
variants            gene                          VARCHAR(20)
variants            transcript                    VARCHAR(20)
variants            is_exonic                     BOOLEAN   
variants            is_coding                     BOOLEAN   
variants            is_lof                        BOOLEAN   
variants            is_splicing                   BOOLEAN   
variants            exon                          VARCHAR(8)
variants            codon_change                  TEXT      
variants            aa_change                     TEXT      
variants            aa_length                     VARCHAR(14)
variants            biotype                       TEXT      
variants            impact                        VARCHAR(41)
variants            impact_so                     VARCHAR(41)
variants            impact_severity               VARCHAR(4)
variants            polyphen_pred                 VARCHAR(20)
variants            polyphen_score                FLOAT     
variants            sift_pred                     VARCHAR(31)
variants            sift_score                    FLOAT     
variants            an                            INTEGER   
variants            baseqranksum                  FLOAT     
variants            clinvar_diseases              TEXT      
variants            clippingranksum               FLOAT     
variants            dp                            INTEGER   
variants            ds                            BOOLEAN   
variants            exome_chip                    BOOLEAN   
variants            excesshet                     FLOAT     
variants            fs                            FLOAT     
variants            gene_eyediseaseclass          TEXT      
variants            hgmd_overlap                  TEXT      
variants            haplotypescore                FLOAT     
variants            inbreedingcoeff               FLOAT     
variants            mq                            FLOAT     
variants            mqranksum                     FLOAT     
variants            old_multiallelic              TEXT      
variants            old_variant                   TEXT      
variants            qd                            FLOAT     
variants            raw_mq                        FLOAT     
variants            readposranksum                FLOAT     
variants            sor                           FLOAT     
variants            aaf_1kg_afr_float             FLOAT     
variants            aaf_1kg_all_float             FLOAT     
variants            aaf_1kg_amr_float             FLOAT     
variants            aaf_1kg_eas_float             FLOAT     
variants            aaf_1kg_eur_float             FLOAT     
variants            aaf_1kg_sas_float             FLOAT     
variants            aaf_esp_aa                    FLOAT     
variants            aaf_esp_all                   FLOAT     
variants            aaf_esp_ea                    FLOAT     
variants            adj_exp_lof                   TEXT      
variants            adj_exp_mis                   TEXT      
variants            adj_exp_syn                   TEXT      
variants            af_exac_afr                   FLOAT     
variants            af_exac_all                   FLOAT     
variants            af_exac_amr                   FLOAT     
variants            af_exac_eas                   FLOAT     
variants            af_exac_nfe                   FLOAT     
variants            af_exac_oth                   FLOAT     
variants            af_exac_sas                   FLOAT     
variants            an_exac_afr                   FLOAT     
variants            an_exac_all                   FLOAT     
variants            an_exac_amr                   FLOAT     
variants            an_exac_eas                   FLOAT     
variants            an_exac_fin                   FLOAT     
variants            an_exac_nfe                   FLOAT     
variants            an_exac_oth                   FLOAT     
variants            an_exac_sas                   FLOAT     
variants            clinvar_pathogenic            TEXT      
variants            clinvar_sig                   VARCHAR(5)
variants            common_pathogenic             TEXT      
variants            cosmic_ids                    TEXT      
variants            cpg_island                    BOOLEAN   
variants            cse_hiseq                     BOOLEAN   
variants            dgv                           TEXT      
variants            encode_consensus_gm12878      TEXT      
variants            encode_consensus_h1hesc       VARCHAR(5)
variants            encode_consensus_helas3       TEXT      
variants            encode_consensus_hepg2        TEXT      
variants            encode_consensus_huvec        TEXT      
variants            encode_consensus_k562         VARCHAR(5)
variants            exac_num_het                  FLOAT     
variants            exac_num_hom_alt              FLOAT     
variants            fitcons_float                 FLOAT     
variants            geno2mp                       BOOLEAN   
variants            gerp_elements                 FLOAT     
variants            gwas_pubmed_trait             TEXT      
variants            hapmap1                       FLOAT     
variants            hapmap2                       FLOAT     
variants            in_1kg                        BOOLEAN   
variants            in_esp                        BOOLEAN   
variants            in_exac                       BOOLEAN   
variants            lof_z                         VARCHAR(5)
variants            max_aaf_all                   FLOAT     
variants            mis_z                         TEXT      
variants            n_lof                         TEXT      
variants            n_mis                         VARCHAR(5)
variants            n_syn                         TEXT      
variants            pli                           TEXT      
variants            pnull                         TEXT      
variants            precessive                    TEXT      
variants            pfam_domain                   TEXT      
variants            rmsk                          TEXT      
variants            rs_ids                        TEXT      
variants            set                           VARCHAR(5)
variants            stam_mean                     FLOAT     
variants            stam_names                    TEXT      
variants            syn_z                         VARCHAR(5)
variants            tfbs                          TEXT      
variants            canonical                     VARCHAR(10)
variants            domains                       TEXT      
variants            clin_sig                      TEXT      
variants            grantham                      VARCHAR(10)
variants            maxentscan                    VARCHAR(10)
variants            hgvsc                         TEXT      
variants            hgvsp                         VARCHAR(40)
variants            pubmed                        TEXT      
variants            phenotypes                    VARCHAR(10)
variants            cadd_raw                      VARCHAR(10)
variants            cadd_phred                    VARCHAR(10)
variants            gts                           BLOB      
variants            gt_types                      BLOB      
variants            gt_phases                     BLOB      
variants            gt_depths                     BLOB      
variants            gt_ref_depths                 BLOB      
variants            gt_alt_depths                 BLOB      
variants            gt_quals                      BLOB      
variant_impacts     variant_id                    INTEGER   
variant_impacts     gene                          VARCHAR(20)
variant_impacts     transcript                    VARCHAR(20)
variant_impacts     is_exonic                     BOOLEAN   
variant_impacts     is_coding                     BOOLEAN   
variant_impacts     is_lof                        BOOLEAN   
variant_impacts     is_splicing                   BOOLEAN   
variant_impacts     exon                          VARCHAR(8)
variant_impacts     codon_change                  TEXT      
variant_impacts     aa_change                     TEXT      
variant_impacts     aa_length                     VARCHAR(14)
variant_impacts     biotype                       TEXT      
variant_impacts     impact                        VARCHAR(20)
variant_impacts     impact_so                     VARCHAR(41)
variant_impacts     impact_severity               VARCHAR(4)
variant_impacts     polyphen_pred                 VARCHAR(20)
variant_impacts     polyphen_score                FLOAT     
variant_impacts     sift_pred                     VARCHAR(31)
variant_impacts     sift_score                    FLOAT     
variant_impacts     canonical                     VARCHAR(10)
variant_impacts     domains                       TEXT      
variant_impacts     clin_sig                      TEXT      
variant_impacts     grantham                      VARCHAR(10)
variant_impacts     maxentscan                    VARCHAR(10)
variant_impacts     hgvsc                         TEXT      
variant_impacts     hgvsp                         VARCHAR(40)
variant_impacts     pubmed                        TEXT      
variant_impacts     phenotypes                    VARCHAR(10)
variant_impacts     cadd_raw                      VARCHAR(10)
variant_impacts     cadd_phred                    VARCHAR(10)
samples             sample_id                     INTEGER   
samples             family_id                     VARCHAR(20)
samples             name                          VARCHAR(11)
samples             paternal_id                   VARCHAR(20)
samples             maternal_id                   VARCHAR(20)
samples             sex                           VARCHAR(1)
samples             phenotype                     VARCHAR(2)
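The traceback above shows that the GEMINI database is a SQLite file. That means both checks in this thread can also be made directly with Python's sqlite3 module: whether a Codons column exists, and how many exonic rows have an empty codon_change. The following is a minimal sketch. It assumes the variants table layout listed above and that is_exonic is stored as 0/1. The helper names and the toy rows are hypothetical and are not part of gemini or vcf2db.

```python
import sqlite3


def has_column(conn, table, column):
    """True if `table` has a column named `column` (PRAGMA table_info: row[1] is the name)."""
    # table is interpolated directly, so only pass trusted table names
    return any(row[1] == column for row in conn.execute(f"PRAGMA table_info({table})"))


def codon_change_counts(conn):
    """Return (blank, filled) counts of codon_change among exonic variants."""
    blank, filled = conn.execute(
        """SELECT
             SUM(CASE WHEN codon_change IS NULL OR codon_change = '' THEN 1 ELSE 0 END),
             SUM(CASE WHEN codon_change IS NOT NULL AND codon_change != '' THEN 1 ELSE 0 END)
           FROM variants WHERE is_exonic = 1"""
    ).fetchone()
    # SUM returns NULL (None) when no rows match, so fall back to 0
    return blank or 0, filled or 0


# Hypothetical toy stand-in for the real file; against the report's database
# you would use sqlite3.connect("CCGO.2016-12-12.db") instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variants (variant_id INTEGER, is_exonic BOOLEAN, "
             "codon_change TEXT, aa_change TEXT)")
conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?)",
                 [(1, 1, "", "S"), (2, 1, "", "F/C"), (3, 0, None, None)])

print(has_column(conn, "variants", "Codons"))        # False
print(has_column(conn, "variants", "codon_change"))  # True
print(codon_change_counts(conn))                     # (2, 0)
```

After rebuilding the database with a fixed version, the same count should show filled rows for exonic variants whose CSQ carries a Codons value.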
brentp commented 7 years ago

Thanks for reporting. I just pushed a fix for this to the geneimpacts module.

davemcg commented 7 years ago

Thanks all