vatlab / vat-docs

Documentation for Variant Tools.
https://vatlab.github.io/vat/

documentation-concept-6 #20

Open BingLi17 opened 6 years ago

BingLi17 commented 6 years ago

https://vatlab.github.io/vat-docs/documentation/vtools_commands/update/ Examples: add genotype info field

$ vtools update variant --from_file V*_hg38.vcf --geno_info DP
INFO: Using primary reference genome hg38 of the project.
Getting existing variants: 100% [===================] 2,051 297.0K/s in 00:00:00
INFO: Updating variants from V1_hg38.vcf (1/3)
V1_hg38.vcf: 100% [==================================] 1,618 15.6K/s in 00:00:00
INFO: Fields  of 0 variants and geno fields of 1,277 genotypes are updated
INFO: Updating variants from V2_hg38.vcf (2/3)
ERROR: Specified (SAMP1) and detected sample names (SAMP2) mismatch.

There is a problem here. @jma7

It seems you are not using the latest updates. Please check out the v3beta branch. @BingLi17

BingLi17 commented 6 years ago

Since the DB is not included in the snapshot, this error occurs with the sqlite version. Leave it as is for now.

documentation-init-2.2

$ vtools select variant --samples "sample_name=='CEU'" -t CEU
ERROR: Could not get schema of table test_genotype.genotype_2

BingLi17 commented 6 years ago

For the update examples, the correct VCF files to use are V*_hg38.vcf; you were using the CEU files. Please run them again. @tutudong

update 2.1 Examples: import additional fields from source files

$ vtools show genotypes
ERROR: no such table: genotype_1

... (there are many other differences on this page)

Got it.

BingLi17 commented 6 years ago

Fixed. When storemode is sqlite, import --geno_info DP_geno instead of DP.
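For anyone scripting the reruns, the store-mode rule above can be captured in a small helper. This is a minimal sketch, not part of vtools: the function names `geno_info_field` and `update_command` are hypothetical, and it assumes that any store mode other than sqlite keeps the original DP field. It only builds the argument list for the `vtools update variant --from_file ... --geno_info ...` command shown at the top of this issue; it does not call vtools.

```python
# Hypothetical helper (not part of vtools): choose the genotype info field
# for the `vtools update` example based on the project's store mode.
from typing import List, Sequence


def geno_info_field(store_mode: str) -> str:
    """Return the geno_info field name to import.

    sqlite store mode uses DP_geno (per this thread); any other mode is
    ASSUMED here to keep the original DP field.
    """
    return "DP_geno" if store_mode.lower() == "sqlite" else "DP"


def update_command(vcf_files: Sequence[str], store_mode: str) -> List[str]:
    """Build the argv for `vtools update variant --from_file ... --geno_info ...`."""
    return [
        "vtools", "update", "variant",
        "--from_file", *vcf_files,
        "--geno_info", geno_info_field(store_mode),
    ]


if __name__ == "__main__":
    # File names taken from the log above; only printed, not executed.
    files = ["V1_hg38.vcf", "V2_hg38.vcf"]
    print(" ".join(update_command(files, "sqlite")))
```

The returned list can be passed to `subprocess.run` if you want to execute it; keeping it as a plain list makes the sqlite/DP_geno choice easy to check before touching a project.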

phenotype last example

$ vtools phenotype --output "avg(meanDP)"
273.67465753424625
BingLi17 commented 6 years ago

It seems a different VCF file was imported. Please make sure you run vtools import V*_hg38.vcf --build hg38, then check again. @tutudong
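Several mismatches in this thread came from importing the CEU files instead of V*_hg38.vcf. A quick pre-flight check can catch that before rerunning. This is a minimal sketch under stated assumptions: `check_vcf_files` and `import_command` are hypothetical names, the only rule enforced is that each file's basename matches the V*_hg38.vcf pattern named in this issue, and the script merely builds the `vtools import ... --build hg38` argument list without running vtools.

```python
# Hypothetical pre-flight check (not part of vtools): make sure only the
# V*_hg38.vcf files, and not the CEU files, are passed to `vtools import`.
import fnmatch
import os
from typing import List, Sequence

EXPECTED_PATTERN = "V*_hg38.vcf"  # file pattern named in this thread


def check_vcf_files(paths: Sequence[str], pattern: str = EXPECTED_PATTERN) -> List[str]:
    """Return the paths as a list, or raise ValueError if any basename does not match."""
    if not paths:
        raise ValueError(f"no input files given (expected {pattern})")
    bad = [p for p in paths if not fnmatch.fnmatch(os.path.basename(p), pattern)]
    if bad:
        raise ValueError(f"unexpected input files (expected {pattern}): {bad}")
    return list(paths)


def import_command(paths: Sequence[str], build: str = "hg38") -> List[str]:
    """Build the argv for `vtools import <files> --build <build>` after checking the files."""
    return ["vtools", "import", *check_vcf_files(paths), "--build", build]


if __name__ == "__main__":
    # File names taken from the log in this issue; only printed, not executed.
    print(" ".join(import_command(["V1_hg38.vcf", "V2_hg38.vcf"])))
```

Passing a CEU file (for example `CEU.vcf`) raises a `ValueError` instead of silently importing the wrong data, which would otherwise show up later as differing numbers such as the avg(meanDP) value above.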

use 2.1-2.3: different numbers in the outputs

$ vtools use CancerGeneCensus --linked_by refGene.name2
INFO: Choosing version CancerGeneCensus-20170912 from 4 available databases.
INFO: Downloading annotation database annoDB/CancerGeneCensus-20170912.ann
INFO: Using annotation DB CancerGeneCensus as CancerGeneCensus in project use.
INFO: This database contains variants from the Cancer Genome Project. It is
an ongoing effort to catalogue those genes for which mutations have been causally
implicated in cancer. The original census and analysis was published in Nature
Reviews Cancer and supplemental analysis information related to the paper is also
available. Currently, more than 1% of all human genes are implicated via mutation
in cancer. Of these, approximately 90% have somatic mutations in cancer, 20% bear
germline mutations that predispose to cancer and 10% show both somatic and
germline mutations.
INFO: 448 out of 28031 refGene.refGene.name2 are annotated through annotation database CancerGeneCensus
WARNING: 78 out of 526 values in annotation database CancerGeneCensus are not linked to the project
$ vtools select variant 'refGene.chr is not NULL' -c
Counting variants: 85 818.7/s in 00:00:00
5302
$ vtools select variant 'refGene.chr is not NULL' -c
WARNING: Cannot open annotation database CancerGeneCensus: Failed to locate linked_by field refGene.refGene.name2: Failed to locate field name2
Counting variants: 80 824.3/s in 00:00:00
5060
$ vtools select variant 'refGene.chr is not NULL' -c
Counting variants: 91 798.2/s in 00:00:00
5772
$ vtools select variant 'gwasCatalog.chr is not null' -c
Counting variants: 13 1.3K/s in 00:00:00
56
$ vtools select variant 'gwasCatalog.chr is not null' -c
Counting variants: 42 1.1K/s in 00:00:00
1470
$ vtools use CancerGeneCensus --linked_fields kgID --linked_by knownGene.name
INFO: Choosing version CancerGeneCensus-20170912 from 4 available databases.
INFO: Downloading annotation database annoDB/CancerGeneCensus-20170912.ann
INFO: Using annotation DB CancerGeneCensus as CancerGeneCensus in project use.
INFO: This database contains variants from the Cancer Genome Project. It is
an ongoing effort to catalogue those genes for which mutations have been causally
implicated in cancer. The original census and analysis was published in Nature
Reviews Cancer and supplemental analysis information related to the paper is also
available. Currently, more than 1% of all human genes are implicated via mutation
in cancer. Of these, approximately 90% have somatic mutations in cancer, 20% bear
germline mutations that predispose to cancer and 10% show both somatic and
germline mutations.
INFO: 0 out of 197782 knownGene.knownGene.name are annotated through annotation database CancerGeneCensus
WARNING: 526 out of 526 values in annotation database CancerGeneCensus are not linked to the project.
$ vtools use CancerGeneCensus --linked_by knownGene.name
INFO: Choosing version CancerGeneCensus-20170912 from 4 available databases.
INFO: Downloading annotation database annoDB/CancerGeneCensus-20170912.ann
INFO: Using annotation DB CancerGeneCensus as CancerGeneCensus in project use.
INFO: This database contains variants from the Cancer Genome Project. It is
an ongoing effort to catalogue those genes for which mutations have been causally
implicated in cancer. The original census and analysis was published in Nature
Reviews Cancer and supplemental analysis information related to the paper is also
available. Currently, more than 1% of all human genes are implicated via mutation
in cancer. Of these, approximately 90% have somatic mutations in cancer, 20% bear
germline mutations that predispose to cancer and 10% show both somatic and
germline mutations.
INFO: 0 out of 197782 knownGene.knownGene.name are annotated through annotation database CancerGeneCensus
WARNING: 526 out of 526 values in annotation database CancerGeneCensus are not linked to the project.
$ vtools use gwasCatalog --anno_type field --linked_fields region --linked_by cytoBand.name
INFO: Choosing version gwasCatalog-hg38_20171004 from 3 available databases.
INFO: Downloading annotation database annoDB/gwasCatalog-hg38_20171004.ann
INFO: /export/cfs05space/bli5/.variant_tools/annoDB/gwasCatalog-hg38_20171004.DB: MD5 signature mismatch, the database might have been upgraded locally.
INFO: Using annotation DB gwasCatalog as gwasCatalog in project use.
INFO: This database contains single nucleotide polymorphisms (SNPs) identified by published Genome-Wide
Association Studies (GWAS), collected in the Catalog of Published Genome-Wide Association Studies at the
National Human Genome Research Institute (NHGRI). From http://www.genome.gov/gwastudies/:
INFO: 803 out of 1293 cytoBand.cytoBand.name are annotated through annotation database gwasCatalog
WARNING: 1 out of 804 values in annotation database gwasCatalog are not linked to the project.
$ vtools output variant chr pos cytoBand.name gwasCatalog.genes gwasCatalog.trait --all -l 10
1   1182895 1p36.33 GNB1                                                                                                        Body mass index
1   1182895 1p36.33 GNB1                                                                                                        Body mass index
1   1182895 1p36.33 NR                                                                                                          Body mass index
1   1182895 1p36.33 Intergenic                                                                                                  Cancer (pleiotropy)
1   1182895 1p36.33 NR                                                                                                          Crohn's disease
1   1182895 1p36.33 PRKCZ                                                                                                       Height
1   1182895 1p36.33 NR                                                                                                          IgG glycosylation
1   1182895 1p36.33 NR                                                                                                          IgG glycosylation
1   1182895 1p36.33 NR                                                                                                          Inflammatory bowel disease
1   1182895 1p36.33 SCNN1D, CPSF3L, TAS1R3, RP4-758J18.2, MRPL20, UBE2J2, ACAP3, PUSL1, GLTPD1, DVL1, MXRA8, CCNL2, AURKAIP1    Inflammatory bowel disease
HenryLeongStat commented 6 years ago

Rerun the examples using the correct files.