vatlab / vat-docs

Documentation for Variant Tools.
https://vatlab.github.io/vat/

phenotype command #17

Closed BingLi17 closed 6 years ago

BingLi17 commented 6 years ago

After changing the import command from vtools import CEU.vcf.gz --build hg18 --var_info DP --geno_info DP_geno to vtools import CEU.vcf.gz --build hg38 --var_info DP --geno_info DP_geno, the phenotype command fails:

vtools phenotype --output sample_name sample_wildtype BMI 
ERROR: no such column: sample_wildtype

Also, some outputs on this page differ from those in the vtools documentation.

BingLi17 commented 6 years ago

Also liftover:

$ vtools liftover hg19
No data is available.
$ vtools liftover hg38
WARNING: Disable alternative reference genome because of missing column alt_bin, alt_chr, alt_pos.
No data is available.
BingLi17 commented 6 years ago

Also show:

$ vtools show annotations | head -50

CancerGeneCensus-20111215 Cancer Genome Project
CancerGeneCensus-20120315 Cancer Genome Project
CancerGeneCensus-20130711 This database contains variants from the Cancer
                        Genome Project. It is an ongoing effort to catalogue
                        those genes for which mutations have been causally
                        implicated in cancer. The original census and
                        analysis was published in Nature Reviews Cancer and
                        supplemental analysis information related to the
                        paper is also available. Currently, more than 1% of
                        all human genes are implicated via mutation in
                        cancer. Of these, approximately 90% have somatic
                        mutations in cancer, 20% bear germline mutations
                        that predispose to cancer and 10% show both somatic
                        and germline mutations.
CancerGeneCensus-20170912 This database contains variants from the Cancer
                        Genome Project. It is an ongoing effort to catalogue
                        those genes for which mutations have been causally
                        implicated in cancer. The original census and
                        analysis was published in Nature Reviews Cancer and
                        supplemental analysis information related to the
                        paper is also available. Currently, more than 1% of
                        all human genes are implicated via mutation in
                        cancer. Of these, approximately 90% have somatic
                        mutations in cancer, 20% bear germline mutations
                        that predispose to cancer and 10% show both somatic
                        and germline mutations.
CosmicCodingMuts-v61_260912 Cosmic coding mutation database.  This data
                        contains mutations affecting 10 or less nucleotides
                        in REF.  The mutation data was obtained from the
                        Sanger Institute Catalogue Of Somatic Mutations In
                        Cancer web site, http://www.sanger.ac.uk/cosmic.
                        Bamford et al (2004). The COSMIC (Catalogue of
                        Somatic Mutations in Cancer) database and website.
                        Br J Cancer, 91,355-358.
CosmicCodingMuts-v67_20131024 Cosmic coding mutation database.  This data
                        contains mutations affecting 10 or less nucleotides
                        in REF.  The mutation data was obtained from the
                        Sanger Institute Catalogue Of Somatic Mutations In
                        Cancer web site, http://www.sanger.ac.uk/cosmic.
                        Bamford et al (2004). The COSMIC (Catalogue of
                        Somatic Mutations in Cancer) database and website.
                        Br J Cancer, 91,355-358.
CosmicCodingMuts-v82_20170801 Cosmic coding mutation database.  This data
                        contains mutations affecting 10 or less nucleotides
                        in REF.  The mutation data was obtained from the
                        Sanger Institute Catalogue Of Somatic Mutations In
                        Cancer web site, http://cancer.sanger.ac.uk/cosmic.
                        The COSMIC (Catalogue of Somatic Mutations in
                        Cancer) database and website. Br J Cancer,
                        91,355-358.
ERROR: [Errno 32] Broken pipe
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
BrokenPipeError: [Errno 32] Broken pipe
BingLi17 commented 6 years ago

Show:

vtools show annotation knownGene
ERROR: Database knownGene is not currently used in the project.
BoPeng commented 6 years ago

For the last one, you need to run vtools use knownGene before the show command.

BoPeng commented 6 years ago

$ vtools show annotations | head -50

The broken pipe error comes from piping the output on the command line: head exits after reading 50 lines and closes the pipe while vtools is still writing. The problem was ignored in Python 2 but is reported in Python 3.
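For illustration, here is a minimal sketch of how a Python 3 command-line program can stop quietly when its reader closes the pipe early. It is not the vtools code; the function names emit_lines and _silence are hypothetical, and the pattern follows the SIGPIPE note in the Python documentation.

```python
import os
import sys


def _silence(stream):
    """Point the stream's file descriptor at devnull, if it has one.

    This keeps the interpreter's final flush at exit from raising
    BrokenPipeError a second time.
    """
    try:
        fd = stream.fileno()
    except (AttributeError, OSError, ValueError):
        return  # not a real file (e.g. an in-memory buffer); nothing to do
    devnull = os.open(os.devnull, os.O_WRONLY)
    os.dup2(devnull, fd)
    os.close(devnull)


def emit_lines(lines, stream=None):
    """Write lines to stream (default: sys.stdout).

    Returns the number of lines written before the reader closed the pipe.
    """
    stream = sys.stdout if stream is None else stream
    written = 0
    try:
        for line in lines:
            stream.write(line + "\n")
            written += 1
        stream.flush()
    except BrokenPipeError:
        # The reader (for example head) exited early; stop without an error.
        _silence(stream)
    return written
```

A script that calls emit_lines(...) and is piped into head -50 would be expected to exit without printing the BrokenPipeError message shown above.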

BoPeng commented 6 years ago

liftover needs data to work on. Check whether the project is currently empty.

BoPeng commented 6 years ago

For vtools phenotype, first check whether the phenotype was imported properly (e.g. vtools show phenotype).

BingLi17 commented 6 years ago

update

$ vtools update variant --from_stat 'num=#(GT)'
Counting variants:   0.0% [>                                                     ]  in 00:00:00
ERROR: near "select": syntax error
BingLi17 commented 6 years ago

use

$ vtools use dbSNP
INFO: Choosing version dbSNP-hg18_130 from 10 available databases.
INFO: Downloading annotation database annoDB/dbSNP-hg18_130.ann
INFO: Downloading annotation database from annoDB/dbSNP-hg18_130.DB.gz
dbSNP-hg18_130.DB.gz: 100% [=================================] 617,049,496.0 1.3M/s in 00:07:38

Why was hg18_130 chosen?

BoPeng commented 6 years ago

vtools use will choose the latest version of annotation database that matches the primary reference genome of your project. Use vtools show to check the reference genome of the project.
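The selection rule described here can be sketched as follows. This is only an illustration, not the vtools implementation: the function pick_annotation_db is hypothetical, and it assumes that database names follow the pattern name-build_version with a numeric version, as in dbSNP-hg18_130 from the log above.

```python
import re


def pick_annotation_db(available, name, build):
    """Return the newest database called `name` for `build`, or None.

    Assumes names look like '<name>-<build>_<numeric version>'; this
    naming pattern is inferred from 'dbSNP-hg18_130' and is an assumption.
    """
    pattern = re.compile(rf"^{re.escape(name)}-{re.escape(build)}_(\d+)$")
    candidates = []
    for db in available:
        match = pattern.match(db)
        if match:
            # Keep the numeric version first so max() compares by version.
            candidates.append((int(match.group(1)), db))
    return max(candidates)[1] if candidates else None
```

Under these assumptions, a project whose primary reference genome is hg18 gets an hg18 database even when databases for newer builds exist, which would explain the choice of dbSNP-hg18_130.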

jma7 commented 6 years ago

Should we set a default reference genome? @BoPeng

BoPeng commented 6 years ago

No, because it is determined by your data. The primary reference genome is set by either

vtools init --build

or

vtools import data --build

so users would run into trouble if a default reference genome did not match the one used by their data.

I believe the latest version (hg38) would be used if you do not specify --build anywhere and try to run vtools use someAnnoDB.

jma7 commented 6 years ago

Please indicate which markdown file each command comes from, and maybe add the line number from the markdown too. @tutudong

jma7 commented 6 years ago

These issues have been addressed.