Closed BoPeng closed 7 years ago
With
$ vtools init test
$ vtools admin --load_snapshot vt_testData
$ vtools import CEU.vcf.gz --build hg18 --var_info DP
$ vtools output variant chr pos ref alt 'genotype()' 'samples()' -l5
you can do
$ vtools update variant --set 'sample_names=samples()'
$ vtools output variant chr pos ref alt sample_names -l5
1 533 G C NA06985,NA06986,NA06994,NA07000,NA07037,NA07051,NA07346,NA07347,NA07357,NA10847,NA10851,NA11829,NA11830,NA11831,NA11832,NA11840,NA11881,NA11894,NA11918,NA11919,NA11920,NA11931,NA11992,NA11993,NA11994,NA11995,NA12003,NA12004,NA12005,NA12006,NA12043,NA12044,NA12045,NA12144,NA12154,NA12155,NA12156,NA12234,NA12249,NA12287,NA12414,NA12489,NA12716,NA12717,NA12749,NA12750,NA12751,NA12760,NA12761,NA12762,NA12763,NA12776,NA12812,NA12813,NA12814,NA12815,NA12828,NA12872,NA12873,NA12874
1 41342 T A NA06985,NA06986,NA06994,NA07000,NA07037,NA07051,NA07346,NA07347,NA07357,NA10847,NA10851,NA11829,NA11830,NA11831,NA11832,NA11840,NA11881,NA11894,NA11918,NA11919,NA11920,NA11931,NA11992,NA11993,NA11994,NA11995,NA12003,NA12004,NA12005,NA12006,NA12043,NA12044,NA12045,NA12144,NA12154,NA12155,NA12156,NA12234,NA12249,NA12287,NA12414,NA12489,NA12716,NA12717,NA12749,NA12750,NA12751,NA12760,NA12761,NA12762,NA12763,NA12776,NA12812,NA12813,NA12814,NA12815,NA12828,NA12872,NA12873,NA12874
1 41791 G A NA06985,NA06986,NA06994,NA07000,NA07037,NA07051,NA07346,NA07347,NA07357,NA10847,NA10851,NA11829,NA11830,NA11831,NA11832,NA11840,NA11881,NA11894,NA11918,NA11919,NA11920,NA11931,NA11992,NA11993,NA11994,NA11995,NA12003,NA12004,NA12005,NA12006,NA12043,NA12044,NA12045,NA12144,NA12154,NA12155,NA12156,NA12234,NA12249,NA12287,NA12414,NA12489,NA12716,NA12717,NA12749,NA12750,NA12751,NA12760,NA12761,NA12762,NA12763,NA12776,NA12812,NA12813,NA12814,NA12815,NA12828,NA12872,NA12873,NA12874
1 44449 T C NA06985,NA06986,NA06994,NA07000,NA07037,NA07051,NA07346,NA07347,NA07357,NA10847,NA10851,NA11829,NA11830,NA11831,NA11832,NA11840,NA11881,NA11894,NA11918,NA11919,NA11920,NA11931,NA11992,NA11993,NA11994,NA11995,NA12003,NA12004,NA12005,NA12006,NA12043,NA12044,NA12045,NA12144,NA12154,NA12155,NA12156,NA12234,NA12249,NA12287,NA12414,NA12489,NA12716,NA12717,NA12749,NA12750,NA12751,NA12760,NA12761,NA12762,NA12763,NA12776,NA12812,NA12813,NA12814,NA12815,NA12828,NA12872,NA12873,NA12874
1 44539 C T NA06985,NA06986,NA06994,NA07000,NA07037,NA07051,NA07346,NA07347,NA07357,NA10847,NA10851,NA11829,NA11830,NA11831,NA11832,NA11840,NA11881,NA11894,NA11918,NA11919,NA11920,NA11931,NA11992,NA11993,NA11994,NA11995,NA12003,NA12004,NA12005,NA12006,NA12043,NA12044,NA12045,NA12144,NA12154,NA12155,NA12156,NA12234,NA12249,NA12287,NA12414,NA12489,NA12716,NA12717,NA12749,NA12750,NA12751,NA12760,NA12761,NA12762,NA12763,NA12776,NA12812,NA12813,NA12814,NA12815,NA12828,NA12872,NA12873,NA12874
Note that you cannot do samples=samples() because the name samples is reserved.
Now, you can export the field as usual
$ vtools export variant --format vcf --var_info sample_names | head -5
Writing: 0.0% [> ] in 00:00:00
1 533 . G C . PASS NA06985,NA06986,NA06994,NA07000,NA07037,NA07051,NA07346,NA07347,NA07357,NA10847,NA10851,NA11829,NA11830,NA11831,NA11832,NA11840,NA11881,NA11894,NA11918,NA11919,NA11920,NA11931,NA11992,NA11993,NA11994,NA11995,NA12003,NA12004,NA12005,NA12006,NA12043,NA12044,NA12045,NA12144,NA12154,NA12155,NA12156,NA12234,NA12249,NA12287,NA12414,NA12489,NA12716,NA12717,NA12749,NA12750,NA12751,NA12760,NA12761,NA12762,NA12763,NA12776,NA12812,NA12813,NA12814,NA12815,NA12828,NA12872,NA12873,NA12874
1 41342 . T A . PASS NA06985,NA06986,NA06994,NA07000,NA07037,NA07051,NA07346,NA07347,NA07357,NA10847,NA10851,NA11829,NA11830,NA11831,NA11832,NA11840,NA11881,NA11894,NA11918,NA11919,NA11920,NA11931,NA11992,NA11993,NA11994,NA11995,NA12003,NA12004,NA12005,NA12006,NA12043,NA12044,NA12045,NA12144,NA12154,NA12155,NA12156,NA12234,NA12249,NA12287,NA12414,NA12489,NA12716,NA12717,NA12749,NA12750,NA12751,NA12760,NA12761,NA12762,NA12763,NA12776,NA12812,NA12813,NA12814,NA12815,NA12828,NA12872,NA12873,NA12874
1 41791 . G A . PASS NA06985,NA06986,NA06994,NA07000,NA07037,NA07051,NA07346,NA07347,NA07357,NA10847,NA10851,NA11829,NA11830,NA11831,NA11832,NA11840,NA11881,NA11894,NA11918,NA11919,NA11920,NA11931,NA11992,NA11993,NA11994,NA11995,NA12003,NA12004,NA12005,NA12006,NA12043,NA12044,NA12045,NA12144,NA12154,NA12155,NA12156,NA12234,NA12249,NA12287,NA12414,NA12489,NA12716,NA12717,NA12749,NA12750,NA12751,NA12760,NA12761,NA12762,NA12763,NA12776,NA12812,NA12813,NA12814,NA12815,NA12828,NA12872,NA12873,NA12874
1 44449 . T C . PASS NA06985,NA06986,NA06994,NA07000,NA07037,NA07051,NA07346,NA07347,NA07357,NA10847,NA10851,NA11829,NA11830,NA11831,NA11832,NA11840,NA11881,NA11894,NA11918,NA11919,NA11920,NA11931,NA11992,NA11993,NA11994,NA11995,NA12003,NA12004,NA12005,NA12006,NA12043,NA12044,NA12045,NA12144,NA12154,NA12155,NA12156,NA12234,NA12249,NA12287,NA12414,NA12489,NA12716,NA12717,NA12749,NA12750,NA12751,NA12760,NA12761,NA12762,NA12763,NA12776,NA12812,NA12813,NA12814,NA12815,NA12828,NA12872,NA12873,NA12874
1 44539 . C T . PASS NA06985,NA06986,NA06994,NA07000,NA07037,NA07051,NA07346,NA07347,NA07357,NA10847,NA10851,NA11829,NA11830,NA11831,NA11832,NA11840,NA11881,NA11894,NA11918,NA11919,NA11920,NA11931,NA11992,NA11993,NA11994,NA11995,NA12003,NA12004,NA12005,NA12006,NA12043,NA12044,NA12045,NA12144,NA12154,NA12155,NA12156,NA12234,NA12249,NA12287,NA12414,NA12489,NA12716,NA12717,NA12749,NA12750,NA12751,NA12760,NA12761,NA12762,NA12763,NA12776,NA12812,NA12813,NA12814,NA12815,NA12828,NA12872,NA12873,NA12874
but the info field does not have the sample_names= header. To really export the field, you would have to define a customized vcf format by copying ~/.varianttools/fmt/vcf.fmt to myvcf.fmt and adding the following section to myvcf.fmt:
[sample_names]
index=0
type=VARCHAR(255)
fmt=lambda x: x.replace(',', '|'), InfoFormatter('SampleNames')
$ vtools export variant --format myvcf --var_info sample_names | head -5
Writing: 0.0% [> ] in 00:00:00
1 533 . G C . PASS SampleNames=NA06985|NA06986|NA06994|NA07000|NA07037|NA07051|NA07346|NA07347|NA07357|NA10847|NA10851|NA11829|NA11830|NA11831|NA11832|NA11840|NA11881|NA11894|NA11918|NA11919|NA11920|NA11931|NA11992|NA11993|NA11994|NA11995|NA12003|NA12004|NA12005|NA12006|NA12043|NA12044|NA12045|NA12144|NA12154|NA12155|NA12156|NA12234|NA12249|NA12287|NA12414|NA12489|NA12716|NA12717|NA12749|NA12750|NA12751|NA12760|NA12761|NA12762|NA12763|NA12776|NA12812|NA12813|NA12814|NA12815|NA12828|NA12872|NA12873|NA12874
1 41342 . T A . PASS SampleNames=NA06985|NA06986|NA06994|NA07000|NA07037|NA07051|NA07346|NA07347|NA07357|NA10847|NA10851|NA11829|NA11830|NA11831|NA11832|NA11840|NA11881|NA11894|NA11918|NA11919|NA11920|NA11931|NA11992|NA11993|NA11994|NA11995|NA12003|NA12004|NA12005|NA12006|NA12043|NA12044|NA12045|NA12144|NA12154|NA12155|NA12156|NA12234|NA12249|NA12287|NA12414|NA12489|NA12716|NA12717|NA12749|NA12750|NA12751|NA12760|NA12761|NA12762|NA12763|NA12776|NA12812|NA12813|NA12814|NA12815|NA12828|NA12872|NA12873|NA12874
1 41791 . G A . PASS SampleNames=NA06985|NA06986|NA06994|NA07000|NA07037|NA07051|NA07346|NA07347|NA07357|NA10847|NA10851|NA11829|NA11830|NA11831|NA11832|NA11840|NA11881|NA11894|NA11918|NA11919|NA11920|NA11931|NA11992|NA11993|NA11994|NA11995|NA12003|NA12004|NA12005|NA12006|NA12043|NA12044|NA12045|NA12144|NA12154|NA12155|NA12156|NA12234|NA12249|NA12287|NA12414|NA12489|NA12716|NA12717|NA12749|NA12750|NA12751|NA12760|NA12761|NA12762|NA12763|NA12776|NA12812|NA12813|NA12814|NA12815|NA12828|NA12872|NA12873|NA12874
1 44449 . T C . PASS SampleNames=NA06985|NA06986|NA06994|NA07000|NA07037|NA07051|NA07346|NA07347|NA07357|NA10847|NA10851|NA11829|NA11830|NA11831|NA11832|NA11840|NA11881|NA11894|NA11918|NA11919|NA11920|NA11931|NA11992|NA11993|NA11994|NA11995|NA12003|NA12004|NA12005|NA12006|NA12043|NA12044|NA12045|NA12144|NA12154|NA12155|NA12156|NA12234|NA12249|NA12287|NA12414|NA12489|NA12716|NA12717|NA12749|NA12750|NA12751|NA12760|NA12761|NA12762|NA12763|NA12776|NA12812|NA12813|NA12814|NA12815|NA12828|NA12872|NA12873|NA12874
1 44539 . C T . PASS SampleNames=NA06985|NA06986|NA06994|NA07000|NA07037|NA07051|NA07346|NA07347|NA07357|NA10847|NA10851|NA11829|NA11830|NA11831|NA11832|NA11840|NA11881|NA11894|NA11918|NA11919|NA11920|NA11931|NA11992|NA11993|NA11994|NA11995|NA12003|NA12004|NA12005|NA12006|NA12043|NA12044|NA12045|NA12144|NA12154|NA12155|NA12156|NA12234|NA12249|NA12287|NA12414|NA12489|NA12716|NA12717|NA12749|NA12750|NA12751|NA12760|NA12761|NA12762|NA12763|NA12776|NA12812|NA12813|NA12814|NA12815|NA12828|NA12872|NA12873|NA12874
Here I used a lambda function to replace , with |, but you can remove the lambda function to keep , (which is allowed in variant info).
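To make the effect of that fmt= line concrete, here is a minimal Python sketch of the transformation it appears to perform on each value. Note that info_formatter and chain below are hypothetical stand-ins written for illustration; they are not the actual InfoFormatter class from variant tools, and the left-to-right chaining of the comma-separated functions is an assumption based on the output shown above.

```python
def info_formatter(name):
    """Hypothetical stand-in for InfoFormatter: prefix a value with 'name='."""
    def fmt(value):
        return f"{name}={value}"
    return fmt


def chain(*funcs):
    """Apply the given functions from left to right (assumed behavior of the
    comma-separated list in the fmt= line)."""
    def apply(value):
        for f in funcs:
            value = f(value)
        return value
    return apply


# Same two steps as in the [sample_names] section: swap ',' for '|',
# then add the 'SampleNames=' key.
format_sample_names = chain(lambda x: x.replace(',', '|'),
                            info_formatter('SampleNames'))

print(format_sample_names('NA06985,NA06986,NA06994'))
# SampleNames=NA06985|NA06986|NA06994
```

Dropping the lambda from the chain leaves the commas in place and only adds the key, which matches the alternative mentioned above.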
I am storing a lot of variants from different samples within a vtools project. Now I need to export these variants in VCF format, but I also need a field with per-sample genotype information within the INFO field. With vtools output I can use the functions genotype(,'missing=.') and samples(), which give me exactly what I want. But then the output is not in the VCF-specific variant format (in which insertions and deletions are represented with an additional reference base). How can I use these functions with vtools export in order to get VCF format? Or is it possible to produce VCF format with vtools output by any means?
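For readers unfamiliar with the "additional reference base" mentioned in the question, the following Python sketch shows the idea. It assumes, for illustration only, that the project stores a missing allele as '-' and that pos is the 1-based position of the first affected base; the function to_vcf_alleles and its prev_base argument are hypothetical, and the caller is expected to look up the reference base at pos - 1. It is not what vtools export does internally, and it does not handle a variant at the first position of a chromosome.

```python
def to_vcf_alleles(pos, ref, alt, prev_base):
    """Convert a variant that uses '-' for a missing allele to VCF-style alleles.

    pos is the 1-based position of the first affected base; prev_base is the
    reference base at pos - 1, supplied by the caller.
    """
    if ref != '-' and alt != '-':
        # Substitutions need no padding base.
        return pos, ref, alt
    ref = '' if ref == '-' else ref
    alt = '' if alt == '-' else alt
    # Shift one base to the left and prepend the preceding reference base.
    return pos - 1, prev_base + ref, prev_base + alt


print(to_vcf_alleles(101, 'AT', '-', 'G'))  # deletion:  (100, 'GAT', 'G')
print(to_vcf_alleles(101, '-', 'C', 'G'))   # insertion: (100, 'G', 'GC')
print(to_vcf_alleles(533, 'G', 'C', 'T'))   # SNV:       (533, 'G', 'C')
```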