jldimond closed this issue 7 years ago
Here is what you have:
##fileformat=VCFv4.0
##fileDate=2016/09/22
##source=ipyrad_v.0.3.41
##reference=past.fasta
##phasing=unphased
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=CATG,Number=1,Type=String,Description="Base Counts (CATG)">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 101_ddr 102_epi 103_ddr 103_epi 104_ddr 104_epi 105_ddr 105_epi 106_ddr 106_epi 107_ddr 107_epi 108_ddr 108_epi 109_ddr 109_epi 110_ddr 110_epi 111_ddr 111_epi 112_ddr 112_epi 113_epi 114_ddr 114_epi 115_ddr 115_epi 116_ddr 116_epi 117_ddr 117_epi 118_ddr 118_epi 120_epi 121_ddr 121_epi 122_ddr 122_epi 123_ddr 123_epi 124_ddr 124_epi 125_ddr 125_epi 126_ddr 126_epi 127_ddr 127_epi 128_ddr 128_epi 129_ddr 129_epi 130_ddr 130_epi 131_ddr 131_epi 80_ddr 80_epi 81_ddr 81_epi 82_ddr 82_epi 84_ddr 84_epi 85_ddr 85_epi 86_ddr 86_epi 87_ddr 87_epi 88b_ddr 88b_epi 89_ddr 89_epi 90_ddr 90_epi 91_ddr 91_epi 95_ddr 95_epi 96_ddr 96_epi 98_ddr 98_epi 99_ddr 99_epi w11_ddr w11_epi w1_ddr w1_epi w3_ddr w3_epi
7 0 . T . 13 PASS NS=37;DP=380 GT:CATG 0/0:0,0,10,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,0,10,0 0/0:0,0,23,0 0/0:0,8,0,0 0/0:0,0,12,0 ./.:0,0,0,0 0/0:0,0,7,0 ./.:0,0,0,0 0/0:0,0,6,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,0,6,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,0,14,0 0/0:0,0,6,0 0/0:0,0,6,0 0/0:0,0,18,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,0,7,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,6,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,0,8,0 0/0:0,0,11,0 0/0:0,0,8,0 0/0:0,0,7,0 0/0:0,0,6,0 ./.:0,0,0,0 0/0:0,0,7,0 0/0:0,6,0,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,0,7,0 0/0:0,0,32,0 0/0:0,0,11,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,0,8,0 ./.:0,0,0,0 0/0:0,0,18,0 0/0:0,0,26,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,0,8,0 0/0:0,6,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,0,11,0 0/0:0,0,10,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,0,9,0 0/0:0,0,8,0 ./.:0,0,0,0 0/0:0,0,10,0 0/0:0,0,6,0 0/0:0,0,8,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,0,10,0 ./.:0,0,0,0 ./.:0,0,0,0
7 1 . A . 13 PASS NS=37;DP=380 GT:CATG 0/0:0,10,0,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,10,0,0 0/0:0,23,0,0 0/0:0,8,0,0 0/0:0,12,0,0 ./.:0,0,0,0 0/0:0,7,0,0 ./.:0,0,0,0 0/0:0,6,0,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,6,0,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,14,0,0 0/0:0,6,0,0 0/0:0,6,0,0 0/0:0,18,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,7,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,6,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,8,0,0 0/0:0,11,0,0 0/0:0,8,0,0 0/0:0,7,0,0 0/0:0,6,0,0 ./.:0,0,0,0 0/0:0,7,0,0 0/0:0,6,0,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,7,0,0 0/0:0,32,0,0 0/0:0,11,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,8,0,0 ./.:0,0,0,0 0/0:0,18,0,0 0/0:0,26,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,8,0,0 0/0:0,0,0,6 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,11,0,0 0/0:0,10,0,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,9,0,0 0/0:0,8,0,0 ./.:0,0,0,0 0/0:0,10,0,0 0/0:0,6,0,0 0/0:0,8,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,10,0,0 ./.:0,0,0,0 ./.:0,0,0,0
7 2 . A G 13 PASS NS=37;DP=378 GT:CATG 0/0:0,10,0,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,10,0,0 0/0:0,23,0,0 0/0:0,0,8,0 0/0:0,12,0,0 ./.:0,0,0,0 0/0:0,7,0,0 ./.:0,0,0,0 1/0:0,3,0,3 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,6,0,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,13,0,0 0/0:0,6,0,0 0/0:0,6,0,0 0/0:0,17,0,1 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,7,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,0,6,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,8,0,0 0/0:0,11,0,0 0/0:0,8,0,0 0/0:0,7,0,0 0/0:0,6,0,0 ./.:0,0,0,0 0/0:0,7,0,0 0/0:0,0,6,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,7,0,0 0/0:0,32,0,0 0/0:0,11,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 0/0:0,8,0,0 ./.:0,0,0,0 1/1:0,0,0,18 1/1:0,0,0,26 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 1/0:0,2,0,6 1/1:0,0,5,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 1/1:0,0,0,11 1/1:0,0,0,10 ./.:0,0,0,0 ./.:0,0,0,0 1/1:0,0,0,9 1/1:0,0,0,8 ./.:0,0,0,0 1/1:0,0,0,10 1/1:0,0,0,6 1/1:0,0,0,8 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 1/1:0,0,0,10 ./.:0,0,0,0 ./.:0,0,0,0
Please provide an example of the output you want for these three records.
Ideally it would look something like this for the first three records and the first two samples:
CHROM 101_ddr 102_epi
7 10 0
7 10 0
7 10 0
Really, I only need the first line for each record. Note that for some records the reads are spread across more than one base, so the four CATG counts in each sample field need to be summed. Example: for 0/0:0,1,9,0 the value I need is 0 + 1 + 9 + 0 = 10.
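The summing step described above could be sketched in Python roughly like this. It assumes every sample field has the form GT:CATG, as in the FORMAT column shown, and that the first nine columns are the fixed VCF columns. The function names `sum_catg` and `record_depths` are made up for illustration.

```python
def sum_catg(sample_field):
    """Sum the four CATG base counts in one sample field, e.g. '0/0:0,1,9,0'."""
    # Assumption: the field is exactly GT:CATG, so the counts follow the first colon.
    _genotype, counts = sample_field.split(":", 1)
    return sum(int(n) for n in counts.split(","))


def record_depths(record_line, n_fixed=9):
    """Return (CHROM, [summed depth per sample]) for one VCF data line."""
    # split() with no argument handles tabs or spaces between columns.
    fields = record_line.split()
    return fields[0], [sum_catg(f) for f in fields[n_fixed:]]


print(sum_catg("0/0:0,1,9,0"))   # 10
print(sum_catg("./.:0,0,0,0"))   # 0 (missing genotype, no reads)
print(record_depths("7 0 . T . 13 PASS NS=37;DP=380 GT:CATG 0/0:0,0,10,0 ./.:0,0,0,0"))
```

Samples with no data (./.) simply come out as 0, which matches the 102_epi column in the desired output.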
Here's the workflow I worked on today. I did not push it to the course repo because ipyrad is running and the gitignores are causing desktop to freeze.
https://github.com/jldimond/ipython-notebooks/blob/master/VCF_readcounts.ipynb
I'd like to be able to summarize the following fields. I was trying to use VCFtools to do this, but the .vcf file is not formatted the way VCFtools expects. I think just extracting the columns into a new text file would be fine. I feel like I am getting there, but I am posting the issue as we discussed yesterday.
An example file is located here:
https://github.com/jldimond/jldimond-fish546-2016/blob/master/analyses/data1_all.vcf
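If extracting the columns into a new text file is enough, one possible approach without VCFtools is a short Python script that reads the VCF line by line and writes CHROM plus one summed read count per sample. This is only a sketch: the function name `vcf_to_depth_table` and the output file name `readcounts.txt` are placeholders, and it assumes each data line carries a CATG entry in its FORMAT column as in the ipyrad output above.

```python
import io


def vcf_to_depth_table(vcf_lines, out):
    """Write CHROM plus the summed CATG count per sample as tab-delimited text."""
    for line in vcf_lines:
        line = line.strip()
        if not line or line.startswith("##"):
            continue  # skip blank lines and ## metadata lines
        fields = line.split()  # columns may be separated by tabs or spaces
        if line.startswith("#CHROM"):
            # Header row: keep only the sample names (columns 10 onward).
            out.write("\t".join(["CHROM"] + fields[9:]) + "\n")
            continue
        # Locate CATG within the FORMAT column (GT:CATG in this file).
        catg_index = fields[8].split(":").index("CATG")
        depths = []
        for sample in fields[9:]:
            counts = sample.split(":")[catg_index]
            depths.append(str(sum(int(n) for n in counts.split(","))))
        out.write("\t".join([fields[0]] + depths) + "\n")


# Small in-memory demo using two samples from the records shown in this thread.
demo = io.StringIO(
    "##fileformat=VCFv4.0\n"
    "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 101_ddr 102_epi\n"
    "7 0 . T . 13 PASS NS=37;DP=380 GT:CATG 0/0:0,0,10,0 ./.:0,0,0,0\n"
    "7 2 . A G 13 PASS NS=37;DP=378 GT:CATG 0/0:0,10,0,0 ./.:0,0,0,0\n"
)
result = io.StringIO()
vcf_to_depth_table(demo, result)
print(result.getvalue())
```

To run it on the real file, something like `with open("data1_all.vcf") as src, open("readcounts.txt", "w") as dst: vcf_to_depth_table(src, dst)` should work, since an open file can be iterated line by line just like the in-memory demo.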