Closed Flavjack closed 2 years ago
Hi @Flavjack , I would not consider the VCF specification to be a form of "tidy data". The "tidy" functions in vcfR are there to help you work with VCF data if you would like to convert them to a tidy format. However, this conversion should be considered a departure from the VCF specification. If your goal is to produce a VCF compliant file I would suggest you avoid the tidy functions. If you're interested in producing a VCF compliant file I would suggest you make as few manipulations as possible. This is because every manipulation you make provides an opportunity for you to deviate from the VCF specification. I'd suggest something like the following.
library(vcfR)
#>
#> ***** *** vcfR *** *****
#> This is vcfR 1.12.0.9999
#> browseVignettes('vcfR') # Documentation
#> citation('vcfR') # Citation
#> ***** ***** ***** *****
data("vcfR_test")
vcfR_test
#> ***** Object of Class vcfR *****
#> 3 samples
#> 1 CHROMs
#> 5 variants
#> Object size: 0 Mb
#> 0 percent missing data
#> ***** ***** *****
vcfR_test@gt
#> FORMAT NA00001 NA00002 NA00003
#> [1,] "GT:GQ:DP:HQ" "0|0:48:1:51,51" "1|0:48:8:51,51" "1/1:43:5:.,."
#> [2,] "GT:GQ:DP:HQ" "0|0:49:3:58,50" "0|1:3:5:65,3" "0/0:41:3"
#> [3,] "GT:GQ:DP:HQ" "1|2:21:6:23,27" "2|1:2:0:18,2" "2/2:35:4"
#> [4,] "GT:GQ:DP:HQ" "0|0:54:7:56,60" "0|0:48:4:51,51" "0/0:61:2"
#> [5,] "GT:GQ:DP" "0/1:35:4" "0/2:17:2" "1/1:40:3"
colnames(vcfR_test@gt)[-1] <- paste("newName", 1:3, sep = "")
vcfR_test@gt
#> FORMAT newName1 newName2 newName3
#> [1,] "GT:GQ:DP:HQ" "0|0:48:1:51,51" "1|0:48:8:51,51" "1/1:43:5:.,."
#> [2,] "GT:GQ:DP:HQ" "0|0:49:3:58,50" "0|1:3:5:65,3" "0/0:41:3"
#> [3,] "GT:GQ:DP:HQ" "1|2:21:6:23,27" "2|1:2:0:18,2" "2/2:35:4"
#> [4,] "GT:GQ:DP:HQ" "0|0:54:7:56,60" "0|0:48:4:51,51" "0/0:61:2"
#> [5,] "GT:GQ:DP" "0/1:35:4" "0/2:17:2" "1/1:40:3"
Created on 2022-07-26 by the reprex package (v2.0.1)
Note that the "FORMAT" column must be named "FORMAT" if you want your data to be VCF compliant. Does this address your issue?
@knausb thanks for the explanation and the REPREX. The explanation solved my question. Many thanks :)
It is possible to rename headers in Genotype information (@gt)?
I tried importing the vcf file with
vcfR2tidy()
and after manipulating the Genotype information and get it. But I was not able to re-export it as a new vcf file. Do you have any hints? Thanks