kromanenkov closed this issue 6 years ago
There is an option, treating_missing_as_wildtype, for this. Please check the documentation for details.
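To make the idea concrete, here is a minimal sketch in plain Python of what treating missing genotypes as wildtype amounts to. It is not VariantTools code and does not reflect how the tool implements the option; the sample names, variant keys, and the build_matrix helper are all made up for illustration. Calls that are absent from a single-sample file either stay missing (None, i.e. NA) or are counted as 0 (0/0).

```python
# Hypothetical per-sample calls, as if read from single-sample VCF files that
# list only non-reference genotypes (0/1 -> 1, 1/1 -> 2). Illustration only.
calls = {
    "sample1": {"chr1:100": 1, "chr1:200": 2},
    "sample2": {"chr1:200": 1},
    "sample3": {"chr1:100": 2, "chr1:300": 1},
}


def build_matrix(calls, missing=None):
    """Return (variants, matrix) with one row per sample.

    A variant that has no call in a sample is filled with `missing`.
    """
    # Union of all variant sites seen in any sample, in a stable order.
    variants = sorted({v for sample in calls.values() for v in sample})
    matrix = {
        name: [sample.get(v, missing) for v in variants]
        for name, sample in calls.items()
    }
    return variants, matrix


# Absent calls stay missing (None), which mirrors the 1 / 2 / NA matrix.
variants, with_na = build_matrix(calls)
# Absent calls are counted as homozygous reference (0).
_, as_wildtype = build_matrix(calls, missing=0)

for name in calls:
    print(name, with_na[name], as_wildtype[name])
```

Note that filling with 0 assumes every absent call really was homozygous reference, which only holds if the site was adequately covered in that sample.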
Thanks for your answer, Prof. Peng!
I also have another question about this option: the documentation says that, besides converting missing genotypes, it also converts removed low-quality genotypes. Does that mean previously removed variants (removed using vtools remove variants ...) would be converted too? Or does it apply only to variants filtered and selected through the vtools associate options?
@kromanenkov vtools remove variants will indeed remove the entire variant site from all samples in the data, but the filtering criterion here is also at the variant level. For low-quality genotype calls, vtools remove genotype will mark them as missing, but they will still be present in the data unless the entire variant site is missing. There are 2 things you can do from here:
1. Run vtools remove genotype, then use vtools remove variants conditional on missing rate.
2. vtools associate provides an on-the-fly method to filter variants / samples at the gene level based on the degree of missing genotypes. Unlike 1, it does not change the original data.
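As a rough illustration of filtering on the degree of missing genotypes without touching the original data, here is a small plain-Python sketch. It is not how VariantTools implements this; the matrix, the thresholds, and the function names are assumptions chosen for the example. Variants whose missing rate exceeds a threshold are dropped first, then samples with too many missing calls among the remaining variants.

```python
# Hypothetical genotype matrix: one row per sample, one column per variant.
# 0/1/2 count alternate alleles; None marks a missing (or removed) genotype.
variants = ["v1", "v2", "v3", "v4"]
genotypes = {
    "sample1": [0, 1, None, 2],
    "sample2": [1, None, None, 0],
    "sample3": [0, 0, None, 1],
    "sample4": [None, 2, 1, 0],
}


def missing_rate(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def filter_by_missingness(genotypes, variants, max_variant_missing, max_sample_missing):
    """Return (kept_variants, filtered_matrix); the input is left unchanged."""
    samples = list(genotypes)
    # Step 1: keep variant columns whose missing rate is within the threshold.
    keep_cols = [
        j for j in range(len(variants))
        if missing_rate([genotypes[s][j] for s in samples]) <= max_variant_missing
    ]
    kept_variants = [variants[j] for j in keep_cols]
    # Step 2: keep samples whose missing rate over the kept variants is acceptable.
    filtered = {}
    for s in samples:
        row = [genotypes[s][j] for j in keep_cols]
        if row and missing_rate(row) <= max_sample_missing:
            filtered[s] = row
    return kept_variants, filtered


kept_variants, filtered = filter_by_missingness(
    genotypes, variants, max_variant_missing=0.5, max_sample_missing=0.3
)
print(kept_variants)
print(filtered)
```

Because the function builds new lists instead of editing genotypes in place, the original matrix stays available for a second pass with different thresholds.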
Hello!
I have some VCF files (one sample per file) with only 0/1 and 1/1 GT entries (no 0/0 GT). When I import them into VariantTools and run association tests, the resulting genotype matrix consists only of 1, 2 and NA. Also, during association testing many samples are discarded due to missing genotype info (AFAIU because 0/0 GT entries are not explicitly present in the VCF files), which drastically reduces the dimension of the genotype matrix.
So is there a rule of thumb for dealing with such VCF files? I noticed that the datasets used in the VariantTools tutorials contain 0/0 GT entries. How can I convert my data to that format? Or maybe there is an option in VariantTools to not treat such variants as missing genotype info?
Thanks