akijarl closed this issue 4 years ago.
It seems that a shift in the formatting of the original VCF files (the REF column of some records contained invalid characters) was the source of the inconsistent number of blank genotypes. I'm not sure why the pattern is inconsistent, but the original problem doesn't seem to lie with read.vcfR().
If the REF column is missing, then it sounds like this is not a valid VCF file. Thank you for taking the time to resolve and close this issue!
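For anyone hitting the same symptom, a quick sanity check is to scan the raw file for REF values that break the VCF specification, which requires every REF base to be one of A, C, G, T, or N (case-insensitive). The sketch below is a minimal, hypothetical helper written in Python (it is not part of vcfR and not the poster's pipeline); it assumes a tab-delimited, uncompressed VCF and only inspects the fourth fixed column.

```python
import re

# VCF spec: each REF base must be A, C, G, T or N (case-insensitive).
VALID_REF = re.compile(r"[ACGTNacgtn]+")


def find_invalid_ref(lines):
    """Return (line_number, REF) for each data record whose REF is invalid.

    `lines` is any iterable of VCF text lines, e.g. an open file handle.
    Header lines (starting with '#') and blank lines are skipped.
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # REF is the 4th fixed column: CHROM, POS, ID, REF, ...
        ref = fields[3] if len(fields) > 3 else ""
        if not VALID_REF.fullmatch(ref):
            bad.append((lineno, ref))
    return bad


if __name__ == "__main__":
    sample = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\n",
        "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\n",
        "1\t200\t.\t?\tG\t.\tPASS\t.\tGT\t0|0\n",
    ]
    print(find_invalid_ref(sample))
```

Running it on a real file would look like `find_invalid_ref(open("test.vcf"))`; any hits point at records that should be fixed before reading the file with read.vcfR().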
Hello,
While running a pipeline on replicate simulation data, I noticed that some calculated allele frequencies were randomly showing up as NA in different runs.
I traced this back to certain genotype entries being dropped (appearing as blank entries "" instead of "0|0", "0|1", etc.) when reading VCF files into R with read.vcfR().
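To confirm that the blanks are introduced during reading rather than present in the file itself, one option is to scan the raw GT values directly, without going through R. The following is a minimal sketch in Python under stated assumptions: `count_blank_gt` is a hypothetical helper, the file is an uncompressed, tab-delimited VCF, and GT is located via the FORMAT column (column 9). It only flags truly empty GT strings; a spec-conformant missing call is written as "." or "./." and is not counted.

```python
def count_blank_gt(lines):
    """Return (line_number, n_blank) for records with at least one empty GT.

    `lines` is any iterable of VCF text lines, e.g. an open file handle.
    Records without FORMAT/sample columns or without a GT key are skipped.
    """
    hits = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT or sample columns on this record
        keys = fields[8].split(":")
        if "GT" not in keys:
            continue
        gt_idx = keys.index("GT")
        n_blank = 0
        for sample in fields[9:]:
            values = sample.split(":")
            # A sample column shorter than FORMAT is treated as an empty GT.
            gt = values[gt_idx] if gt_idx < len(values) else ""
            if gt == "":
                n_blank += 1
        if n_blank:
            hits.append((lineno, n_blank))
    return hits
```

If `count_blank_gt(open("test.vcf"))` returns an empty list while the object produced by read.vcfR() still contains blank genotypes, the discrepancy arises during parsing; otherwise the file itself needs attention.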
I've confirmed that the original VCF files are intact. I also created a smaller test data set ("test.vcf") and spot-checked which genotypes were missing, and it seems to be random:
Across the five trials above, in which I simply read in the same file again and again, I get five different genotypes showing up blank. Any insights into what might be causing this?
R session info below