I found another unexpected behavior when using vcfR2DNAbin() which is not related to #122. When data have indels with missing data and these indels are kept, a DNAbin object of list type is created (due to the generation of sequences of different lengths) but labels are missing. Below a reproducible example and a solution.
The reproducible example
library(ape)
data(vcfR_test)
# Create an example reference sequence.
nucs <- c('a','c','g','t')
set.seed(9)
myRef <- as.DNAbin(matrix(nucs[round(runif(n=20, min=0.5, max=4.5))], nrow=1))
# Recode the POS data for a smaller example.
set.seed(99)
vcfR_test@fix[,'POS'] <- sort(sample(10:20, size=length(getPOS(vcfR_test))))
# Conversion to DNAbin object
vcfR2DNAbin(vcfR_test, extract.indels=FALSE, unphased_as_NA=FALSE)
# This data has no missing data so sequences of same length can be created
# and DNAbin of matrix type is generated
# Introduction of missing data
vcfR_test@gt[5,2] <- "./.:35:4"
# Conversion to DNAbin object
vcfR2DNAbin(vcfR_test, extract.indels=FALSE, unphased_as_NA=FALSE)
# DNAbin of list type is generated
# Labels are missing
The solution
The problem comes from the transposition when generating the DNAbin object at the end of vcfR2DNAbin(). But this transposition is only needed when a matrix is converted.
Here is the modification proposed:
# Replace line 299 with this block
if (is.matrix(x)) { x <- t(x) }
x <- ape::as.DNAbin(x)
Hi Brian,
I found another unexpected behavior when using
vcfR2DNAbin()
which is not related to #122. When data have indels with missing data and these indels are kept, a DNAbin object of list type is created (due to the generation of sequences of different lengths) but labels are missing. Below a reproducible example and a solution.The reproducible example
The solution
The problem comes from the transposition when generating the DNAbin object at the end of
vcfR2DNAbin()
. But this transposition is only needed when a matrix is converted.Here is the modification proposed:
Fred