Closed ntakebay closed 1 year ago
Here is another bug when @pop of genInd is NULL.
l. 68-72 is like this:
if (is.null(data@pop)) { if (verbose) message(" strata: no") if (verbose) message(" 'pop' will be added") strata %<>% dplyr::mutate(STRATA = "pop") }
Before the closing curly braces, you can add:
pop(data) <- rep("pop",nInd(data))
And it works.
Here is the reproducible code to trip these two bugs:
library(adegenet) library(radiator) test <- data.frame(loc1=c("a/a","a/t", "g/t"),loc2=c("g/c",NA,"c/c"),loc3=c("g/c","g/g","c/c")) rownames(test)<-c("ind1","ind2","ind3")
tgi <- df2genind(test,sep="/")
tidy_genind(tgi)
Finally, there is an additional bug, when data is biallelic (the example used above isn't biallelic) and gds=T.
Here is the reproducible code:
library(adegenet) library(radiator) test <- data.frame(loc1=c("a/a","a/t", "t/t"),loc2=c("g/c",NA,"c/c"),loc3=c("g/c","g/g","c/c")) rownames(test)<-c("ind1","ind2","ind3")
tgi <- df2genind(test,sep="/")
tidy_genind(tgi)
It core dumped with the following error message:
No sample in '/scratch/test/radiator_temp_20220301@1454.vcf'
/usr/include/c++/11/bits/stl_vector.h:1045: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = std::cxx11::basic_string
Line 127 of genind.R causes the problem:
gds.filename <- radiator_gds(
data.source = "genind",
genotypes = geno,
strata = strata,
biallelic = TRUE,
markers.meta = markers.meta,
filename = NULL,
verbose = verbose
)
When I look at gds.R, line 53 is like this:
SeqArray::seqVCF2GDS( vcf.fn = paste0(temp.file, ".vcf"), out.fn = filename, parallel = 1L, storage.option = "ZIP_RA", verbose = FALSE )
And this is the place where it core dumped. SeqArray package is version 1.34.0.
Hi Naoki,
For now I'll keep things working for biallielic data with genind. There are so many flavours in multi allelic that are beyond the scope of radiator.
As for the sample size, I've modified the code to account for this, although I'm not sure it's going to work well in the other function. Again, radiator was design for large RAD data, ideally VCFs with read depth info. genind or genlight are not the ideal format, but I understand the need to be able to run DACP etc...
I'll see what I can do for the next push
Best Thierry
I'm also surprised that you would have a genind object without populations... usually adegenet is used for population analysis...
With radiator 1.1.9 2020-12-12, I noticed a small bug in tidy_genind() function in genind.R https://github.com/thierrygosselin/radiator/blob/master/R/genind.R.
Line 146-147 is like this:
A2 <- TRUE %in% unique(stringi::stri_detect_fixed(str = sample(colnames(data@tab), 100), pattern = ".A2")) if (isTRUE(TRUE %in% unique(stringi::stri_detect_fixed(str = sample(colnames(data@tab), 100), pattern = ".A2__")))) {
The problem occurs when the number of columns in tab of genInd object is less than 100;it causes the following error:
Error in sample.int(length(x), size, replace, prob) : cannot take a sample larger than the population when 'replace = FALSE'
It is better to replace "str = sample(..)" with something like this:
str = sample(colnames(data@tab), min(100, ncol(data@tab)))
I was playing with a small test data, and noticed this bug.
I'm running it on R version 4.1.2 on x86_64 linux.
Thank you, Naoki